Overview
Comment: | smaz-tools: add a trie-based dynamic dictionary lookup |
---|---|
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
34d16d2c19e405f067b5e2c985782fbf |
User & Date: | nat on 2016-10-17 20:48:12 |
Other Links: | manifest | tags |
Context
2016-10-18
| ||
21:47 | smaz-tools: add a trie-based search compatible with Dictionary.Hash check-in: 52b491f017 user: nat tags: trunk | |
2016-10-17
| ||
20:48 | smaz-tools: add a trie-based dynamic dictionary lookup check-in: 34d16d2c19 user: nat tags: trunk | |
2016-10-16
| ||
17:21 | tools/smaz.adb: use the new map-based dictionary hash in evaluation check-in: 562e1cf9fc user: nat tags: trunk | |
Changes
Modified src/natools-smaz-tools.adb from [537cf082aa] to [2f6aac0df0].
︙ | ︙ | |||
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | -- WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF -- -- MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR -- -- ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES -- -- WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN -- -- ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF -- -- OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. -- ------------------------------------------------------------------------------ package body Natools.Smaz.Tools is package Sx renames Natools.S_Expressions; function Dummy_Hash (Value : String) return Natural; -- Placeholder for Hash member, always raises Program_Error function Image (B : Boolean) return String; -- Return correctly-cased image of B ------------------------------ -- Local Helper Subprograms -- ------------------------------ function Dummy_Hash (Value : String) return Natural is pragma Unreferenced (Value); begin raise Program_Error with "Dummy_Hash called"; return 0; end Dummy_Hash; | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 | -- WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF -- -- MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR -- -- ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES -- -- WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN -- -- ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF -- -- OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. -- ------------------------------------------------------------------------------ with Ada.Unchecked_Deallocation; package body Natools.Smaz.Tools is package Sx renames Natools.S_Expressions; function Build_Node (Map : Dictionary_Maps.Map; Empty_Value : Natural) return Trie_Node; function Dummy_Hash (Value : String) return Natural; -- Placeholder for Hash member, always raises Program_Error function Image (B : Boolean) return String; -- Return correctly-cased image of B procedure Free is new Ada.Unchecked_Deallocation (Trie_Node, Trie_Node_Access); ------------------------------ -- Local Helper Subprograms -- ------------------------------ function Build_Node (Map : Dictionary_Maps.Map; Empty_Value : Natural) return Trie_Node is function First_Character (S : String) return Character is (S (S'First)); function Is_Current (Cursor : Dictionary_Maps.Cursor; C : Character) return Boolean is (Dictionary_Maps.Has_Element (Cursor) and then First_Character (Dictionary_Maps.Key (Cursor)) = C); function Suffix (S : String) return String; function Suffix (S : String) return String is begin return S (S'First + 1 .. S'Last); end Suffix; use type Ada.Containers.Count_Type; Cursor : Dictionary_Maps.Cursor; Result : Trie_Node := (Ada.Finalization.Controlled with Is_Leaf => False, Index => Empty_Value, Children => (others => null)); begin pragma Assert (Dictionary_Maps.Length (Map) >= 1); Cursor := Dictionary_Maps.Find (Map, ""); if Dictionary_Maps.Has_Element (Cursor) then Result.Index := Natural (Dictionary_Maps.Element (Cursor)); end if; for C in Character'Range loop Cursor := Dictionary_Maps.Ceiling (Map, (1 => C)); if Is_Current (Cursor, C) then if not Is_Current (Dictionary_Maps.Next (Cursor), C) and then Dictionary_Maps.Key (Cursor) = (1 => C) then Result.Children (C) := new Trie_Node'(Ada.Finalization.Controlled with Is_Leaf => True, Index => Natural (Dictionary_Maps.Element (Cursor))); else declare New_Map : Dictionary_Maps.Map; begin loop Dictionary_Maps.Insert (New_Map, Suffix (Dictionary_Maps.Key (Cursor)), Dictionary_Maps.Element (Cursor)); Dictionary_Maps.Next (Cursor); exit when not Is_Current (Cursor, C); end loop; Result.Children (C) := new Trie_Node'(Build_Node (New_Map, Empty_Value)); end; end if; end if; end loop; return Result; end Build_Node; function Dummy_Hash (Value : String) return Natural is pragma Unreferenced (Value); begin raise Program_Error with "Dummy_Hash called"; return 0; end Dummy_Hash; |
︙ | ︙ | |||
337 338 339 340 341 342 343 344 345 346 347 348 349 350 | end To_Dictionary; --------------------------------- -- Dynamic Dictionary Searches -- --------------------------------- function Linear_Search (Value : String) return Natural is Result : Ada.Streams.Stream_Element := 0; begin for S of List_For_Linear_Search loop exit when S = Value; Result := Result + 1; | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 | end To_Dictionary; --------------------------------- -- Dynamic Dictionary Searches -- --------------------------------- overriding procedure Adjust (Node : in out Trie_Node) is begin if not Node.Is_Leaf then for C in Node.Children'Range loop if Node.Children (C) /= null then Node.Children (C) := new Trie_Node'(Node.Children (C).all); end if; end loop; end if; end Adjust; overriding procedure Finalize (Node : in out Trie_Node) is begin if not Node.Is_Leaf then for C in Node.Children'Range loop Free (Node.Children (C)); end loop; end if; end Finalize; procedure Initialize (Trie : out Search_Trie; Dict : in Dictionary) is Map : Dictionary_Maps.Map; begin for I in Dict.Offsets'Range loop Dictionary_Maps.Insert (Map, Dict_Entry (Dict, I), I); end loop; Trie := (Not_Found => Natural (Dict.Dict_Last) + 1, Root => Build_Node (Map, Natural (Dict.Dict_Last) + 1)); end Initialize; function Linear_Search (Value : String) return Natural is Result : Ada.Streams.Stream_Element := 0; begin for S of List_For_Linear_Search loop exit when S = Value; Result := Result + 1; |
︙ | ︙ | |||
361 362 363 364 365 366 367 368 369 370 371 372 373 374 | if Dictionary_Maps.Has_Element (Cursor) then return Natural (Dictionary_Maps.Element (Cursor)); else return Natural (Ada.Streams.Stream_Element'Last); end if; end Map_Search; procedure Set_Dictionary_For_Map_Search (Dict : in Dictionary) is begin Dictionary_Maps.Clear (Search_Map); for I in Dict.Offsets'Range loop Dictionary_Maps.Insert (Search_Map, Dict_Entry (Dict, I), I); | > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 | if Dictionary_Maps.Has_Element (Cursor) then return Natural (Dictionary_Maps.Element (Cursor)); else return Natural (Ada.Streams.Stream_Element'Last); end if; end Map_Search; function Search (Trie : in Search_Trie; Value : in String) return Natural is Index : Positive := Value'First; Position : Trie_Node_Access; begin if Value'Length = 0 then return Trie.Not_Found; end if; Position := Trie.Root.Children (Value (Index)); loop if Position = null then return Trie.Not_Found; end if; Index := Index + 1; if Index not in Value'Range then return Position.Index; elsif Position.Is_Leaf then return Trie.Not_Found; end if; Position := Position.Children (Value (Index)); end loop; end Search; procedure Set_Dictionary_For_Map_Search (Dict : in Dictionary) is begin Dictionary_Maps.Clear (Search_Map); for I in Dict.Offsets'Range loop Dictionary_Maps.Insert (Search_Map, Dict_Entry (Dict, I), I); |
︙ | ︙ |
Modified src/natools-smaz-tools.ads from [09ce0f055f] to [b5741ecb56].
︙ | ︙ | |||
22 23 24 25 26 27 28 29 30 31 32 33 34 35 | ------------------------------------------------------------------------------ with Ada.Containers.Indefinite_Doubly_Linked_Lists; with Natools.S_Expressions; private with Ada.Containers.Indefinite_Ordered_Maps; private with Ada.Containers.Indefinite_Ordered_Sets; package Natools.Smaz.Tools is pragma Preelaborate; package String_Lists is new Ada.Containers.Indefinite_Doubly_Linked_Lists (String); | > | 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | ------------------------------------------------------------------------------ with Ada.Containers.Indefinite_Doubly_Linked_Lists; with Natools.S_Expressions; private with Ada.Containers.Indefinite_Ordered_Maps; private with Ada.Containers.Indefinite_Ordered_Sets; private with Ada.Finalization; package Natools.Smaz.Tools is pragma Preelaborate; package String_Lists is new Ada.Containers.Indefinite_Doubly_Linked_Lists (String); |
︙ | ︙ | |||
70 71 72 73 74 75 76 77 78 79 80 81 82 83 | -- that can be used with Dictionary.Hash. procedure Set_Dictionary_For_Map_Search (Dict : in Dictionary); function Map_Search (Value : String) return Natural; -- Function and data source for logarithmic search using standard -- ordered map, that can be used with Dictionary.Hash. type String_Count is range 0 .. 2 ** 31 - 1; -- Type for a number of substring occurrences type Word_Counter is private; -- Accumulate frequency/occurrence counts for a set of strings procedure Add_Word | > > > > > > | 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 | -- that can be used with Dictionary.Hash. procedure Set_Dictionary_For_Map_Search (Dict : in Dictionary); function Map_Search (Value : String) return Natural; -- Function and data source for logarithmic search using standard -- ordered map, that can be used with Dictionary.Hash. type Search_Trie is private; procedure Initialize (Trie : out Search_Trie; Dict : in Dictionary); function Search (Trie : in Search_Trie; Value : in String) return Natural; -- Trie-based search in a dynamic dictionary, for lookup whose -- speed-vs-memory is even more skewed towards speed. type String_Count is range 0 .. 2 ** 31 - 1; -- Type for a number of substring occurrences type Word_Counter is private; -- Accumulate frequency/occurrence counts for a set of strings procedure Add_Word |
︙ | ︙ | |||
154 155 156 157 158 159 160 161 162 | package Scored_Word_Sets is new Ada.Containers.Indefinite_Ordered_Sets (Scored_Word); package Dictionary_Maps is new Ada.Containers.Indefinite_Ordered_Maps (String, Ada.Streams.Stream_Element); Search_Map : Dictionary_Maps.Map; end Natools.Smaz.Tools; | > > > > > > > > > > > > > > > > > > > > > > | 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 | package Scored_Word_Sets is new Ada.Containers.Indefinite_Ordered_Sets (Scored_Word); package Dictionary_Maps is new Ada.Containers.Indefinite_Ordered_Maps (String, Ada.Streams.Stream_Element); Search_Map : Dictionary_Maps.Map; type Trie_Node; type Trie_Node_Access is access Trie_Node; type Trie_Node_Array is array (Character) of Trie_Node_Access; type Trie_Node (Is_Leaf : Boolean) is new Ada.Finalization.Controlled with record Index : Natural; case Is_Leaf is when True => null; when False => Children : Trie_Node_Array; end case; end record; overriding procedure Adjust (Node : in out Trie_Node); overriding procedure Finalize (Node : in out Trie_Node); type Search_Trie is record Not_Found : Natural; Root : Trie_Node (False); end record; end Natools.Smaz.Tools; |