Overview
Comment: tools/smaz: genericize Process
SHA1: cfdc0a797926c8a9688a9a474e923fa7
User & Date: nat on 2016-12-08 22:29:32
Context
2016-12-09
21:06  tools/smaz: new command line option to select the old implementation (check-in: 0f8f66819b, user: nat, tags: trunk)
2016-12-08
22:29  tools/smaz: genericize Process (check-in: cfdc0a7979, user: nat, tags: trunk)
2016-12-07
21:36  tools/smaz: partially genericize Print_Dictionary (check-in: 966f7e5239, user: nat, tags: trunk)
Changes
Modified tools/smaz.adb from [c5aac94447] to [6785c59b6a].
︙ old lines 137-143, new lines 137-160 (6 old lines removed, their text is not shown in this view; 1 line added, marked +)
    -- print the given dictionary in the given file

    procedure Print_Help
      (Opt : in Getopt.Configuration;
       Output : in Ada.Text_IO.File_Type);
    -- Print the help text to the given file

    procedure Use_Dictionary (Dict : in out Natools.Smaz_256.Dictionary);
    -- Update Dictionary.Hash so that it can be actually used

    generic
       type Dictionary (<>) is private;
       type Dictionary_Entry is (<>);
       type Methods is (<>);
+      type Score_Value is range <>;
       type String_Count is range <>;
       type Word_Counter is private;

       type Dictionary_Counts is array (Dictionary_Entry) of String_Count;

       with package String_Lists
         is new Ada.Containers.Indefinite_Doubly_Linked_Lists (String);
︙ old lines 176-189, new lines 171-198 (14 lines added, marked +)
          Min_Size : in Positive;
          Max_Size : in Positive);

       with function Append_String
         (Dict : in Dictionary;
          Element : in String)
         return Dictionary;

+      with procedure Build_Perfect_Hash
+        (Word_List : in String_Lists.List;
+         Package_Name : in String);
+
+      with function Compress
+        (Dict : in Dictionary;
+         Input : in String)
+        return Ada.Streams.Stream_Element_Array;
+
+      with function Decompress
+        (Dict : in Dictionary;
+         Input : in Ada.Streams.Stream_Element_Array)
+        return String;
+
       with function Dict_Entry
         (Dict : in Dictionary;
          Element : in Dictionary_Entry)
         return String;

       with procedure Evaluate_Dictionary
︙ old lines 210-223, new lines 219-238 (6 lines added, marked +)
          Hash_Package_Name : in String := "")
         is <>;

       with function Remove_Element
         (Dict : in Dictionary;
          Element : in Dictionary_Entry)
         return Dictionary;

+      Score_Encoded, Score_Frequency, Score_Gain : in access function
+        (D : in Dictionary;
+         C : in Dictionary_Counts;
+         E : in Dictionary_Entry)
+        return Score_Value;
+
       with function Simple_Dictionary
         (Counter : in Word_Counter;
          Word_Count : in Natural;
          Method : in Methods)
         return String_Lists.List;
︙ old lines 251-264, new lines 266-285 (6 lines added, marked +)
          Dict : in Dictionary;
          Corpus : in String_Lists.List;
          Compressed_Size : out Ada.Streams.Stream_Element_Count;
          Counts : out Dictionary_Counts);
       -- Dispatch to parallel or non-parallel version of
       -- Evaluate_Dictionary depending on Job_Count.

+      function Image
+        (Dict : in Dictionary;
+         Code : in Dictionary_Entry)
+        return Natools.S_Expressions.Atom;
+      -- S-expression image of Code
+
       procedure Optimization_Round
         (Dict : in out Holders.Holder;
          Score : in out Ada.Streams.Stream_Element_Count;
          Counts : in out Dictionary_Counts;
          Pending_Words : in out String_Lists.List;
          Input_Texts : in String_Lists.List;
          Job_Count : in Natural;
︙ old lines 288-301, new lines 309-329 (7 lines added, marked +)
       procedure Print_Dictionary
         (Filename : in String;
          Dict : in Dictionary;
          Hash_Package_Name : in String := "");
       -- print the given dictionary in the given file

+      procedure Process
+        (Handler : in Callback'Class;
+         Word_List : in String_Lists.List;
+         Data_List : in String_Lists.List;
+         Method : in Methods);
+      -- Perform the requested operations
+
       function To_Dictionary
         (Handler : in Callback'Class;
          Input : in String_Lists.List;
          Method : in Methods)
         return Dictionary;
       -- Convert the input into a dictionary given the option in Handler
︙ old lines 321-334, new lines 349-371 (9 lines added, marked +)
               Actual_Dict, Corpus, Compressed_Size, Counts);
          else
             Evaluate_Dictionary
               (Actual_Dict, Corpus, Compressed_Size, Counts);
          end if;
       end Evaluate_Dictionary;

+      function Image
+        (Dict : in Dictionary;
+         Code : in Dictionary_Entry)
+        return Natools.S_Expressions.Atom is
+      begin
+         return Compress (Dict, Dict_Entry (Dict, Code));
+      end Image;
+
       procedure Optimization_Round
         (Dict : in out Holders.Holder;
          Score : in out Ada.Streams.Stream_Element_Count;
          Counts : in out Dictionary_Counts;
          Pending_Words : in out String_Lists.List;
          Input_Texts : in String_Lists.List;
︙ old lines 531-544, new lines 568-849 (body of Process added, marked +)
                Ada.Text_IO.Create (File, Name => Filename);
                Print_Dictionary (File, Dict, Hash_Package_Name);
                Ada.Text_IO.Close (File);
             end;
          end if;
       end Print_Dictionary;

+      procedure Process
+        (Handler : in Callback'Class;
+         Word_List : in String_Lists.List;
+         Data_List : in String_Lists.List;
+         Method : in Methods)
+      is
+         Dict : Dictionary := To_Dictionary (Handler, Word_List, Method);
+         Sx_Output : Natools.S_Expressions.Printers.Canonical
+           (Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Current_Output));
+         Ada_Dictionary : constant String
+           := Ada.Strings.Unbounded.To_String (Handler.Ada_Dictionary);
+         Hash_Package : constant String
+           := Ada.Strings.Unbounded.To_String (Handler.Hash_Package);
+      begin
+         Use_Dictionary (Dict);
+
+         if Ada_Dictionary'Length > 0 then
+            Print_Dictionary (Ada_Dictionary, Dict, Hash_Package);
+         end if;
+
+         if Hash_Package'Length > 0 then
+            Build_Perfect_Hash (Word_List, Hash_Package);
+         end if;
+
+         if Handler.Sx_Dict_Output then
+            Sx_Output.Open_List;
+            for I in Dictionary_Entry'First .. Last_Code (Dict) loop
+               Sx_Output.Append_String (Dict_Entry (Dict, I));
+            end loop;
+            Sx_Output.Close_List;
+         end if;
+
+         case Handler.Action is
+            when Actions.Nothing => null;
+
+            when Actions.Decode =>
+               if Handler.Sx_Output then
+                  Sx_Output.Open_List;
+                  for S of Data_List loop
+                     Sx_Output.Append_String
+                       (Decompress (Dict, To_SEA (S)));
+                  end loop;
+                  Sx_Output.Close_List;
+               end if;
+
+               if Handler.Stat_Output then
+                  declare
+                     procedure Print_Line (Original, Output : Natural);
+
+                     procedure Print_Line (Original, Output : Natural) is
+                     begin
+                        Ada.Text_IO.Put_Line
+                          (Natural'Image (Original)
+                           & Ada.Characters.Latin_1.HT
+                           & Natural'Image (Output)
+                           & Ada.Characters.Latin_1.HT
+                           & Float'Image (Float (Original) / Float (Output)));
+                     end Print_Line;
+
+                     Original_Total : Natural := 0;
+                     Output_Total : Natural := 0;
+                  begin
+                     for S of Data_List loop
+                        declare
+                           Original_Size : constant Natural := S'Length;
+                           Output_Size : constant Natural
+                             := Decompress (Dict, To_SEA (S))'Length;
+                        begin
+                           Print_Line (Original_Size, Output_Size);
+                           Original_Total := Original_Total + Original_Size;
+                           Output_Total := Output_Total + Output_Size;
+                        end;
+                     end loop;
+
+                     Print_Line (Original_Total, Output_Total);
+                  end;
+               end if;
+
+            when Actions.Encode =>
+               if Handler.Sx_Output then
+                  Sx_Output.Open_List;
+                  for S of Data_List loop
+                     Sx_Output.Append_Atom (Compress (Dict, S));
+                  end loop;
+                  Sx_Output.Close_List;
+               end if;
+
+               if Handler.Stat_Output then
+                  declare
+                     procedure Print_Line
+                       (Original, Output, Base64 : Natural);
+
+                     procedure Print_Line
+                       (Original, Output, Base64 : in Natural) is
+                     begin
+                        Ada.Text_IO.Put_Line
+                          (Natural'Image (Original)
+                           & Ada.Characters.Latin_1.HT
+                           & Natural'Image (Output)
+                           & Ada.Characters.Latin_1.HT
+                           & Natural'Image (Base64)
+                           & Ada.Characters.Latin_1.HT
+                           & Float'Image (Float (Output) / Float (Original))
+                           & Ada.Characters.Latin_1.HT
+                           & Float'Image (Float (Base64) / Float (Original)));
+                     end Print_Line;
+
+                     Original_Total : Natural := 0;
+                     Output_Total : Natural := 0;
+                     Base64_Total : Natural := 0;
+                  begin
+                     for S of Data_List loop
+                        declare
+                           Original_Size : constant Natural := S'Length;
+                           Output_Size : constant Natural
+                             := Compress (Dict, S)'Length;
+                           Base64_Size : constant Natural
+                             := ((Output_Size + 2) / 3) * 4;
+                        begin
+                           Print_Line
+                             (Original_Size, Output_Size, Base64_Size);
+                           Original_Total := Original_Total + Original_Size;
+                           Output_Total := Output_Total + Output_Size;
+                           Base64_Total := Base64_Total + Base64_Size;
+                        end;
+                     end loop;
+
+                     Print_Line (Original_Total, Output_Total, Base64_Total);
+                  end;
+               end if;
+
+            when Actions.Evaluate =>
+               declare
+                  Total_Size : Ada.Streams.Stream_Element_Count;
+                  Counts : Dictionary_Counts;
+               begin
+                  Evaluate_Dictionary
+                    (Handler.Job_Count, Dict, Data_List, Total_Size, Counts);
+
+                  if Handler.Sx_Output then
+                     Sx_Output.Open_List;
+                     Sx_Output.Append_String (Ada.Strings.Fixed.Trim
+                       (Ada.Streams.Stream_Element_Count'Image (Total_Size),
+                        Ada.Strings.Both));
+
+                     for E in Dictionary_Entry'First .. Last_Code (Dict) loop
+                        Sx_Output.Open_List;
+                        Sx_Output.Append_Atom (Image (Dict, E));
+                        Sx_Output.Append_String (Dict_Entry (Dict, E));
+                        Sx_Output.Append_String (Ada.Strings.Fixed.Trim
+                          (String_Count'Image (Counts (E)),
+                           Ada.Strings.Both));
+                        Sx_Output.Close_List;
+                     end loop;
+
+                     Sx_Output.Close_List;
+                  end if;
+
+                  if Handler.Stat_Output then
+                     declare
+                        procedure Print
+                          (Label : in String;
+                           E : in Dictionary_Entry;
+                           Score : in Score_Value);
+
+                        procedure Print_Min_Max
+                          (Label : in String;
+                           Score : not null access function
+                             (D : in Dictionary;
+                              C : in Dictionary_Counts;
+                              E : in Dictionary_Entry)
+                             return Score_Value);
+
+                        procedure Print_Value
+                          (Label : in String;
+                           Score : not null access function
+                             (D : in Dictionary;
+                              C : in Dictionary_Counts;
+                              E : in Dictionary_Entry)
+                             return Score_Value;
+                           Ref : in Score_Value);
+
+                        procedure Print
+                          (Label : in String;
+                           E : in Dictionary_Entry;
+                           Score : in Score_Value) is
+                        begin
+                           if Handler.Sx_Output then
+                              Sx_Output.Open_List;
+                              Sx_Output.Append_Atom (Image (Dict, E));
+                              Sx_Output.Append_String (Dict_Entry (Dict, E));
+                              Sx_Output.Append_String (Ada.Strings.Fixed.Trim
+                                (Score'Img, Ada.Strings.Both));
+                              Sx_Output.Close_List;
+                           else
+                              Ada.Text_IO.Put_Line
+                                (Label
+                                 & Ada.Characters.Latin_1.HT
+                                 & Dictionary_Entry'Image (E)
+                                 & Ada.Characters.Latin_1.HT
+                                 & Natools.String_Escapes.C_Escape_Hex
+                                    (Dict_Entry (Dict, E), True)
+                                 & Ada.Characters.Latin_1.HT
+                                 & Score'Img);
+                           end if;
+                        end Print;
+
+                        procedure Print_Min_Max
+                          (Label : in String;
+                           Score : not null access function
+                             (D : in Dictionary;
+                              C : in Dictionary_Counts;
+                              E : in Dictionary_Entry)
+                             return Score_Value)
+                        is
+                           Min_Score, Max_Score : Score_Value
+                             := Score (Dict, Counts, Dictionary_Entry'First);
+                           S : Score_Value;
+                        begin
+                           for E in Dictionary_Entry'Succ
+                             (Dictionary_Entry'First) .. Last_Code (Dict)
+                           loop
+                              S := Score (Dict, Counts, E);
+                              if S < Min_Score then
+                                 Min_Score := S;
+                              end if;
+                              if S > Max_Score then
+                                 Max_Score := S;
+                              end if;
+                           end loop;
+
+                           Print_Value ("best-" & Label, Score, Max_Score);
+                           Print_Value ("worst-" & Label, Score, Min_Score);
+                        end Print_Min_Max;
+
+                        procedure Print_Value
+                          (Label : in String;
+                           Score : not null access function
+                             (D : in Dictionary;
+                              C : in Dictionary_Counts;
+                              E : in Dictionary_Entry)
+                             return Score_Value;
+                           Ref : in Score_Value) is
+                        begin
+                           if Handler.Sx_Output then
+                              Sx_Output.Open_List;
+                              Sx_Output.Append_String (Label);
+                           end if;
+
+                           for E in Dictionary_Entry'First .. Last_Code (Dict)
+                           loop
+                              if Score (Dict, Counts, E) = Ref then
+                                 Print (Label, E, Ref);
+                              end if;
+                           end loop;
+
+                           if Handler.Sx_Output then
+                              Sx_Output.Close_List;
+                           end if;
+                        end Print_Value;
+                     begin
+                        Print_Min_Max ("encoded", Score_Encoded);
+                        Print_Min_Max ("frequency", Score_Frequency);
+                        Print_Min_Max ("gain", Score_Gain);
+                     end;
+                  end if;
+               end;
+         end case;
+      end Process;
+
       function To_Dictionary
         (Handler : in Callback'Class;
          Input : in String_Lists.List;
          Method : in Methods)
         return Dictionary is
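The Encode statistics in the Process body above estimate the Base64 size of each compressed string as `((Output_Size + 2) / 3) * 4`. The following is a minimal stand-alone sketch of that arithmetic, assuming padded Base64 (4 output characters per started group of 3 bytes); the names `Base64_Size_Demo` and `Base64_Length` are hypothetical and do not appear in tools/smaz.adb.

```ada
with Ada.Text_IO;

procedure Base64_Size_Demo is

   --  Hypothetical helper: number of characters of padded Base64 text
   --  needed for Byte_Count bytes.  Integer division rounds the number
   --  of 3-byte groups up, and each group yields 4 characters.
   function Base64_Length (Byte_Count : Natural) return Natural is
     (((Byte_Count + 2) / 3) * 4);

begin
   --  Spot checks (only active when assertions are enabled)
   pragma Assert (Base64_Length (0) = 0);
   pragma Assert (Base64_Length (1) = 4);
   pragma Assert (Base64_Length (3) = 4);
   pragma Assert (Base64_Length (4) = 8);

   --  Print the byte count and the matching Base64 length
   for Byte_Count in 0 .. 7 loop
      Ada.Text_IO.Put_Line
        (Natural'Image (Byte_Count)
         & " bytes ->"
         & Natural'Image (Base64_Length (Byte_Count))
         & " characters");
   end loop;
end Base64_Size_Demo;
```

Inputs of 1 to 3 bytes all map to 4 characters and 4 to 6 bytes map to 8, which is why the Base64 ratio printed by Print_Line can exceed 1 for very short compressed strings.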
︙ old lines 602-628, new lines 907-940 (7 lines added, marked +)
    package Dict_256 is new Dictionary_Subprograms
      (Dictionary => Natools.Smaz_256.Dictionary,
       Dictionary_Entry => Ada.Streams.Stream_Element,
       Methods => Natools.Smaz_Tools.Methods.Enum,
+      Score_Value => Natools.Smaz_Tools.Score_Value,
       String_Count => Natools.Smaz_Tools.String_Count,
       Word_Counter => Natools.Smaz_Tools.Word_Counter,
       Dictionary_Counts => Tools_256.Dictionary_Counts,
       String_Lists => Natools.Smaz_Tools.String_Lists,
       Add_Substrings => Natools.Smaz_Tools.Add_Substrings,
       Add_Words => Natools.Smaz_Tools.Add_Words,
       Append_String => Tools_256.Append_String,
+      Build_Perfect_Hash => Natools.Smaz_Tools.GNAT.Build_Perfect_Hash,
+      Compress => Natools.Smaz_256.Compress,
+      Decompress => Natools.Smaz_256.Decompress,
       Dict_Entry => Natools.Smaz_256.Dict_Entry,
       Evaluate_Dictionary => Tools_256.Evaluate_Dictionary,
       Evaluate_Dictionary_Partial
         => Tools_256.Evaluate_Dictionary_Partial,
       Filter_By_Count => Natools.Smaz_Tools.Filter_By_Count,
       Last_Code => Last_Code,
       Remove_Element => Tools_256.Remove_Element,
+      Score_Encoded => Tools_256.Score_Encoded'Access,
+      Score_Frequency => Tools_256.Score_Frequency'Access,
+      Score_Gain => Tools_256.Score_Gain'Access,
       Simple_Dictionary => Natools.Smaz_Tools.Simple_Dictionary,
       Simple_Dictionary_And_Pending
         => Natools.Smaz_Tools.Simple_Dictionary_And_Pending,
       To_Dictionary => Tools_256.To_Dictionary,
       Worst_Element => Tools_256.Worst_Index);
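The instantiation above shows the pattern this check-in relies on: Process is written once against formal types and formal subprograms, and Dict_256 supplies the concrete Smaz_256 operations. Below is a much reduced sketch of the same idea, not the actual tool code. It keeps only three of the formals (Dictionary, Dictionary_Entry, and the Dict_Entry and Last_Code functions), and every `Toy_` name as well as `Generic_Process_Demo` is a hypothetical stand-in.

```ada
with Ada.Text_IO;

procedure Generic_Process_Demo is

   --  Cut-down version of the generic: only the formals needed to walk
   --  a dictionary from its first code to its last one.
   generic
      type Dictionary (<>) is private;
      type Dictionary_Entry is (<>);

      with function Dict_Entry
        (Dict : in Dictionary;
         Element : in Dictionary_Entry)
        return String;

      with function Last_Code (Dict : in Dictionary) return Dictionary_Entry;
   package Dictionary_Subprograms is
      procedure Process (Dict : in Dictionary);
      --  Print every dictionary entry, one per line
   end Dictionary_Subprograms;

   package body Dictionary_Subprograms is
      procedure Process (Dict : in Dictionary) is
      begin
         --  Same loop shape as the dictionary dump in the real Process
         for I in Dictionary_Entry'First .. Last_Code (Dict) loop
            Ada.Text_IO.Put_Line (Dict_Entry (Dict, I));
         end loop;
      end Process;
   end Dictionary_Subprograms;

   --  Hypothetical toy dictionary standing in for Natools.Smaz_256.Dictionary
   type Toy_Index is range 0 .. 2;
   type Toy_Dictionary is array (Toy_Index) of String (1 .. 3);

   function Toy_Entry
     (Dict : in Toy_Dictionary;
      Element : in Toy_Index)
     return String
     is (Dict (Element));

   function Toy_Last (Dict : in Toy_Dictionary) return Toy_Index
     is (Dict'Last);

   --  Instantiation binds each formal to a concrete type or subprogram
   package Toy is new Dictionary_Subprograms
     (Dictionary => Toy_Dictionary,
      Dictionary_Entry => Toy_Index,
      Dict_Entry => Toy_Entry,
      Last_Code => Toy_Last);

   Words : constant Toy_Dictionary := ("the", "and", "ing");
begin
   Toy.Process (Words);
end Generic_Process_Demo;
```

A second instantiation with different actuals would reuse the same Process body unchanged, which appears to be the point of moving Process into the generic.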
︙ old lines 924-930, new lines 1236-1249 (old lines removed after Print_Help; their text is not shown in this view)
             New_Line (Output);
             Put_Line (Output, Indent & Indent
               & "Disable variable-length verbatim in built dictionary");
          end case;
       end loop;
    end Print_Help;

    procedure Use_Dictionary (Dict : in out Natools.Smaz_256.Dictionary) is
    begin
       Natools.Smaz_Tools.Set_Dictionary_For_Trie_Search
         (Tools_256.To_String_List (Dict));
       Dict.Hash := Natools.Smaz_Tools.Trie_Search'Access;
︙ old lines 1266-1272, new lines 1302-1311 (1 line changed, new text marked +; its old text is not shown in this view)
       if Handler.Action /= Actions.Nothing then
          Parser.Next;
          Natools.Smaz_Tools.Read_List (Input_Data, Parser);
       end if;
    end Read_Input_List;

+   Dict_256.Process (Handler, Input_List, Input_Data, Handler.Score_Method);
 end Smaz;