Overview
Comment: smaz-tools: import scoring code from tools/smaz
SHA1: 5822fc18e00a158293fcd28423d77aee
User & Date: nat on 2016-11-07 21:11:10
Context
2016-11-08
20:44 tools/smaz: use scoring from Natools.Smaz.Tools (check-in: 99442da1d7, user: nat, tags: trunk)
2016-11-07
21:11 smaz-tools: import scoring code from tools/smaz (check-in: 5822fc18e0, user: nat, tags: trunk)
2016-11-06
20:42 tools/smaz: add options to select variable-length verbatim codes (check-in: 2365190245, user: nat, tags: trunk)
Changes
Modified src/natools-smaz-tools.adb from [f5a81251e0] to [40f74c4b4e].
︙
Added lines are marked with "+"; unmarked lines are unchanged context.

        begin
           return Scored_Word'
             (Size => Word'Length,
              Word => Word,
              Score => Score_Value (Word_Maps.Element (Cursor)) * Word'Length);
        end To_Scored_Word;

    +
    +
    +   function Worst_Index
    +     (Dict : in Dictionary;
    +      Counts : in Dictionary_Counts;
    +      Method : in Methods.Enum)
    +     return Ada.Streams.Stream_Element
    +   is
    +      Result : Ada.Streams.Stream_Element := 0;
    +      Worst_Score : Score_Value := Score_Encoded (Dict, Counts, 0);
    +      S : Score_Value;
    +   begin
    +      for I in 1 .. Dict.Dict_Last loop
    +         S := Score (Dict, Counts, I, Method);
    +
    +         if S < Worst_Score then
    +            Result := I;
    +            Worst_Score := S;
    +         end if;
    +      end loop;
    +
    +      return Result;
    +   end Worst_Index;
    +
     end Natools.Smaz.Tools;
Modified src/natools-smaz-tools.ads from [be69a43d0e] to [716677a420].
︙
Added lines are marked with "+"; unmarked lines are unchanged context.

        function Trie_Search (Value : String) return Natural;
           --  Function and data source for trie-based search that can be
           --  used with Dictionary.Hash.

        type String_Count is range 0 .. 2 ** 31 - 1;
           --  Type for a number of substring occurrences

    +   package Methods is
    +      type Enum is (Encoded, Frequency, Gain);
    +   end Methods;
    +      --  Evaluation methods to select words to remove or include
    +
        type Word_Counter is private;
           --  Accumulate frequency/occurrence counts for a set of strings

        procedure Add_Word
          (Counter : in out Word_Counter;
           Word : in String;
           Count : in String_Count := 1);
︙
Added lines are marked with "+"; unmarked lines are unchanged context. The diff also marks two lines as removed in this hunk; their text does not appear in this view.

           Corpus_Entry : in String;
           Compressed_Size : in out Ada.Streams.Stream_Element_Count;
           Counts : in out Dictionary_Counts);
           --  Compress all strings of Corpus, returning the total number of
           --  compressed bytes and the number of uses for each dictionary
           --  element.

    +   function Worst_Index
    +     (Dict : in Dictionary;
    +      Counts : in Dictionary_Counts;
    +      Method : in Methods.Enum)
    +     return Ada.Streams.Stream_Element;
    +      --  Return the element with worst score
    +
    +   type Score_Value is range 0 .. 2 ** 31 - 1;
    +
    +   function Length
    +     (Dict : in Dictionary;
    +      E : in Ada.Streams.Stream_Element)
    +     return Score_Value
    +     is (Natools.Smaz.Dict_Entry (Dict, E)'Length);
    +      --  Length of a dictionary entry
    +
    +   function Score_Encoded
    +     (Dict : in Dictionary;
    +      Counts : in Natools.Smaz.Tools.Dictionary_Counts;
    +      E : Ada.Streams.Stream_Element)
    +     return Score_Value
    +     is (Score_Value (Counts (E)) * Length (Dict, E));
    +      --  Score value using the amount of encoded data using E
    +
    +   function Score_Frequency
    +     (Dict : in Dictionary;
    +      Counts : in Natools.Smaz.Tools.Dictionary_Counts;
    +      E : Ada.Streams.Stream_Element)
    +     return Score_Value
    +     is (Score_Value (Counts (E)));
    +      --  Score value using the number of times E was used
    +
    +   function Score_Gain
    +     (Dict : in Dictionary;
    +      Counts : in Natools.Smaz.Tools.Dictionary_Counts;
    +      E : Ada.Streams.Stream_Element)
    +     return Score_Value
    +     is (Score_Value (Counts (E)) * (Length (Dict, E) - 1));
    +      --  Score value using the number of bytes saved using E
    +
    +   function Score
    +     (Dict : in Dictionary;
    +      Counts : in Natools.Smaz.Tools.Dictionary_Counts;
    +      E : in Ada.Streams.Stream_Element;
    +      Method : in Methods.Enum)
    +     return Score_Value
    +     is (case Method is
    +         when Methods.Encoded => Score_Encoded (Dict, Counts, E),
    +         when Methods.Frequency => Score_Frequency (Dict, Counts, E),
    +         when Methods.Gain => Score_Gain (Dict, Counts, E));
    +      --  Score value with dynamically chosen method
    +
     private

        package Word_Maps is new Ada.Containers.Indefinite_Ordered_Maps
          (String, String_Count);

        type Word_Counter is record
           Map : Word_Maps.Map;
        end record;

        type Scored_Word (Size : Natural) is record
           Word : String (1 .. Size);
           Score : Score_Value;
        end record;

        function "<" (Left, Right : Scored_Word) return Boolean
          is (Left.Score > Right.Score
︙
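The three scoring methods and the worst-entry scan introduced by this check-in can be tried in isolation with the small Ada 2012 sketch below. It is not the Natools code: `Entry_Stats`, `Stats_Array`, `Method_Enum` and the sample figures are hypothetical stand-ins for `Dictionary`, `Dictionary_Counts` and `Methods.Enum`, chosen so the program compiles on its own. One deliberate difference from the listing above: the sketch seeds the running minimum with the same method that the loop uses, whereas `Worst_Index` as committed seeds it with `Score_Encoded`. That is a choice made for this sketch so that all comparisons share one scale.

```ada
with Ada.Text_IO;

procedure Worst_Index_Demo is

   type Method_Enum is (Encoded, Frequency, Gain);
   type Score_Value is range 0 .. 2 ** 31 - 1;

   --  Hypothetical stand-in for one dictionary entry: its length in bytes
   --  (assumed to be at least 1) and how often it was used on a corpus.
   type Entry_Stats is record
      Length : Score_Value;
      Count  : Score_Value;
   end record;

   type Stats_Array is array (Natural range <>) of Entry_Stats;

   --  Same three formulas as Score_Encoded, Score_Frequency and Score_Gain
   function Score (E : Entry_Stats; Method : Method_Enum) return Score_Value
     is (case Method is
            when Encoded   => E.Count * E.Length,
            when Frequency => E.Count,
            when Gain      => E.Count * (E.Length - 1));

   --  Linear scan for the lowest score; Stats must not be empty.
   --  Ties keep the earliest index because the comparison is strict.
   function Worst_Index
     (Stats : Stats_Array; Method : Method_Enum) return Natural
   is
      Result      : Natural := Stats'First;
      Worst_Score : Score_Value := Score (Stats (Stats'First), Method);
      S           : Score_Value;
   begin
      for I in Stats'First + 1 .. Stats'Last loop
         S := Score (Stats (I), Method);

         if S < Worst_Score then
            Result := I;
            Worst_Score := S;
         end if;
      end loop;

      return Result;
   end Worst_Index;

   --  Made-up sample data, for illustration only
   Sample : constant Stats_Array (0 .. 2) :=
     ((Length => 1, Count => 10),
      (Length => 4, Count => 3),
      (Length => 2, Count => 4));

begin
   for M in Method_Enum loop
      Ada.Text_IO.Put_Line
        (Method_Enum'Image (M) & " ->"
         & Natural'Image (Worst_Index (Sample, M)));
   end loop;
end Worst_Index_Demo;
```

With this sample the scores are 10, 12, 8 for `Encoded`, 10, 3, 4 for `Frequency`, and 0, 9, 4 for `Gain`, so each method singles out a different entry (index 2, 1 and 0 respectively). A one-byte entry always has a gain of zero, which is why `Gain` ranks it worst even though it is the most frequently used.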