Overview
Comment: | smaz-tools: new primitive to add all the words of an input text |
---|---|
SHA1: | efffae966f3274d7d731e90e8ad75189 |
User & Date: | nat on 2016-10-02 16:10:17 |
Context
| Date | Time | Check-in | Comment | User | Tags |
|---|---|---|---|---|---|
| 2016-10-03 | 19:27 | 15ea367b55 | tools/smaz: also add words from input phrases, on top of substrings | nat | trunk |
| 2016-10-02 | 16:10 | efffae966f | smaz-tools: new primitive to add all the words of an input text | nat | trunk |
| 2016-10-01 | 15:25 | 1516f5a576 | tools/smaz: add a command-line option to output current dictionary | nat | trunk |
Changes
Modified src/natools-smaz-tools.adb from [99ea3f33bb] to [a336f10fe0].
```diff
@@ -391,14 +391,55 @@
       if Word_Maps.Has_Element (Cursor) then
          Word_Maps.Update_Element (Counter.Map, Cursor, Update'Access);
       else
          Word_Maps.Insert (Counter.Map, Word, Count);
       end if;
    end Add_Word;
 
 
+   procedure Add_Words
+     (Counter : in out Word_Counter;
+      Phrase : in String;
+      Min_Size : in Positive;
+      Max_Size : in Positive)
+   is
+      subtype Word_Part is Character
+        with Static_Predicate => Word_Part in '0' .. '9' | 'A' .. 'Z' | 'a' .. 'z'
+                                | Character'Val (128) .. Character'Val (255);
+      I, First, Next : Positive;
+   begin
+      if Max_Size < Min_Size then
+         return;
+      end if;
+
+      I := Phrase'First;
+
+      Main_Loop :
+      while I in Phrase'Range loop
+         Skip_Non_Word :
+         while I in Phrase'Range and then Phrase (I) not in Word_Part loop
+            I := I + 1;
+         end loop Skip_Non_Word;
+
+         exit Main_Loop when I not in Phrase'Range;
+         First := I;
+
+         Skip_Word :
+         while I in Phrase'Range and then Phrase (I) in Word_Part loop
+            I := I + 1;
+         end loop Skip_Word;
+
+         Next := I;
+
+         if Next - First in Min_Size .. Max_Size then
+            Add_Word (Counter, Phrase (First .. Next - 1));
+         end if;
+      end loop Main_Loop;
+   end Add_Words;
+
+
    function Simple_Dictionary
      (Counter : in Word_Counter;
       Word_Count : in Natural)
      return String_Lists.List
    is
       use type Ada.Containers.Count_Type;
```
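To see what `Add_Words` extracts from a phrase, its scanning loop can be run outside the library. The program below is a hedged sketch, not the check-in's code: `Add_Words_Sketch`, `Words` and `Joined` are hypothetical names, and matching words are appended to a vector instead of being passed to `Add_Word`, since the internals of `Word_Counter` are not part of this diff.

```ada
--  Sketch of the scanning loop of Add_Words, outside Natools.
--  Hypothetical: words are collected in a vector instead of being
--  passed to Add_Word on a Word_Counter.
with Ada.Containers.Indefinite_Vectors;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Add_Words_Sketch is

   package String_Vectors is new Ada.Containers.Indefinite_Vectors
     (Index_Type   => Positive,
      Element_Type => String);

   --  Same character class as in the check-in: [0-9A-Za-z\x80-\xFF]
   subtype Word_Part is Character
     with Static_Predicate => Word_Part in '0' .. '9' | 'A' .. 'Z' | 'a' .. 'z'
                             | Character'Val (128) .. Character'Val (255);

   function Words
     (Phrase   : String;
      Min_Size : Positive;
      Max_Size : Positive) return String_Vectors.Vector
   is
      Result   : String_Vectors.Vector;
      I, First : Positive;
   begin
      if Max_Size < Min_Size then
         return Result;
      end if;

      I := Phrase'First;
      while I in Phrase'Range loop
         --  Skip blanks and punctuation
         while I in Phrase'Range and then Phrase (I) not in Word_Part loop
            I := I + 1;
         end loop;
         exit when I not in Phrase'Range;

         --  Consume one maximal run of word characters
         First := I;
         while I in Phrase'Range and then Phrase (I) in Word_Part loop
            I := I + 1;
         end loop;

         --  Keep the run only if its length is within bounds;
         --  longer runs are dropped whole, not truncated
         if I - First in Min_Size .. Max_Size then
            Result.Append (Phrase (First .. I - 1));
         end if;
      end loop;
      return Result;
   end Words;

   --  Join the words with single spaces, for display
   function Joined (V : String_Vectors.Vector) return String is
      Result : Unbounded_String;
   begin
      for W of V loop
         if Length (Result) > 0 then
            Append (Result, ' ');
         end if;
         Append (Result, W);
      end loop;
      return To_String (Result);
   end Joined;

   Sample : constant String :=
     "Hello, world! Smaz dictionaries: 42 entries.";
begin
   Ada.Text_IO.Put_Line (Joined (Words (Sample, 2, 5)));
   Ada.Text_IO.Put_Line (Joined (Words (Sample, 1, 20)));
end Add_Words_Sketch;
```

With this sample the first line printed is `Hello world Smaz 42` and the second is `Hello world Smaz dictionaries 42 entries`: "dictionaries" (12 characters) and "entries" (7 characters) exceed a Max_Size of 5, so they are skipped entirely rather than cut down to a shorter word.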
Modified src/natools-smaz-tools.ads from [1f1e29a7d3] to [a92069b766].
```diff
@@ -85,14 +85,23 @@
      (Counter : in out Word_Counter;
       Phrase : in String;
       Min_Size : in Positive;
       Max_Size : in Positive);
       -- Include all the substrings of Phrase whose lengths are
       -- between Min_Size and Max_Size.
 
+   procedure Add_Words
+     (Counter : in out Word_Counter;
+      Phrase : in String;
+      Min_Size : in Positive;
+      Max_Size : in Positive);
+      -- Add the "words" from Phrase into Counter, with a word being currently
+      -- defined as anything between ASCII blanks or punctuation,
+      -- or in other words [0-9A-Za-z\x80-\xFF]+
+
    function Simple_Dictionary
      (Counter : in Word_Counter;
       Word_Count : in Natural)
      return String_Lists.List;
       -- Return the Word_Count words in Counter that have the highest score,
       -- the score being count * length.
 
```
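The spec comment defines a word as a maximal run of `[0-9A-Za-z\x80-\xFF]`. Two consequences follow directly: ASCII punctuation such as apostrophes, hyphens and underscores splits words, while every byte from 16#80# to 16#FF# counts as a word character, so any UTF-8 encoded non-ASCII character (accented letters, but also non-ASCII punctuation) stays inside its word. The standalone program below is a sketch under that reading; `Word_Part_Demo` and `Count_Words` are hypothetical names, and only the `Word_Part` subtype is copied from the check-in.

```ada
--  Sketch: counting "words" under the definition [0-9A-Za-z\x80-\xFF]+
--  Word_Part_Demo and Count_Words are hypothetical names.
with Ada.Text_IO;

procedure Word_Part_Demo is

   --  Same character class as Word_Part in the check-in
   subtype Word_Part is Character
     with Static_Predicate => Word_Part in '0' .. '9' | 'A' .. 'Z' | 'a' .. 'z'
                             | Character'Val (128) .. Character'Val (255);

   --  Number of maximal runs of Word_Part characters in S
   function Count_Words (S : String) return Natural is
      Count   : Natural := 0;
      In_Word : Boolean := False;
   begin
      for C of S loop
         if C in Word_Part then
            if not In_Word then
               Count := Count + 1;
            end if;
            In_Word := True;
         else
            In_Word := False;
         end if;
      end loop;
      return Count;
   end Count_Words;

   --  "café" in UTF-8: the accented letter is the two bytes 16#C3# 16#A9#
   Cafe_UTF8 : constant String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);
begin
   --  The apostrophe splits "don't" into "don" and "t": 2 words
   Ada.Text_IO.Put_Line (Natural'Image (Count_Words ("don't")));
   --  Hyphen and underscore are separators too: well, known, snake, case
   Ada.Text_IO.Put_Line (Natural'Image (Count_Words ("well-known snake_case")));
   --  Bytes 16#80# .. 16#FF# are word characters, so this is 1 word
   Ada.Text_IO.Put_Line (Natural'Image (Count_Words (Cafe_UTF8)));
end Word_Part_Demo;
```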