http://www.perlmonks.org?node_id=1009872


in reply to Re^3: count trigrams of a whole file
in thread count trigrams of a whole file

For a text file containing:
hai! how are you?

will you come to Canada this weekend?

hai! Hello! I am fine.

No I am not coming to Canada this weekend.

I will come to Canada next week:

I will meet you next month at Canada

the output is:

hai!howare 6

howareyou? 6

areyou? 1

areyou?will 5

you?willyou 5

willyoucome 5

youcometo 5

cometoCanada 7

toCanadathis 8

Canadathisweekend? 5

thisweekend? 1

thisweekend?hai! 4

weekend?hai!Hello! 4

hai!Hello!I 4

Hello!Iam 4

Iamfine. 4

amfine. 1

amfine.No 3

fine.NoI 3

NoIam 3

Iamnot 3

amnotcoming 3

notcomingto 3

comingtoCanada 3

Canadathisweekend. 3

thisweekend. 1

thisweekend.I 2

weekend.Iwill 2

Iwillcome 2

willcometo 2

toCanadanext 2

Canadanextweek: 2

nextweek: 1

nextweek:I 1

week:Iwill 1

Iwillmeet 1

willmeetyou 1

meetyounext 1

younextmonth 1

nextmonthat 1

monthatCanada 1

atCanada 1

Re^5: count trigrams of a whole file
by frozenwithjoy (Priest) on Dec 21, 2012 at 20:26 UTC
    That's strange. I'm getting the expected output for that input:
    trigram frequencies in your text:
    hai!howare 1
    howareyou? 1
    areyou?will 1
    you?willyou 1
    willyoucome 1
    youcometo 1
    cometoCanada 2
    toCanadathis 2
    Canadathisweekend? 1
    thisweekend?hai! 1
    weekend?hai!Hello! 1
    hai!Hello!I 1
    Hello!Iam 1
    Iamfine. 1
    amfine.No 1
    fine.NoI 1
    NoIam 1
    Iamnot 1
    amnotcoming 1
    notcomingto 1
    comingtoCanada 1
    Canadathisweekend. 1
    thisweekend.I 1
    weekend.Iwill 1
    Iwillcome 1
    willcometo 1
    toCanadanext 1
    Canadanextweek: 1
    nextweek:I 1
    week:Iwill 1
    Iwillmeet 1
    willmeetyou 1
    meetyounext 1
    younextmonth 1
    nextmonthat 1
    monthatCanada 1
    This is from running your code with the minor change I suggested:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use autodie;
    use feature 'say';

    my @trigrams;
    my @trigramfrequency;
    my @words;

    # Collect every whitespace-separated word from the DATA section.
    while (<DATA>) {
        push @words, split /\s/;
    }

    # Build a trigram from each run of three consecutive words.
    for ( my $i = 0 ; $i < $#words - 1 ; $i++ ) {
        my $trigram = $words[$i] . $words[ $i + 1 ] . $words[ $i + 2 ];

        # Linear search for a trigram that has already been seen.
        my $found = -1;
        if (@trigrams) {
          SEARCHtrigramINDEX:
            for ( my $index = 0 ; $index <= $#trigrams ; $index++ ) {
                if ( $trigrams[$index] eq $trigram ) {
                    $found = $index;
                    last SEARCHtrigramINDEX;
                }
            }
        }

        if ( $found > -1 ) {
            $trigramfrequency[$found]++;
        }
        else {
            push @trigrams, $trigram;
            $trigramfrequency[$#trigrams]++;
        }
    }

    print "trigram frequencies in your text:\n";

    # Stop at the last valid index ($#trigrams) so no undefined element is printed.
    for ( my $index = 0 ; $index <= $#trigrams ; $index++ ) {
        print "$trigrams[$index] $trigramfrequency[$index]\n";
    }

    __DATA__
    hai! how are you?
    will you come to Canada this weekend?
    hai! Hello! I am fine.
    No I am not coming to Canada this weekend.
    I will come to Canada next week:
    I will meet you next month at Canada
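For comparison, the linear search through @trigrams can probably be replaced by a hash lookup. What follows is only a sketch, not code from this thread: the sub name count_trigrams and the @order array are assumptions introduced here for illustration. Words are joined without a separator, as in the code above, and first-seen order is kept so the output lines up with the listing.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count word trigrams with a hash.
# Returns (\%count, \@order); @order records each trigram in first-seen order.
sub count_trigrams {
    my @words = @_;
    my ( %count, @order );

    # The range is empty when there are fewer than three words.
    for my $i ( 0 .. $#words - 2 ) {

        # Join three consecutive words with no separator.
        my $trigram = join '', @words[ $i .. $i + 2 ];

        # Post-increment returns the old count, so 0 means "not seen before".
        push @order, $trigram unless $count{$trigram}++;
    }
    return ( \%count, \@order );
}

my $text = <<'END';
hai! how are you?
will you come to Canada this weekend?
hai! Hello! I am fine.
No I am not coming to Canada this weekend.
I will come to Canada next week:
I will meet you next month at Canada
END

# split ' ' splits on runs of whitespace and ignores leading whitespace.
my @words = split ' ', $text;
my ( $count, $order ) = count_trigrams(@words);

print "trigram frequencies in your text:\n";
print "$_ $count->{$_}\n" for @$order;
```

On the sample text this should report cometoCanada and toCanadathis with a count of 2 and every other trigram with a count of 1, the same as the listing above. The hash turns each lookup into a single step instead of a scan over all trigrams seen so far, which matters once the input is a whole file.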

      Could you explain the code?