http://www.perlmonks.org?node_id=1009866


in reply to Re^2: count trigrams of a whole file
in thread count trigrams of a whole file

Can you show some examples of how they differ?

Re^4: count trigrams of a whole file
by lakssreedhar (Acolyte) on Dec 21, 2012 at 06:11 UTC

    For a text file containing:
    hai! how are you?

    will you come to Canada this weekend?

    hai! Hello! I am fine.

    No I am not coming to Canada this weekend.

    I will come to Canada next week:

    I will meet you next month at Canada

    The output is

    hai!howare 6

    howareyou? 6

    areyou? 1

    areyou?will 5

    you?willyou 5

    willyoucome 5

    youcometo 5

    cometoCanada 7

    toCanadathis 8

    Canadathisweekend? 5

    thisweekend? 1

    thisweekend?hai! 4

    weekend?hai!Hello! 4

    hai!Hello!I 4

    Hello!Iam 4

    Iamfine. 4

    amfine. 1

    amfine.No 3

    fine.NoI 3

    NoIam 3

    Iamnot 3

    amnotcoming 3

    notcomingto 3

    comingtoCanada 3

    Canadathisweekend. 3

    thisweekend. 1

    thisweekend.I 2

    weekend.Iwill 2

    Iwillcome 2

    willcometo 2

    toCanadanext 2

    Canadanextweek: 2

    nextweek: 1

    nextweek:I 1

    week:Iwill 1

    Iwillmeet 1

    willmeetyou 1

    meetyounext 1

    younextmonth 1

    nextmonthat 1

    monthatCanada 1

    atCanada 1

      That's strange. I'm getting the expected output for that input:
      trigram frequencies in your text:
      hai!howare 1
      howareyou? 1
      areyou?will 1
      you?willyou 1
      willyoucome 1
      youcometo 1
      cometoCanada 2
      toCanadathis 2
      Canadathisweekend? 1
      thisweekend?hai! 1
      weekend?hai!Hello! 1
      hai!Hello!I 1
      Hello!Iam 1
      Iamfine. 1
      amfine.No 1
      fine.NoI 1
      NoIam 1
      Iamnot 1
      amnotcoming 1
      notcomingto 1
      comingtoCanada 1
      Canadathisweekend. 1
      thisweekend.I 1
      weekend.Iwill 1
      Iwillcome 1
      willcometo 1
      toCanadanext 1
      Canadanextweek: 1
      nextweek:I 1
      week:Iwill 1
      Iwillmeet 1
      willmeetyou 1
      meetyounext 1
      younextmonth 1
      nextmonthat 1
      monthatCanada 1
      This is from running your code with the minor change I suggested:
      #!/usr/bin/env perl
      use strict;
      use warnings;
      use autodie;
      use feature 'say';

      my @trigrams;
      my @trigramfrequency;
      my @words;

      # Collect the words of the whole file first, so trigrams can span lines.
      while (<DATA>) {
          push @words, split /\s/;
      }

      # Stop two words before the end so every trigram has three words.
      for ( my $i = 0 ; $i < $#words - 1 ; $i++ ) {
          my $trigram = $words[$i] . $words[ $i + 1 ] . $words[ $i + 2 ];
          my $found   = -1;
          if (@trigrams) {
            SEARCHtrigramINDEX:
              for ( my $index = 0 ; $index <= $#trigrams ; $index++ ) {
                  if ( $trigrams[$index] eq $trigram ) {
                      $found = $index;
                      last SEARCHtrigramINDEX;
                  }
              }
          }
          if ( $found > -1 ) {
              $trigramfrequency[$found]++;
          }
          else {
              push @trigrams, $trigram;
              $trigramfrequency[$#trigrams]++;
          }
      }

      print "trigram frequencies in your text:\n";

      # "<" rather than "<=": @trigrams in numeric context is the element
      # count, which is one past the last valid index.
      for ( my $index = 0 ; $index < @trigrams ; $index++ ) {
          print "$trigrams[$index] $trigramfrequency[$index]\n";
      }
      __DATA__
      hai! how are you?
      will you come to Canada this weekend?
      hai! Hello! I am fine.
      No I am not coming to Canada this weekend.
      I will come to Canada next week:
      I will meet you next month at Canada
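
      For comparison, here is a minimal sketch of the same count done with a hash instead of two parallel arrays and a linear search. It is not the code from the thread: the sub name count_trigrams and the heredoc input are assumptions made for illustration. It keeps the thread's key format (three words joined with no separator) and first-seen output order.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count word trigrams with a hash.
# Returns an array ref of trigrams in first-seen order and a hash ref of counts.
sub count_trigrams {
    my ($text) = @_;

    # split ' ' skips leading whitespace and splits on runs of whitespace,
    # so blank lines and repeated spaces never produce empty "words".
    my @words = split ' ', $text;

    my ( @order, %count );
    for my $i ( 0 .. $#words - 2 ) {
        # Same key format as the thread's code: three words, no separator.
        my $trigram = join '', @words[ $i .. $i + 2 ];

        # Post-increment is false only the first time a trigram is seen.
        push @order, $trigram unless $count{$trigram}++;
    }
    return ( \@order, \%count );
}

my $text = <<'END_TEXT';
hai! how are you?
will you come to Canada this weekend?
hai! Hello! I am fine.
No I am not coming to Canada this weekend.
I will come to Canada next week:
I will meet you next month at Canada
END_TEXT

my ( $order, $count ) = count_trigrams($text);

print "trigram frequencies in your text:\n";
print "$_ $count->{$_}\n" for @$order;
```

      The hash lookup replaces the inner search loop, so each trigram is handled in roughly constant time. Joining the three words with a space (join ' ' instead of join '') would also keep different word sequences from collapsing into the same key.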

        Could you explain the code?