in reply to count trigrams of a whole file
A couple of improvements you can make:
- On each line, include the last two elements from the previous line in your array (if they were defined). That way you handle the overlapping cases without needing to read the whole file into memory.
- A hash is a perfect tool for keeping track of the counts. Then you can do away with your loop to search the array.
- In general, it is bad style to use the 3-argument for loop in Perl. There is almost always a better option: foreach (@array), for my $i (0..$#array), etc.
#!usr/bin/perl use strict; use warnings; my %trigrams; my @words; while(<DATA>) { #Include the previous two words to the beginning of this array. @words = ( $words[-2] // (), $words[-1] // (), split(/\s/, $_) ); $trigrams{"@words[$_..$_+2]"}++ for (0..$#words-2); } print "trigram frequencies in your text:\n"; #Sort the trigrams in descending order of frequency. for (sort {$trigrams{$b} <=> $trigrams{$a} } keys %trigrams) { print "$_: $trigrams{$_}\n"; } __DATA__ I went there! Me She also went there. Did you know that I went there!
In Section
Seekers of Perl Wisdom