in reply to Top five words by occurrence
split is just dicing up your input on whitespace, so any punctuation stays attached to the words. A '$wd =~ s/\W//g;' before your '$count{$wd}++;' will strip everything other than letters, digits, and underscores (probably a bad idea if you need to deal with email addresses or URLs). You may also want '$count{lc($wd)}++;' to ignore capitalization.
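Here is a rough sketch of how those two tweaks might fit into a counting loop. I haven't seen your whole script, so the sub name count_words, the sample lines, and the "count word" output format are all my own assumptions, not your code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): takes a list of
# lines and returns a hash of word => number of occurrences.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # split ' ' breaks on runs of whitespace and skips leading whitespace
        for my $wd (split ' ', $line) {
            $wd =~ s/\W//g;           # drop all but letters, digits, underscore
            next unless length $wd;   # the token was pure punctuation
            $count{ lc $wd }++;       # fold case so "The" and "the" match
        }
    }
    return %count;
}

# Made-up sample input; a real script would pass in lines read from a file.
my %count = count_words("The cat. the CAT!", "A dog -- and the cat?");

# Print the count first on each line so the output can be sorted numerically.
printf "%d %s\n", $count{$_}, $_ for sort keys %count;
```

The 'next unless length $wd;' line is there because a token like '--' becomes an empty string after the substitution, and you probably don't want to count that as a word.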
Update:
As for getting just the 5 most common words, you can pipe the output of your script through the following (assuming each output line starts with the count):
|sort -n|tail -n 5