http://www.perlmonks.org?node_id=475813


in reply to Top five words by occurrence

split is just dicing up your input by whitespace, so any punctuation stays attached to the words. A '$wd =~ s/\W//g;' before your '$count{$wd}++;' will wipe out anything other than letters, digits, and underscores (probably a bad idea if you need to deal with email addresses or URLs). You may also want '$count{lc($wd)}++;' instead, to ignore capitalization.
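
Put together, those two tweaks might look something like the sketch below. I haven't seen the rest of your script, so the sub name count_words and the sample lines are made up for illustration; only the s/\W//g and lc parts come from the advice above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): count normalized words.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        for my $wd (split ' ', $line) {
            $wd =~ s/\W//g;      # drop everything except letters, digits, and underscore
            next if $wd eq '';   # skip tokens that were pure punctuation
            $count{lc $wd}++;    # fold case so "The" and "the" share one counter
        }
    }
    return %count;
}

# Made-up sample input, just to show the call.
my %count = count_words("The cat and the dog.", "The dog, the cat!");
print "$count{$_} $_\n" for sort keys %count;
```

The 'next if $wd eq ""' line is my own addition: without it, a token like '--' would be counted as an empty word once the substitution has stripped it.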

Update:

As for getting just the 5 most common words, you can run the output of your script through the following (this assumes each output line starts with the count, since sort -n compares the leading number):

|sort -n|tail -n 5
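
If you'd rather stay inside Perl than pipe through sort and tail, here is a rough sketch of the same idea. The sub name top_n and the sample hash are hypothetical; it assumes you already have a word => count hash like %count.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $n [word, count] pairs, highest count first.
sub top_n {
    my ($n, %count) = @_;
    # Sort by count descending; break ties alphabetically so the order is stable.
    my @words = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice @words, $n if @words > $n;   # keep only the first $n words
    return map { [ $_, $count{$_} ] } @words;
}

# Made-up counts, just to show the call.
my %sample = (the => 4, cat => 2, dog => 2, and => 1);
printf "%d %s\n", $_->[1], $_->[0] for top_n(2, %sample);
```

Unlike the shell pipeline, this lists the most common word first rather than last.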