PerlMonks
split is just dicing up your input by whitespace, so punctuation stays attached to the words. A '$wd =~ s/\W//g;' before your '$count{$wd}++;' will strip everything other than letters, digits, and underscores (probably a bad idea if you need to deal with email addresses or URLs). You may also want to use '$count{lc($wd)}++;' to ignore capitalization.
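Putting those two tweaks together, here is a minimal sketch of what the counting loop might look like. The original script isn't shown in this node, so the sub names count_words and top_words and the sample input are made up for illustration, and the top-N step is just one way to do it inside Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: count words across a list of lines.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        for my $wd (split ' ', $line) {   # split on runs of whitespace
            $wd =~ s/\W//g;               # keep only letters, digits, underscore
            next if $wd eq '';            # token was pure punctuation
            $count{ lc $wd }++;           # fold case so "The" and "the" match
        }
    }
    return \%count;
}

# Hypothetical helper: the $n most frequent words, highest count first,
# with ties broken alphabetically.
sub top_words {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

# Made-up sample input, just to show the shape of the output.
my $count = count_words("The cat saw the dog.", "The dog ran!");
printf "%d %s\n", $count->{$_}, $_ for top_words($count, 5);
```

Note the 'next if $wd eq ""' line: once the punctuation is stripped, a token like '--' becomes an empty string, and without that check it would get counted as a word.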
Update: as for getting just the 5 most common words, you can pipe the output of your script through '| sort -n | tail -n 5' (this assumes each output line starts with the count).
In reply to Re: Top five words by occurrence
by socketdave