Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

How can I print out a word-frequency or line-frequency summary?

by faq_monk (Initiate)
on Oct 08, 1999 at 00:25 UTC ( #669=perlfaq nodetype: print w/ replies, xml ) Need Help??

Current Perl documentation can be found at perldoc.perl.org.

Here is our local, out-dated (pre-5.6) version:

To do this, you have to parse out each word in the input stream. We'll pretend that by word you mean chunk of alphabetics, hyphens, or apostrophes, rather than the non-whitespace chunk idea of a word given in the previous question:

    while (<>) {
        while ( /(\b[^\W_\d][\w'-]+\b)/g ) {   # misses "`sheep'"
            $seen{$1}++;
        }
    }
    while ( ($word, $count) = each %seen ) {
        print "$count $word\n";
    }

If you wanted to do the same thing for lines, you wouldn't need a regular expression:

    while (<>) { 
        $seen{$_}++;
    }
    while ( ($line, $count) = each %seen ) {
        print "$count $line";
    }

If you want these output in a sorted order, see the section on Hashes.

Log In?
Username:
Password:

What's my password?
Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (8)
As of 2015-07-31 00:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (274 votes), past polls