PerlMonks  

Word Frequency counter

by Anonymous Monk
on Oct 02, 2008 at 20:11 UTC (id://715084)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
I've been trying to put together a word frequency counter, but I keep getting an "uninitialized value in pattern match" warning.
    my $file = "C:\\areopagitica.txt";
    open(IN, $file) || die "File not found";
    my @thisfile = <IN>;
    close(IN);
    chomp @thisfile;
    my %seen=();
    while (@thisfile) {
        while ( /(\w['\w-]*)/g ) {
            $seen{lc $1}++;
        }
    }
    foreach my $word (sort { $seen{b} <=> $seen{a} } keys %seen) {
        printf "%5d %s\n", $seen{$word}, $word;
    }
I'd be grateful if somebody could let me know what I'm doing wrong.

Re: Word Frequency counter
by GrandFather (Saint) on Oct 02, 2008 at 20:30 UTC

    For a start

    while (@thisfile) {

    doesn't do what you think. In particular, it is not

    for (@thisfile) {

    Next:

    $seen{b} <=> $seen{a}

    compares the values for keys 'a' and 'b', not for the two sort variables' ($a and $b) contents as you are hoping. Cleaning those problems up, fixing a few other style issues and providing some sample data gives:

    use strict;
    use warnings;

    my $fileData = <<DATA;
    Greetings,
    I've been trying to put a word frequency counter together but I keep getting an unitialised value in pattern match.
    I'd be grateful if somebody could let me know what I'm dong wrong.
    I added a line to get some repeated words.
    DATA

    my %seen;

    open my $inFile, '<', \$fileData;
    for (grep {chomp; length} <$inFile>) {
        $seen{lc $1}++ while /(\w['\w-]*)/g;
    }
    close ($inFile);

    printf "%5d %s\n", $seen{$_}, $_ for sort { $seen{$b} <=> $seen{$a} } keys %seen;

    Prints:

        2 a
        2 to
        2 i
        1 i've
        1 know
        1 put
        1 if
        1 unitialised
        1 greetings
        1 i'd
        1 frequency
        1 wrong
        1 let
        1 could
        1 in
        1 keep
        1 line
        1 repeated
        1 trying
        1 what
        1 value
        1 me
        1 match
        1 grateful
        1 i'm
        1 word
        1 be
        1 some
        1 somebody
        1 but
        1 added
        1 words
        1 dong
        1 been
        1 get
        1 together
        1 getting
        1 pattern
        1 counter
        1 an
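    The 1-count words above come out in no particular order: that order is the hash's, and recent Perls randomize hash ordering, so it can differ from run to run. A minimal sketch of one common fix, adding an alphabetical tie-breaker to the sort, using a few counts taken from the output above (the small hash is just for illustration):

```perl
use strict;
use warnings;

# A few counts from the output above; in the real script this is %seen.
my %seen = (a => 2, to => 2, i => 2, know => 1, put => 1, an => 1);

# Primary key: descending count. When counts are equal (<=> returns 0),
# || falls through to an ascending alphabetical comparison, so the
# output order no longer depends on hash ordering.
my @words = sort { $seen{$b} <=> $seen{$a} || $a cmp $b } keys %seen;

printf "%5d %s\n", $seen{$_}, $_ for @words;
```

    With this comparator the three 2-count words always print as a, i, to, followed by the 1-count words in alphabetical order.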

    Perl reduces RSI - it saves typing
      Thanks for all the replies above, which explain where I've gone wrong and why, and especially to GrandFather for showing a different, less verbose way of getting the task working.
Re: Word Frequency counter
by Fletch (Bishop) on Oct 02, 2008 at 20:16 UTC

    You've used the literal barewords "a" and "b" as hash keys in your sort comparator where you wanted to use the variables $a and $b.
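    To make the difference concrete, here is a small sketch with a hypothetical %seen hash (the words and counts are made up). Inside a sort block, $a and $b are the package variables that sort sets to the two items currently being compared, whereas $seen{a} is simply the value stored under the key 'a':

```perl
use strict;
use warnings;

# Hypothetical counts, for illustration only.
my %seen = (the => 3, cake => 1, is => 2, lie => 4);

# Buggy form from the question: { $seen{b} <=> $seen{a} } looks up the
# keys 'b' and 'a' on every comparison, so it never looks at the words
# being sorted at all.

# Correct form: $a and $b are the two keys sort is comparing; swapping
# them ($b first) gives descending order by count.
my @right = sort { $seen{$b} <=> $seen{$a} } keys %seen;

printf "%5d %s\n", $seen{$_}, $_ for @right;
```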

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Word Frequency counter
by toolic (Bishop) on Oct 02, 2008 at 20:23 UTC
    Probably unrelated to your problem, but your outer while loop will be infinite if your @thisfile array has any contents. It would be better to use a for loop instead:
    for (@thisfile) {
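      Actually, this is related. The regex is trying to match against $_, but nothing in this code sets $_: a while loop that tests an array only checks whether the array is non-empty, so $_ stays undefined, and that is what triggers the "uninitialized value in pattern match" warning. If you change the outer while to a for, then you have something that sets $_ (it is aliased to each element in turn).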
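      A minimal sketch of the corrected loop, assuming the file's lines are already in @thisfile (here replaced by two hypothetical sample lines so it runs on its own):

```perl
use strict;
use warnings;

# Stand-in for the chomped lines read from the file (hypothetical data).
my @thisfile = ("a rose is a rose", "is a rose");

my %seen;
# 'for' aliases $_ to each element in turn, so the match below has a
# string to work on. 'while (@thisfile)' would only test whether the
# array is non-empty (always true here) and would leave $_ undefined.
for (@thisfile) {
    # In scalar context, m//g resumes where the previous match on $_
    # left off and returns false once there are no more matches.
    while (/(\w['\w-]*)/g) {
        $seen{lc $1}++;
    }
}

printf "%5d %s\n", $seen{$_}, $_ for sort keys %seen;
```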

      Update: Complete gibberish fixed. Now it says what I meant to say

      --DrWhy

      "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

Re: Word Frequency counter
by apl (Monsignor) on Oct 02, 2008 at 20:34 UTC
    ... and, as always, you should use strict; use warnings;
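    Along those lines, here is one possible strict/warnings-clean version that reads a real file rather than a here-doc. It is a sketch, not the original poster's code: the helper name count_words and taking the file name from @ARGV are assumptions made for illustration. It also uses a three-argument open with a lexical filehandle and reports $!, so a failed open says why it failed instead of always claiming "File not found".

```perl
use strict;
use warnings;

# Hypothetical helper: count words read from an already-open filehandle
# and return a reference to the word => count hash.
sub count_words {
    my ($fh) = @_;
    my %seen;
    # while (my $line = <$fh>) is implicitly wrapped in defined(), and
    # unlike while (@array) it does assign a fresh line on every pass.
    while (my $line = <$fh>) {
        $seen{lc $1}++ while $line =~ /(\w['\w-]*)/g;
    }
    return \%seen;
}

# Only touch the disk when a file name is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Cannot open '$file': $!\n";
    my $seen = count_words($in);
    close $in;

    # Descending count, ties broken alphabetically.
    printf "%5d %s\n", $seen->{$_}, $_
        for sort { $seen->{$b} <=> $seen->{$a} || $a cmp $b } keys %$seen;
}
```

    Run it as, for example, perl wordfreq.pl areopagitica.txt (the script name is arbitrary).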
Re: Word Frequency counter
by planetscape (Chancellor) on Oct 03, 2008 at 12:10 UTC
Re: Word Frequency counter
by Lawliet (Curate) on Oct 02, 2008 at 20:21 UTC

    Update: ~This is not the reply you are looking for~
    My original reply was removed due to embarrassment. (Read OP too quickly, whoops.)

    I'm so adjective, I verb nouns!

    chomp; # nom nom nom

Node Type: perlquestion [id://715084]
Approved by GrandFather