PerlMonks  

Golf: Count unique words

by Anonymous Monk
on Nov 30, 2004 at 14:38 UTC (#411197=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm pretty new to Perl (hmm, maybe the obfuscated section isn't the best place for beginners...) and really love regexes and the fact that so little code can do so much. So I was particularly impressed when my first attempt at a small but nearly useful program worked (especially since it got through Bram Stoker's Dracula in 1.6 seconds):

   s/\b(\w+)\b/$w{$1}++;$t++;$1;/eg while (<>);
   print "$_ ($w{$_})\n" for (sort keys %w);
   print "\n$t\n";

Having seen how fast it works, and how small it is, I started wondering if it could be made any smaller. Not necessarily obfuscated, but just compacted a bit.

So, can it?
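The first line of the program works because of the /e modifier: Perl runs the replacement as code for every match (/g), and the value of the last statement ($1) is what gets substituted, so each word is put back unchanged while the hash and counter are updated as side effects. A minimal sketch of just that substitution, assuming a hypothetical sample line in place of the files read by <> (the names $line and $copy are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical sample line; the original program reads its lines via <>.
my $line = "the cat sat on the mat";
my (%w, $t);

# /e: the replacement is run as Perl code; /g: once for every match.
# The code's last value ($1) is substituted, so every word is put back.
(my $copy = $line) =~ s/\b(\w+)\b/$w{$1}++;$t++;$1;/eg;

print "$_ ($w{$_})\n" for sort keys %w;
print "\n$t\n";
print $copy eq $line ? "line unchanged\n" : "line modified\n";
```

Since the original program never prints the lines it reads, the substituted text does not actually matter there, and dragonchild's 81-character version drops the trailing $1.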

Re: Golf: Count unique words
by Fletch (Chancellor) on Nov 30, 2004 at 14:58 UTC

    One might wonder why you're using s/// to replace something with itself rather than nested while loops and m//.

    while( <> ) { while( /\b(\w+)\b/g ) { $w{$1}++; $t++; } }

    Update: Duh, you're going for obfuscation, not efficiency. Never mind me.
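Fletch's loop relies on m//g in scalar context: each evaluation finds the next match and remembers where it stopped (pos), so the inner while visits every word once and nothing is rewritten. A self-contained sketch, assuming a hypothetical @lines array in place of the input read by <>:

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the input read via <>.
my @lines = ("one fish, two fish", "red fish, blue fish");
my (%w, $t);

for (@lines) {
    # Scalar-context m//g returns true once per match and advances pos($_),
    # so this loop ends after the last word on the line.
    while (/\b(\w+)\b/g) { $w{$1}++; $t++; }
}

print "$_ ($w{$_})\n" for sort keys %w;
print "\n$t\n";
```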

      It's also golfier ... try golfing my reply down without using s///eg ...

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Golf: Count unique words
by dragonchild (Archbishop) on Nov 30, 2004 at 15:02 UTC
    So, given a list of filenames as input, you want to
    1. read them in
    2. find all the words (as determined by /\b\w+\b/ ... there are other ways)
    3. print out all the unique words and how often they appeared
    4. print out how many words there were total

    Right? OK... I come in at 81 characters: 78 for the code and 3 for the switches.

    #!perl -nl
    END{print"$_ ($w{$_})"for sort keys%w;print"\n$t"}s/\b(\w+)\b/$w{$1}++;$t++/eg

    It's basically a rewrite of your code, with a few enhancements.
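Two switches do most of the work here: -n wraps the code in a `while (<>) { ... }` loop, and -l chomps each input line and sets the output record separator $\ to "\n" (both described in perlrun). The END block only looks out of place: although it sits inside the loop body, it runs exactly once, when the program exits, so the report can come first in the source. A longhand sketch of the same program, assuming a hypothetical @sample array in place of the files read by <>:

```perl
use strict;
use warnings;

# Hypothetical sample input in place of the files that -n reads via <>.
my @sample = ("It was the best of times,", "it was the worst of times");
my %w;
my $t = 0;

$\ = "\n";                  # -l: every print gets a trailing newline

for (@sample) {             # -n: stands in for  LINE: while (<>) { ... }
    chomp;                  # -l: chomp each input line
    # The line is never printed, so whatever replaces each word (here the
    # old value of $t) is irrelevant; only the side effects matter.
    s/\b(\w+)\b/$w{$1}++;$t++/eg;
}

# An END block runs once, at program exit, wherever it appears in the code.
END {
    print "$_ ($w{$_})" for sort keys %w;
    print "\n$t";
}
```

Note that the counts are case-sensitive: "It" and "it" are tallied separately.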


      73 (70 + 3 for the switches). The \b's are pretty much unnecessary and using $& means you don't need parens.

      #!/usr/bin/perl -nl
      END{print map("$_ ($w{$_})\n",sort keys%w),$t}s/\w+/$w{$&}++;$t++/eg;

        Shaving seven strokes:

        #!/usr/bin/perl -lp
        ++$t,++$w{$_}for/\w+/g}for(map("$_ ($w{$_})",sort keys%w),$t){
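This version leans on how -p wraps the code, roughly `LINE: while (<>) { CODE } continue { print }` (perlrun). The golfed `}` closes the while body early, so no input line is ever printed, and the trailing `{` opens an empty body for a new foreach loop, which then picks up -p's continue block and prints each report line. A longhand sketch of that expansion, assuming a hypothetical @input array in place of <> and capturing the output in $out so it can be inspected:

```perl
use strict;
use warnings;

# Hypothetical sample input in place of the lines that -p reads via <>.
my @input = ("the cat and the hat");
my (%w, $t);
my $out = '';

{
    # Capture what print writes, so the result can be inspected afterwards.
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    my $old = select $fh;
    local $\ = "\n";                      # -l: newline after every print

    for (@input) {                        # stands in for: LINE: while (<>) {
        chomp;                            # -l: chomp each input line
        ++$t, ++$w{$_} for /\w+/g;        # the golfed body
    }                                     # the golfed "}" ends the loop here
    for (map("$_ ($w{$_})", sort keys %w), $t) {
        ;                                 # empty body opened by the golfed "{"
    } continue {
        print;                            # -p's continue block prints $_
    }

    select $old;
    close $fh;
}

print $out;
```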
Re: Golf: Count unique words
by pearl (Initiate) on Nov 30, 2004 at 15:07 UTC

    Looks like you're checking the frequency of words in a file. In that case, the two \b anchors in your regex aren't really necessary.

    s/(\w+)/$w{$1}++;$t++;$1;/eg while (<>);
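Both this reply and the 73-stroke one above drop the \b anchors, and the 73-stroke one also swaps the capture for $&. That is safe because \w+ is greedy, so every match runs to the end of a run of word characters, and with /g the next match begins at the first word character after that, which is again a word boundary. A quick check on a hypothetical sample string:

```perl
use strict;
use warnings;

# Hypothetical sample text with punctuation, an apostrophe, an underscore
# and digits.
my $text = "Hello, world! it's foo_bar 42 times... (really)";

# List-context m//g returns every match (the capture group, if present).
my @with_b    = $text =~ /\b(\w+)\b/g;
my @without_b = $text =~ /\w+/g;

# $& holds the whole of the last match, so no capture group is needed.
my @via_amp;
push @via_amp, $& while $text =~ /\w+/g;

print join('|', @with_b), "\n";
print join('|', @without_b), "\n";
print join('|', @via_amp), "\n";
```

All three lists come out identical. Two caveats outside golf: \w+ splits "it's" into "it" and "s", and in perls older than 5.20, using $& anywhere in a program made every successful match copy the matched string (see perlvar), which could slow down regex-heavy code.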
Re: Golf: Count unique words
by !1 (Hermit) on Nov 30, 2004 at 20:40 UTC

    Just throwing this one out there for fun and to show a different approach:

    #!/usr/bin/perl -nla
    INIT{undef$/};$w{$_}++for@F;print"$_ ($w{$_})"for keys%w;print"".@F
    # 345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345678 6 234567

    67 + 4 = 71 strokes

      #!/usr/bin/perl -0nla
      $w{$_}++for@F;print"$_ ($w{$_}) "for keys%w;print~~@F

      small correction :)
        #!/usr/bin/perl -0na
        $w{$_}++for@F;print"$_ ($w{$_})
        "for keys%w;print~~@F

        There is a literal newline at the end of the second line.
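These -a versions count something slightly different from the regex-based replies: -a splits each record into @F as split(' ') would, on runs of whitespace, so punctuation stays attached and "end." and "end!" become separate tokens. With no digits, -0 sets $/ to the null character, so an ordinary text file arrives as a single record, much as INIT{undef$/} does in !1's version. Also, per perlrun, -l copies $/ into $\ at the moment the switch is processed, so in the -0nla version each print presumably ends in a null character rather than a newline; dropping -l in favor of a literal newline avoids that. A sketch comparing the two tokenizations on a hypothetical sample string:

```perl
use strict;
use warnings;

# Hypothetical sample text; with -0 a whole text file usually arrives as one
# record, much like this single string.
my $text = "The end.\nThe End, the end!\n";

# What -a does to each record: split(' ') on runs of whitespace.
my @F = split ' ', $text;

# What the /\w+/g replies count instead: runs of word characters only.
my @words = $text =~ /\w+/g;

my (%by_split, %by_regex);
$by_split{$_}++ for @F;
$by_regex{$_}++ for @words;

printf "split: %d tokens, %d distinct\n", scalar @F,     scalar keys %by_split;
printf "regex: %d words, %d distinct\n",  scalar @words, scalar keys %by_regex;
```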

Approved by neniro