PerlMonks  

Perl Destroys Interview Question

by redsquirrel (Hermit)
on Jan 12, 2004 at 22:36 UTC (node #320795, perlmeditation)

I'm in the process of interviewing with a small, Java-centric tech company. They recently asked me to solve a programming problem in Java. The problem involved reading input, calculating a case-insensitive word count, and then outputting the details of the word count. My Perl solution looked like this:
#!/usr/bin/perl
%words = ();
while ( <STDIN> ) {
    chomp;
    $words{ lc( $_ ) } ++;
}
foreach my $word ( sort keys %words ) {
    print "$word;$words{ $word }\n";
}
My Java equivalent had 5x the lines and 10x the characters. I submitted both solutions, pointing out that the problem was much better suited to Perl than to Java. I have my second interview next Monday. :-)

Re: Perl Destroys Interview Question
by LAI (Hermit) on Jan 12, 2004 at 22:53 UTC

    Well done, redsquirrel. This points out what most Java-vs-Perl holy wars miss: for certain applications Perl is far more useful than Java (and, by extension, vice versa).

    LAI

    __END__
Re: Perl Destroys Interview Question
by mr_mischief (Monsignor) on Jan 12, 2004 at 22:56 UTC
    This doesn't count words in a file. It almost counts unique lines in a file. What it actually does is list each unique line in a file and the number of times it occurs. This is useful in some situations, and I'm sure it's quicker to do in Perl than in Java. It's hardly a case-insensitive word count.

    Is this exactly the code you submitted to solve their problem, or did you retype this from memory?



    Christopher E. Stith
      I copy/pasted this code. I didn't re-type it. Why do you ask?
Re: Perl Destroys Interview Question
by Anonymous Monk on Jan 12, 2004 at 23:00 UTC

    While you were at it, you should have used strict; otherwise a one-liner would have served the purpose just as well.

    perl -lane '$w{lc $_}++ for @F;END{print "$_;$w{$_}" for sort keys %w}' text.txt
      I am usually a strict zealot, but in this case, I felt it would take a little away from the conciseness of the solution. So I consciously left it out.

      A one-liner certainly would have served the purpose, but it wouldn't have been as readable. This was an interview question, not a Perl Golf competition. :-)
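For anyone puzzling over the -lane switches, here is roughly what that one-liner does, written out by hand as a sketch (compare the output of perl -MO=Deparse -lane '...'). The END block here prints "word;count" pairs, since the task asks for counts. The sub name lane_word_count is made up for illustration, and the expansion is approximate (the real -n loop also carries a LINE label, for example).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what  perl -lane 'BODY' file  does, written out by hand:
#   -n  wraps BODY in a  while (<>) { ... }  loop over the input lines
#   -l  chomps each input line and sets $\ = "\n", so every print ends a line
#   -a  splits each line on whitespace into @F
# BODY here is  $w{lc $_}++ for @F;  and the END block prints "word;count".
sub lane_word_count {
    my ($in, $out) = @_;
    my %w;
    local $\ = "\n";                           # -l: output record separator
    while (defined(my $line = <$in>)) {        # -n: loop over input lines
        chomp $line;                           # -l: strip the newline
        my @F = split ' ', $line;              # -a: autosplit into @F
        $w{lc $_}++ for @F;                    # the one-liner's body
    }
    print {$out} "$_;$w{$_}" for sort keys %w; # the END { ... } block
}

if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    lane_word_count($in, \*STDOUT);
}
```

Run it as perl lane.pl text.txt; it should print the same lines as the corrected one-liner.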

Re: Perl Destroys Interview Question
by Zaxo (Archbishop) on Jan 12, 2004 at 23:23 UTC

    What mr_mischief says, which can be fixed by replacing the loop body with $words{lc $_}++ for split; (no chomp needed). Also, I'd prefer an output loop that doesn't construct a potentially long list of keys to iterate over. Something like this:

    while ($_ = each %words) { print $_, ';', $words{$_}, $/; }

    I like to name my hashes in the singular, after their values rather than their keys. That makes the pronunciation suggested in the docs work: $count{'foo'} is "count of foo", and so on.

    After Compline,
    Zaxo
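Putting Zaxo's suggestions together gives something like the following sketch. It counts whitespace-separated words (with lc, so the count stays case-insensitive), stores them in a hash named %count, and prints with each(), here using the list form each() returns rather than Zaxo's scalar form. The sub names count_words and print_unsorted are made up for illustration. One caveat: each() returns pairs in hash order, so this output is not sorted, which the original task requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count whitespace-separated words into %count (named after its values,
# so $count{foo} reads as "count of foo").
sub count_words {
    my ($in) = @_;
    my %count;
    local $_;                          # don't clobber the caller's $_
    while (<$in>) {
        # split with no arguments splits $_ on whitespace and drops the
        # trailing newline, so no chomp is needed; lc keeps it case-insensitive
        $count{ lc $_ }++ for split;
    }
    return \%count;
}

# Walk the hash with each() instead of building a list of all keys first.
# The pairs come back in hash order, NOT sorted.
sub print_unsorted {
    my ($count, $out) = @_;
    while (my ($word, $n) = each %$count) {
        print {$out} "$word;$n\n";
    }
}

if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print_unsorted(count_words($in), \*STDOUT);
}
```

The trade-off is memory versus order: sort keys %count builds the full key list but gives sorted output; each() avoids the list but leaves the order to the hash.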

      I agree; I like the name %count better than %words. The hash (Map) in my Java solution was named wordCount.
Re: Perl Destroys Interview Question
by Abigail-II (Bishop) on Jan 13, 2004 at 02:26 UTC
    Of course, your Perl solution (which is incorrect, as it counts lines, not words) takes more than 5 times as many lines as a shell solution would:
    cat words.dat | tr 'A-Z ' 'a-z\012' | sort | uniq -c

    I'd like to point out that for some problems, other tools are better suited than Perl.

    Abigail

      Amen to that! The more languages I learn, the more I can see the strengths and weaknesses of each language.
      Your solution also breaks down if there is punctuation in the file (tested on HP-UX 11.0).

      File

      This is a test file. How many unique words are in this file? Do you know? Does the file contain more than ten words?

      Results

      1
      1 a
      1 are
      1 contain
      1 do
      1 does
      1 file
      1 file.
      1 file?
      1 how
      1 in
      1 is
      1 know?
      1 many
      1 more
      1 ten
      1 test
      1 than
      1 the
      2 this
      1 unique
      1 words
      1 words?
      1 you

      Update: Changed the test file.

        That just depends on how a word is defined, which the OP didn't specify. And judging by the suggestions for fixing the OP's solution (split with no arguments, or -a without -F), I wasn't the only one assuming the not uncommon "non-whitespace" definition.

        But I'd like to see the version you would write during a job interview. Make sure you take into account punctuation, Unicode, and words like O'Reilly and home-brew.

        Abigail
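For what it's worth, here is one way to start on Abigail's challenge. It is a sketch that assumes one particular definition of a word: a run of Unicode letters, optionally joined by an internal apostrophe or hyphen, so O'Reilly and home-brew each count once and trailing punctuation such as "file?" is dropped. That definition and the helper name count_words_in_text are assumptions for illustration, not the only reasonable choice (digits, for instance, are ignored). It needs Perl 5.16 or later for fc().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # fc() needs Perl 5.16+

# ASSUMED definition of a word: a run of Unicode letters, optionally joined
# by single internal apostrophes or hyphens. "O'Reilly" and "home-brew" are
# one word each; punctuation around a word ("file." "know?") is not part of it.
my $WORD = qr/\p{L}+(?:['-]\p{L}+)*/;

sub count_words_in_text {
    my ($text) = @_;
    my %count;
    # fc() is Unicode case folding, slightly more thorough than lc()
    $count{ fc $1 }++ while $text =~ /($WORD)/g;
    return \%count;
}

if (@ARGV) {
    open my $in, '<:encoding(UTF-8)', $ARGV[0]
        or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    my $count = count_words_in_text($text);
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_;$count->{$_}\n" for sort keys %$count;
}
```

On the test file above, this version should report file;3 and words;2 instead of splitting them by the attached punctuation.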

        Here are the requirements I was given...

        Program Purpose

        The goal of the program is to count the occurrences of all words in a file, and write this count into a new file.

        Requirements

        • The input file will contain 1 word per line (lines will be terminated by the newline character), and the file will contain an arbitrary number of lines.
        • The file will be terminated by an end of file character.
        • The word count must be case insensitive, as there may be varying case throughout the file.
        • The output file must write each word once, and include the number of occurrences of that word on the same line.
        • The lines in the output file must be sorted in ascending order.
        Sample Input:
        Chicago
        Paris
        chicago
        London
        red
        blue
        Green
        Red
        REd
        london
        
        Sample output:
        blue;1
        Chicago;2
        Green;1
        London;2
        Paris;1
        red;3
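Going by the requirements and the sample above, here is a sketch that reproduces the sample output exactly. Two details are inferred from the sample rather than stated: each word is printed with the spelling of its first occurrence (Chicago, not chicago), and "ascending order" is case-insensitive (blue before Chicago). Skipping blank lines is a further assumption, and count_first_spelling is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count one-word-per-line input case-insensitively. Each word is reported
# with the spelling of its FIRST occurrence; that rule is inferred from the
# sample output (Chicago;2), not stated in the requirements.
sub count_first_spelling {
    my ($in) = @_;
    my (%count, %spelling);
    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';             # assumption: blank lines are not words
        my $key = lc $line;
        $spelling{$key} //= $line;       # keep the first spelling seen
        $count{$key}++;
    }
    # Keys are lowercased, so a plain sort gives the case-insensitive
    # ascending order shown in the sample (blue before Chicago).
    return map { "$spelling{$_};$count{$_}" } sort keys %count;
}

if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in,  '<', $in_file  or die "Cannot read $in_file: $!";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} "$_\n" for count_first_spelling($in);
    close $out or die "Cannot close $out_file: $!";
}
```

Usage: perl wordcount.pl input.txt output.txt writes the counts to the new file, as the requirements ask, rather than to STDOUT.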
        

Node Type: perlmeditation [id://320795]
Approved by Corion