PerlMonks  

Perl Destroys Interview Question

by redsquirrel (Hermit)
on Jan 12, 2004 at 22:36 UTC (#320795=perlmeditation)

I'm in the process of interviewing with a small, Java-centric tech company. They recently asked me to solve a programming problem in Java. The problem involved reading input, calculating a case-insensitive word count, and then outputting the details of the word count. My Perl solution looked like this:
#!/usr/bin/perl

%words = ();

while ( <STDIN> ) {
    chomp;
    $words{ lc( $_ ) } ++;
}

foreach my $word ( sort keys %words ) {
    print "$word;$words{ $word }\n";
}
My Java equivalent had 5x the lines and 10x the characters. I submitted both solutions, pointing out that the problem was much more suited for Perl than Java. I have my second interview next Monday. :-)

Re: Perl Destroys Interview Question
by LAI (Hermit) on Jan 12, 2004 at 22:53 UTC

    Well done, redsquirrel. This points out what most Java-Perl holy wars miss: for certain applications Perl is far more useful than Java (and, by extension, vice versa).

    LAI

    __END__
Re: Perl Destroys Interview Question
by mr_mischief (Prior) on Jan 12, 2004 at 22:56 UTC
    This doesn't count words in a file. It almost counts unique lines in a file. What it actually does is list each unique line in a file and the number of times it occurs. This is useful in some situations, and I'm sure it's quicker to do in Perl than in Java. It's hardly a case-insensitive word count.

    Is this exactly the code you submitted to solve their problem, or did you retype this from memory?



    Christopher E. Stith
      I copy/pasted this code. I didn't re-type it. Why do you ask?
Re: Perl Destroys Interview Question
by Anonymous Monk on Jan 12, 2004 at 23:00 UTC

    While you were at it, you should have used strict; otherwise a one-liner would have served the purpose just as well.

    perl -lane '$w{lc $_}++ for @F;END{print "$_;$w{$_}" for sort keys %w}' text.txt
      I am usually a strict zealot, but in this case, I felt it would take a little away from the conciseness of the solution. So I consciously left it out.

      A one-liner certainly would have served the purpose, but it wouldn't have been as readable. This was an interview question, not a Perl Golf competition. :-)

Re: Perl Destroys Interview Question
by Zaxo (Archbishop) on Jan 12, 2004 at 23:23 UTC

    What mr_mischief says. That can be fixed by replacing the loop body with $words{lc $_}++ for split; (no chomp needed). Also, I'd prefer an output loop that doesn't construct a potentially long list of keys to iterate over. Something like this:

    while ($_ = each %words) { print $_, ';', $words{$_}, $/; }

    I like to name my hashes singular for their values, not their keys. That makes the doc-suggested pronunciation work: $count{'foo'} is "count of foo", and so on.

    After Compline,
    Zaxo

      I agree, I like the name %count better than %words. The hash (Map) in my Java solution was named wordCount.
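Pulling Zaxo's suggestions together (whitespace splitting instead of chomp, a singular %count hash, and an each-based output loop), a whole program might look roughly like the sketch below. It is an illustration, not code posted in the thread: the count_words sub and the "unless (caller)" wrapper are hypothetical scaffolding so the counting can be exercised on its own, and the sketch keeps the original program's lc so the count stays case-insensitive. Because each walks the hash in its internal order, the output is unsorted, which would not satisfy a sorted-output requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_words (hypothetical name): split every line on whitespace and
# count each field case-insensitively. No chomp is needed, because
# split ' ' discards the trailing newline along with other whitespace.
sub count_words {
    my %count;
    for my $line (@_) {
        $count{ lc $_ }++ for split ' ', $line;
    }
    return \%count;
}

unless (caller) {
    my $count = count_words(<STDIN>);

    # each() walks the hash without building a list of all keys first,
    # so the words come out in hash order, not sorted order.
    while ( my ( $word, $n ) = each %$count ) {
        print "$word;$n\n";
    }
}
```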
Re: Perl Destroys Interview Question
by Abigail-II (Bishop) on Jan 13, 2004 at 02:26 UTC
    Of course, your Perl solution (which is incorrect, as it counts lines, not words) takes more than 5 times the lines a shell solution would take:
    cat words.dat | tr 'A-Z ' 'a-z\012' | sort | uniq -c

    I'd like to point out that for some problems, other tools are better suited than Perl.

    Abigail

      Amen to that! The more languages I learn, the more I can see the strengths and weaknesses of each language.
      Your solution also breaks down if there is punctuation in the file (tested on HP-UX 11.0).

      File

      This is a test file. How many unique words are in this file? Do you know? Does the file contain more than ten words?

      Results

      1
      1 a
      1 are
      1 contain
      1 do
      1 does
      1 file
      1 file.
      1 file?
      1 how
      1 in
      1 is
      1 know?
      1 many
      1 more
      1 ten
      1 test
      1 than
      1 the
      2 this
      1 unique
      1 words
      1 words?
      1 you

      Update: Changed the test file.

        That just depends on how a word is defined, which the OP didn't specify. And judging by the suggested fixes to the OP's solution (split with no arguments, or -a without -F), I wasn't the only one taking the not uncommon "non-whitespace" definition.

        But I'd like to see the version you would write during a job interview. Make sure you take into account punctuation, Unicode, and words like O'Reilly and home-brew.

        Abigail
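Taking up that challenge, one possible definition of a word is a run of word characters optionally joined by single apostrophes or hyphens. The sketch below works under that assumed definition, which nobody in the thread settled on: it decodes input as UTF-8 so \w matches non-ASCII letters, uses fc (Perl 5.16 or later) for case folding, and drops surrounding punctuation, so "file.", "file?" and "file" all count as "file". The count_words name is hypothetical. It only recognizes the ASCII apostrophe, so a typographic apostrophe (U+2019) still splits "O’Reilly" in two.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'fc';    # Unicode case folding, Perl 5.16+

# Assumed word definition: \w characters, optionally joined by single
# apostrophes or hyphens, e.g. "O'Reilly" and "home-brew".
my $WORD = qr/\w+(?:['-]\w+)*/;

# count_words (hypothetical name): takes a list of lines and returns a
# hashref mapping each case-folded word to its number of occurrences.
sub count_words {
    my %count;
    for my $line (@_) {
        # In list context, //g with no capture groups returns every match.
        $count{ fc $_ }++ for $line =~ /$WORD/g;
    }
    return \%count;
}

unless (caller) {
    # Decode input and encode output so \w and fc see characters, not bytes.
    binmode STDIN,  ':encoding(UTF-8)';
    binmode STDOUT, ':encoding(UTF-8)';
    my $count = count_words(<STDIN>);
    print "$_;$count->{$_}\n" for sort keys %$count;
}
```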

        Here are the requirements I was given...

        Program Purpose

        The goal of the program is to count the occurrences of all words in a file, and write this count into a new file.

        Requirements

        • The input file will contain 1 word per line (lines will be terminated by the newline character), and the file will contain an arbitrary number of lines.
        • The file will be terminated by an end of file character.
        • The word count must be case insensitive, as there may be varying case throughout the file.
        • The output file must write each word once, and include the number of occurrences of that word on the same line.
        • The lines in the output file must be sorted in ascending order.
        Sample Input:
        Chicago
        Paris
        chicago
        London
        red
        blue
        Green
        Red
        REd
        london
        
        Sample output:
        blue;1
        Chicago;2
        Green;1
        London;2
        Paris;1
        red;3
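The sample output appears to keep the spelling of each word's first occurrence (Chicago;2, red;3) and to sort case-insensitively, whereas the program at the top of this thread prints lowercased keys (chicago;2). If that reading of the sample is right, a sketch that reproduces it exactly might look like this. The first-spelling rule is an inference from the sample, not a stated requirement, and count_lines is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_lines (hypothetical name): one word per line, per the requirements.
# Counts are keyed case-insensitively; the first spelling seen is kept
# for output (an assumption read off the sample output).
sub count_lines {
    my ( %count, %spelling );
    for my $line (@_) {
        chomp( my $word = $line );
        next unless length $word;        # skip empty lines
        my $key = lc $word;
        $spelling{$key} //= $word;       # remember first-seen spelling
        $count{$key}++;
    }
    # Sorting the lowercased keys gives a case-insensitive ascending order.
    return map { "$spelling{$_};$count{$_}" } sort keys %count;
}

unless (caller) {
    # <> reads the files named on the command line, or STDIN if none.
    print "$_\n" for count_lines(<>);
}
```

Run as perl wordcount.pl input.txt > output.txt (file names hypothetical) to write the counts into a new file, as the first requirement asks.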
        

Node Type: perlmeditation [id://320795]
Approved by Corion