http://www.perlmonks.org?node_id=320843


in reply to Perl Destroys Interview Question

Of course, your Perl solution (which is incorrect, as it counts lines, not words) takes more than 5 times as many lines as a shell solution would:
cat words.dat | tr 'A-Z ' 'a-z\012' | sort | uniq -c

I'd like to point out that for some problems, other solutions are better suited than Perl.

Abigail

Re: Re: Perl Destroys Interview Question
by redsquirrel (Hermit) on Jan 13, 2004 at 15:17 UTC
    Amen to that! The more languages I learn, the more I can see the strengths and weaknesses of each language.
Re: Re: Perl Destroys Interview Question
by Rhose (Priest) on Jan 13, 2004 at 16:48 UTC
    Your solution also breaks down if there is punctuation in the file. (OS HP-UX 11.0)

    File

    This is a test file. How many unique words are in this file? Do you know? Does the file contain more than ten words?

    Results

    1
    1 a
    1 are
    1 contain
    1 do
    1 does
    1 file
    1 file.
    1 file?
    1 how
    1 in
    1 is
    1 know?
    1 many
    1 more
    1 ten
    1 test
    1 than
    1 the
    2 this
    1 unique
    1 words
    1 words?
    1 you

    Update: Changed the test file.
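One way around the punctuation problem shown above is to treat every run of non-letters as a separator, instead of translating only spaces. A minimal sketch, assuming ASCII letters only and accepting that words such as O'Reilly or home-brew get split into pieces; the function name `count_words_nopunct` is made up for illustration:

```sh
# Hypothetical helper: count words on stdin, ignoring punctuation.
# Assumes ASCII letters only; every other byte is treated as a separator.
count_words_nopunct() {
    tr -cs 'A-Za-z' '\n' |   # turn each run of non-letters into one newline
    tr 'A-Z' 'a-z' |         # fold case
    sed '/^$/d' |            # drop the empty line a leading separator can produce
    sort | uniq -c
}

printf 'This is a test file. Does this file work?\n' | count_words_nopunct
```

With this variant, "file." and "file?" are both counted as "file", at the cost of a cruder definition of what a word is.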

      Here are the requirements I was given...

      Program Purpose

      The goal of the program is to count the occurrences of all words in a file, and write this count into a new file.

      Requirements

      • The input file will contain 1 word per line (lines will be terminated by the newline character), and the file will contain an arbitrary number of lines.
      • The file will be terminated by an end of file character.
      • The word count must be case insensitive, as there may be varying case throughout the file.
      • The output file must write each word once, and include the number of occurrences of that word on the same line.
      • The lines in the output file must be sorted in ascending order.
      Sample Input:
      Chicago
      Paris
      chicago
      London
      red
      blue
      Green
      Red
      REd
      london
      
      Sample output:
      blue;1
      Chicago;2
      Green;1
      London;2
      Paris;1
      red;3
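For comparison with the requirements above, here is a rough shell sketch that reproduces the sample output. It assumes, based only on the sample, that each word is printed with the spelling of its first occurrence and that "ascending order" means sorted by the lowercased word. The function name `count_lines_spec` is hypothetical:

```sh
# Hypothetical helper: count one-word-per-line input case-insensitively and
# print "word;count", keeping the spelling of the first occurrence of a word.
count_lines_spec() {
    awk '
        $0 != "" {
            key = tolower($0)
            if (!(key in first)) first[key] = $0   # remember first spelling seen
            count[key]++
        }
        END {
            # Emit "key<TAB>word;count" so sort can order by the lowercased key.
            for (key in count) printf "%s\t%s;%d\n", key, first[key], count[key]
        }
    ' "$@" | LC_ALL=C sort | cut -f2-
}
```

It reads the files named on the command line, or standard input when none are given, so something like `count_lines_spec words.dat > counts.txt` would cover the "write this count into a new file" part (the output file name is made up).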
      
        So your original solution works for the narrow scope of the requirements. It fails if the requirement that there is one word per line is changed. This explains perfectly why the questions above arose about lines versus words -- according to the spec, they can be considered the same.

        Now only one question remains. Do you code to exactly match a questionable spec? Or, more to the point, wouldn't it be better to code something which works according to the exact spec plus gets the behavior right if the questionable part of the spec is changed?

        I think that when possible, a restrictively narrow spec should be answered with a more general solution which works for the spec at hand and for likely future changes. In some instances, the likely future cases are hard to determine. In this one they are not. In the spirit of a job interview, I'd like to see either both ways implemented, or a comment in the code that one way was chosen over the other because of the nature of the spec.
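As an illustration of that kind of generalization, the sketch below accepts the one-word-per-line layout from the spec but also gives the same counts if several words share a line. It assumes words are separated by whitespace, reports them in lowercase, and leaves punctuation alone. The name `count_words_general` is invented for this example:

```sh
# Hypothetical helper: like the one-word-per-line spec, but any run of
# whitespace separates words, so both input layouts give the same counts.
# Design note: words are reported in lowercase; punctuation is left alone.
count_words_general() {
    awk '
        {
            # NF is 0 on blank lines, so they contribute nothing.
            for (i = 1; i <= NF; i++) count[tolower($i)]++
        }
        END {
            for (word in count) printf "%s;%d\n", word, count[word]
        }
    ' "$@" | LC_ALL=C sort -t ';' -k1,1
}
```

The comment at the top plays the role of the note suggested above: it records which reading of the spec was chosen and why the other layout still works.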

        Of course, redsquirrel, since you already went above and beyond what the question asked it wouldn't be fair to complain that you didn't do even more work. I'm just making points about more general cases again. ;-)

        Come to think of it, it seems that much of my life as a programmer, and even much of my life besides programming (and probably because of habits learned from programming) is about making solutions which already work for one case more general. I think this is probably a goal of a large percentage of programming effort overall.

        Update: fixed a tpyo.



        Christopher E. Stith
      That just depends on how a word is defined, which the OP didn't do. And considering the suggestions for fixing the OP's solution (split with no arguments, or -a without a -F), I wasn't the only one using the not uncommon "non-whitespace" definition.

      But I'd like to see the version you would write during a job interview. Make sure you take into account punctuation, Unicode, and words like O'Reilly and home-brew.

      Abigail
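To make the edge cases named above concrete, here is a partial sketch that keeps an apostrophe or hyphen when it sits between letters, so O'Reilly and home-brew survive as single words. It assumes a grep that supports the `-o` option (GNU and BSD grep do, but it is not in POSIX), and it does not attempt Unicode: case folding here is ASCII only. The function name `count_words_inner_punct` is hypothetical:

```sh
# Hypothetical helper: count words on stdin, where a "word" is a run of
# letters that may contain single apostrophes or hyphens between letters.
# Assumes a grep with the -o option (GNU and BSD grep both have it).
# Unicode is not handled: tr folds ASCII case only.
count_words_inner_punct() {
    grep -oE "[[:alpha:]]+(['-][[:alpha:]]+)*" |
    tr 'A-Z' 'a-z' |
    sort | uniq -c
}

printf "O'Reilly sells home-brew. Home-brew? Yes!\n" | count_words_inner_punct
```

Even this leaves open questions (quoted words, possessives, non-ASCII letters), which rather supports the point that the answer depends on how a word is defined.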