http://www.perlmonks.org?node_id=212177

AltBlue has asked for the wisdom of the Perl Monks concerning the following question:

In a friendly discussion, we ended up arguing about which language has the longest words, so I quickly baked a gadget that grabs a web page from a URL, strips off the HTML, switches to the appropriate locale and then passes the preparsed text to a routine that returns a list of the longest words.
Simple. Heh, we concluded that Germans use far longer words than anyone else :)
My Perl question, though, is what a shorter solution for my longest_words routine would be. Here is the code, as a complete test-case snippet:
#!/usr/bin/perl -Twl
use strict;
use List::Util qw(max);

my @lw = longest_words(join '', <DATA>);
local $" = ', ';
print length($lw[0]), ' letters: ', "@lw";

sub longest_words {
    local $_ = $_[0];
    my %W;
    $W{$1} = length($1) while /\b(\w+)\b/sg;
    my $ll = max values %W;
    return grep { $W{$_} == $ll } keys %W;
}
__DATA__
three four five six seven eight nine foobar obsolescent superfluous inevitable pseudohash pseudonym pseudopod

--
AltBlue.

Re: The longest word ....
by grinder (Bishop) on Nov 12, 2002 at 09:12 UTC

    Just a word on all the solutions that use sort as the means of getting the longest word:

    Don't

    This is incredibly wasteful. A byproduct of calling sort is that you also solve the problem of ordering all the elements amongst themselves. All that information is unnecessary for the problem at hand, and what is more, the performance certainly isn't O(n): a comparison sort is O(n log n).

    You can find the longest word in a single pass of a list (and BrowserUk, if you can only do this in two passes, or with a hash, well, I'm glad you don't work for me :). You can't sort an arbitrary list in a single pass (or if you can, I've got a few people I'd like you to meet). This means that the sort-based solutions are not going to scale as well as a single-pass approach.

    <update>This is the code I was thinking of (simplifying the problem of where the words come from for the sake of the argument):

    my $max = 0;
    my @longest;
    for my $word (@words) {
        my $length = length $word;
        if ($max < $length) {
            # new longest length: start a fresh list
            $max     = $length;
            @longest = ($word);
        }
        elsif ($max == $length) {
            # tie with the current longest: keep it as well
            push @longest, $word;
        }
    }

    At the end of this single pass through the words, you will have a counter holding the length of the longest word(s), and an array that contains the word(s). No hashes needed.

    </update>

    And this is exactly the kind of program that people are going to start throwing dictionaries at. What is more, sort more or less insists that your set fits in memory, whereas my approach needs only about as much space as the longest word (plus any words that tie with it).
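    A minimal sketch of the same single-pass idea reading straight from a filehandle, so the word list never has to be held in memory all at once. The sub name `longest_words_fh` and the in-memory filehandle (standing in for a large file) are assumptions for illustration; memory use is the longest word plus whatever words tie with it.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass over a filehandle, keeping only the current
# maximum length and the words tied at that length.
sub longest_words_fh {
    my ($fh) = @_;
    my $max = 0;
    my @longest;
    while (my $line = <$fh>) {
        for my $word ($line =~ /\w+/g) {
            my $len = length $word;
            if ($len > $max) {
                $max     = $len;      # new record: drop older candidates
                @longest = ($word);
            }
            elsif ($len == $max) {
                push @longest, $word; # tie: keep it too
            }
        }
    }
    return ($max, @longest);
}

# An in-memory filehandle stands in for a large word list on disk.
open my $fh, '<', \"three four obsolescent\nsuperfluous pseudopod\n"
    or die "cannot open in-memory file: $!";
my ($max, @words) = longest_words_fh($fh);
print "$max letters: @words\n";   # 11 letters: obsolescent superfluous
```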


    print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'

      Caveats noted, the problem wasn't "the longest word" but the longest words, which requires either

      • Two passes.

        One to get the length, one to extract the words of that length.

      • Requires building a hash (or similar) to record the lengths and the words.

      The former would work on a list larger than memory; the latter is likely to fail before the sort does.
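      For comparison, a sketch of the two-pass option from the list above, assuming the words are already in memory (for a list larger than memory, each pass would have to re-read the source). `longest_words_two_pass` is a hypothetical name; `max` comes from List::Util, as in the original question.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: no hash and no sort, just two passes over the list.
sub longest_words_two_pass {
    my @words = $_[0] =~ /\w+/g;
    return unless @words;
    my $max = max map { length($_) } @words;     # pass 1: longest length
    return grep { length($_) == $max } @words;   # pass 2: words of that length
}

my @lw = longest_words_two_pass('nine inevitable pseudohash seven');
print "@lw\n";   # inevitable pseudohash
```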

      Anyone who "throws a wordlist" at an algorithm without understanding its limitations, gets what they deserve.

      In the context of the requested solution (a shorter way of finding the longest words in a supplied string, which is already memory bound), using sort was just a quick option. I didn't read the question as requiring either a failsafe or a scalable solution. Did you?

      After all

      print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'

      is probably not the most efficient way of printing "just another bofh" either.


      Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
      Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
      Get used to the wings fast cos it's an 8 hour day...unless the Governor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
      Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

        Another point of view without using sort:

        sub lword{ push(@{$a[length]},$_)for(pop)=~/\b\w+/g;@{$a[-1]} }

        Or clean the array every iteration too:
        sub lword2{ for((pop)=~/\b\w+/g){push(@{$a[length]},$_);@a[0..$#a-1]=()}@{$a[-1]} }

        Update: (Changed "while" to "for" after testing)
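        A sketch of the same bucket idea written for `use strict`, with a lexical array in place of the global `@a`: the global version keeps its buckets between calls, so words from an earlier call would show up in later results. `longest_words_buckets` is a hypothetical name.

```perl
use strict;
use warnings;

# $by_len[$n] holds every word of length $n, so the last element of
# @by_len is always the bucket of the longest words.
sub longest_words_buckets {
    my ($text) = @_;
    my @by_len;    # fresh on every call, unlike a global @a
    push @{ $by_len[ length $_ ] }, $_ for $text =~ /\w+/g;
    return @by_len ? @{ $by_len[-1] } : ();
}

my @lw = longest_words_buckets('six pseudonym foobar pseudopod');
print "@lw\n";   # pseudonym pseudopod
```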

        $anarion=\$anarion;

        s==q^QBY_^=,$_^=$[x7,print

      Just a word on all the solutions that use sort as the means of getting the longest word
      Even more abstractly: don't use sort on a data structure if you need just a single value. Isn't it obvious?! :))
       
      heh, the rest is history; you just didn't want to understand what I meant... I thought it was pretty obvious, since I clearly spoke of wordS throughout my entire message... with the exception of the title :P~
      thx again to BrowserUk for clearing things up for you with his reply. :)

      --
      AltBlue.

      Of course sort is wasteful of processing cycles, but what are we trying to conserve? These are web pages, it's 2002, and we have RAM and MHz to spare. The poster is just satisfying his curiosity about finding long words (and doing it in minimum keystrokes); it's not meant to be an enterprise application.

      I agree that klugey code is bad, and we should all be aware of scaling / performance issues, but sometimes I think we can be needlessly Puritan about it.

      He asked for the shortest solution, grinder, not the best.

      In my first post in this thread I cautioned that it could be done in one pass.

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: The longest word ....
by sauoq (Abbot) on Nov 12, 2002 at 03:49 UTC

    A little shorter maybe and doesn't require any modules...

    sub longest_words {
        my @W = $_[0] =~ /\b(\w+)\b/g;
        my $L = 0;
        $L = length() < $L ? $L : length for @W;
        grep length == $L, @W;
    }

    I wouldn't actually do it this way though. I'd pay more attention to doing it in a single pass through the list of words.

    Update: Switched back to the regex for extracting the words.

Re: The longest word ....
by FamousLongAgo (Friar) on Nov 12, 2002 at 03:33 UTC

    print join "\n", sort { length($b) > length($a) } split /\s+/, <DATA>;

    But can anyone help me find a clever way to only print the words of maximum length?

      Of course using sort just makes it too easy. :-)

      sub longest_words {
          my @W = sort { length($b) <=> length($a) } $_[0] =~ /\b(\w+)\b/g;
          grep length == length($W[0]), @W;
      }

      As long as you do it right, that is. You'll need to use <=> in your sort. Just testing for 'greater than' isn't sufficient.

      Update: A little golfing gets the sub down to 79 characters:

      sub longest_words { my@W=sort{length($b)<=>length($a)}$_[0]=~/(\w+)/g;grep length==length($W[0]),@W }
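      To illustrate the point about `<=>`: a sort comparator has to return a negative, zero, or positive number, and `length($b) > length($a)` can only return true or false, never -1. A small sketch; the helper subs `gt_cmp` and `ncmp` are hypothetical, written as plain subs so their return values can be printed.

```perl
use strict;
use warnings;

# The '>' version can only say "true" (1) or "false" (0 here; Perl's false is "").
sub gt_cmp { my ($x, $y) = @_; return length($y) > length($x) ? 1 : 0 }

# The '<=>' version returns -1, 0 or 1, which is what sort expects.
sub ncmp   { my ($x, $y) = @_; return length($y) <=> length($x) }

# For a descending sort, 'aa' should come before 'a', i.e. the comparator
# should return a negative number; only '<=>' can do that.
print gt_cmp('aa', 'a'), ' ', ncmp('aa', 'a'), "\n";   # 0 -1

my @words  = qw(six obsolescent four superfluous pseudopod);
my @sorted = sort { length($b) <=> length($a) } @words;
print join(' ', map { length($_) } @sorted), "\n";     # 11 11 9 4 3
```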

      -sauoq

        #        1         2         3         4         5         6         7
        #23456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789
        my@W=sort{length($b)<=>length($a)}$_[0]=~/(\w+)/g;grep length==length($W[0]),@W

        Untested code ahead

        #        1         2         3         4         5         6         7
        #23456789_123456789_123456789_123456789_123456789_123456789_123456789_1
        my@W=sort length$b<=>length$a,$_[0]=~/\w+/g;grep length==length$W[0],@W
        See, no parens :)

        But, since we're playing golf...

        #        1         2         3         4
        #23456789_123456789_123456789_123456789_123
        push@{$_[y///c]},$_ for+pop=~/\w+/g;@{+pop}
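        The golfed version relies on `y///c` (`y` is a synonym for `tr`) as a short spelling of `length`: with an empty, complemented search list every character matches, and the empty replacement list maps each character to itself, so the string is unchanged and the return value is the character count. A quick check; the variable names are illustrative.

```perl
use strict;
use warnings;

my $word = 'obsolescent';

# Empty search list + /c (complement) means "every character"; with an
# empty replacement list each character maps to itself, so tr/// (alias
# y///) leaves $word untouched and just returns how many it matched.
my $count = ($word =~ tr///c);

print "$count ", length($word), "\n";   # 11 11
```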

        - Yes, I reinvent wheels.
        - Spam: Visit eurotraQ.
        

        heh, stripping blanks doesn't do much for a real golf ;-) but the idea of skipping the additional hash is, of course, good :) Well done. hehe, there's some more golfing you could have done, seeing that 'length' appears quite often in your routine ;-)

        --
        AltBlue.

Re: The longest word ....
by dingus (Friar) on Nov 12, 2002 at 07:24 UTC
    Yet, my perl issue is that I really wonder what would be a shorter solution for my longest_words routine.

    I think you want to build the hash the other way around, i.e. make it a hash keyed by length and push each word onto an anonymous array of words of that length:

    sub longest_words {
        my %W;
        push(@{$W{length($_)}}, $_) for (split /\s+/, join ' ', @_);
        return @{ $W{ (sort { $b <=> $a } keys %W)[0] } };
    }
    Notes: If you add print Dumper(\%W); (with use Data::Dumper; at the top) before the return, you will see the hash that has been constructed.
    The (sort{$b <=>$a} keys %W)[0] returns the key with the highest numerical value.
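    A sketch of the same hash-of-lengths idea that picks the largest key with List::Util's `max` (a single scan of the keys) instead of sorting them. `longest_words_by_len` is a hypothetical name, and `split ' '` is swapped in for `split /\s+/` so that leading whitespace does not produce an empty first field.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hash keyed by word length; each value is an array ref of the words of
# that length (same shape as the reply above).
sub longest_words_by_len {
    my %by_len;
    # split ' ' skips leading whitespace, unlike split /\s+/
    for my $w (split ' ', join ' ', @_) {
        push @{ $by_len{ length $w } }, $w;
    }
    return unless %by_len;
    # max scans the keys once; no need to sort them
    my $longest = max keys %by_len;
    return @{ $by_len{$longest} };
}

my @lw = longest_words_by_len('eight obsolescent nine superfluous');
print "@lw\n";   # obsolescent superfluous
```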

    Dingus


    Enter any 47-digit prime number to continue.
Re: The longest word ....
by BrowserUk (Patriarch) on Nov 12, 2002 at 05:37 UTC

    sub lword{grep{!$£?$£=length:$£==length}sort{length$b<=>length$a}pop=~/\b(\w+)\b/sg}
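    An ungolfed reading of the one-liner above, as a sketch: `$max` plays the role of `$£`, but as a fresh lexical it does not carry one call's length into the next, as the global `$£` would. `lword_readable` is a hypothetical name, and `//=` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

sub lword_readable {
    my ($text) = @_;
    my $max;    # length of the longest word; set by the first word grep sees
    return grep {
        $max //= length($_);    # after sorting, the first word is a longest one
        length($_) == $max;     # keep exactly the words of that length
    } sort { length($b) <=> length($a) } $text =~ /\b(\w+)\b/g;
}

my @lw = lword_readable('foobar pseudohash seven inevitable');
print join(' ', sort @lw), "\n";   # inevitable pseudohash
```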


      nice :))
      ah, it's good that I caught you here with this thingie that has puzzled me for some time: I've seen this ${hi-ascii-char} trick before, but I forgot to ask then: what is it? Pure abuse, or is it documented somewhere?
      tia.

      --
      AltBlue.

Re: The longest word ....
by kabel (Chaplain) on Nov 12, 2002 at 06:23 UTC
    a short note:
    local $" = ', ';
    the local is pointless here: you are at file scope, and strict.pm does not complain if you assign to $" without it.
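    A small sketch of where `local` does make a visible difference: inside a block, it saves the current value of `$"` (the separator used when an array is interpolated into a string) and restores it when the block exits. The variable names are illustrative.

```perl
use strict;
use warnings;

my @lw = qw(obsolescent superfluous);
my $inside;

{
    # local saves the current $" and restores it when this block exits.
    local $" = ', ';
    $inside = "@lw";
}

my $outside = "@lw";   # back to the default single space

print "$inside\n";     # obsolescent, superfluous
print "$outside\n";    # obsolescent superfluous
```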
Re: The longest word ....
by pg (Canon) on Nov 12, 2002 at 04:07 UTC
    Here is one way, and the real part only takes one line.
    use strict;
    my @words = ("one", "two", "three", "four", "sumofallnumbers", "five", "six", "seven", "eight");
    my @sorted_words = sort {length($b) <=> length($a)} @words;
    print "the longest word is: $sorted_words[0]";
      There could be more than one longest word with the same length; you probably want all of them:
      use strict;
      my @words = ("one", "two", "123456789012345", "three", "four", "sumofallnumbers", "five", "six", "seven", "eight");
      my @sorted_words = sort {length($b) <=> length($a)} @words;

      # length of the longest word; words with at least that many characters are exactly the longest ones
      my $max = length $sorted_words[0];
      my @longest_words = grep (/.{$max}/, @words);
      print join(",", @longest_words);
Re: The longest word ....
by Courage (Parson) on Nov 12, 2002 at 09:34 UTC
    It doesn't surprise me to hear that Germans use lengthier words, simply because their grammar rules often have several words written together as one.
    That said, the result is not quite fair: they don't have the longest words in the world; they just write a lot of words without any spaces between them.

    Courage, the Cowardly Dog

Complete 'clean' code (Re: The longest word)
by AltBlue (Chaplain) on Nov 13, 2002 at 17:07 UTC
    Finally, let's end this thread, originally meant to be recreational, with a more serious post... here is the complete code for this 'longest words' thingie.
    Please note that this is a revised version, using a nice and clean (a.k.a. 'serious') coding style :))