http://www.perlmonks.org?node_id=424474

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I've asked about this before but I'll be the first to admit I'm stuck on the recursion/logic of it.

By the way, there is, as far as I know, no openly-source or free anagram engine anywhere. There are some good websites (http://www.wordsmith.org/anagram/) , but no offline Perl solution that I can find.

So, assuming that I have already created a dictionary-hash of "ur-words" where

$dictionary{'eors'} = 'eros,ores,roes,rose,sore'
then, what's my next move? Because for single words (in the answer), it's a no-brainer. Sort letters, lookup either fails or it doesn't, here are your words.

But for multiple words in the answer? (that website gives nearly 1300 possible anagrams for "perl wisdom") I need to use a permutation algorithm on my input string, then index through each permutation, looking for a string which matches a word in my dictionary ... then look at the left-over letters and perform the same process, then the left-overs of that process..? This is where it all gets a bit misty for me.

Any help gratefully received. Which permutation module/algorithm should I use for a start? For long strings the amount of permutations is huge.



($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print

Replies are listed 'Best First'.
Re: Solving Anagrams
by trammell (Priest) on Jan 24, 2005 at 03:22 UTC
    By the way, there is, as far as I know, no openly-source or free anagram engine anywhere.

    But there is!

Re: Solving Anagrams
by jweed (Chaplain) on Jan 24, 2005 at 06:15 UTC
Re: Solving Anagrams
by Zaxo (Archbishop) on Jan 24, 2005 at 06:17 UTC

    Your dictionary will be easier to handle if the values are array references. Either do, $_ = [split /,/] for values %dictionary; or push them on in the first place.

    Multi-word matching can be done by picking a key that can occur, eliminating its used characters from the sorted string and looking again. Backtrack if you have a residue which has no anagram.

    After Compline,
    Zaxo

Re: Solving Anagrams
by ambrus (Abbot) on Jan 24, 2005 at 09:14 UTC

    The gson entry in the 1992 IOCCC winners may help, although it has the limitation that the alphabet must not be larger than 32 letters.

Re: Solving Anagrams
by Anonymous Monk on Jan 24, 2005 at 09:42 UTC
    By the way, there is, as far as I know, no openly-source or free anagram engine anywhere.
    Look harder. There are lots of open source anagram programs around. With all kinds of features. Isn't it one of the standard "BSD games"?
Re: Solving Anagrams
by jdporter (Paladin) on Jan 24, 2005 at 18:29 UTC
    First thing you do is delete from your %dictionary all ur-words that contain any letters not present in your source phrase. Another preprocess you can do is eliminate all ur-words that are already too long.
    my $source_phrase = 'perl wisdom'; my $source_urword = join '', grep /\w/, sort_uniq( split //, $source_p +hrase ); # or whatever for ( keys %dictionary ) { delete $dictionary{$_} if /[^$source_urword]/ || length($_) > length($source_urword); }
    That should significantly enhance the performance of any subsequent algorithm... unless you're unlucky enough to have a source phrase containing nearly all the available alphabet.
Re: Solving Anagrams
by Cody Pendant (Prior) on Jan 24, 2005 at 20:11 UTC
    Thanks everyone for your help. I think "look harder" was the best advice! I was going by that website, which I suppose was saying there's no free Windows solution.

    I like the look of Generate Multi-Word Anagrams which it seems was inspired by my original whinge on this subject.

    If anybody has any tips on how to install the C anagram generator on Mac OS X, I'd love to hear them.



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
    =~y~b-v~a-z~s; print
      If anybody has any tips on how to install the C anagram generator on Mac OS X, I'd love to hear them.

      Yeah - treat it like Unix. You do know that OS-X is BSD with Aqua as the windowing manager, right?

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Solving Anagrams
by Jaap (Curate) on Jan 24, 2005 at 17:10 UTC
    Why don't you:
    1. Create all the permutations possible
    2. Check each of them against the dictionary
    So for a word like "car", check whether qw(acr arc car cra rac rca) exists in the dictionary
      That's a pretty dumb idea because the number of permutations grows even worse than exponentially. For instance, my /usr/share/dict/words contains 45427 entries, which is only slightly more than 8!, the number of permutation you get with 8 different letters.

      If you don't want to use one of the many canned solutions that are out there (and why wouldn't you?), a reasonably efficient solution goes as follows:

      1. Create a histogram of your search word (so "car" becomes "1 a, 0 b, 1 c, 0 d, ...., 0 q, 1 r, 0 s, ..., 0 z").
      2. For all words in the dictionary, create a histogram of said word. If the histograms are identical, the words are anagrams of each other. (If you want to find words that can be made from a subset of the letters, look for anagrams which have values equal or less than your search word. For solutions with spaces (groups of words), recurse.)
      But it's much better to use an existing solution written in C (C is much faster than Perl for this kind of work), and just qx it.