PerlMonks  

challenging the dictionary

by Discipulus (Curate)
on Apr 27, 2004 at 11:12 UTC (#348444=perlquestion)
Discipulus has asked for the wisdom of the Perl Monks concerning the following question:

Hi, wise monks!

I had an idea, but I need some suggestions to get started correctly.
The question is: given a dictionary file, I want to find the minimal number of words that together use all the letters of a given alphabet.

Does it sound mad? Can you please point me in the right direction?

Cheers from sunny Rome
Lor*

Re: challenging the dictionary
by aquarium (Curate) on Apr 27, 2004 at 11:27 UTC
    Sounds like boring homework, actually. Look for the longest words in the dictionary with the greatest diversity of letters, chosen so that they complement each other. You can find this handful of words with a little script; then just play with the words by hand, or optimize the selection with a script. A friend of mine once (a long time ago) challenged me to a program-vs-program game of noughts and crosses. He programmed the logic; I hardcoded the moves. I won every game.
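    A rough sketch of that "little script", written in Python because the thread itself contains no code; the function name `rank_by_diversity` and the sample words are illustrative assumptions. It ranks words by how many distinct alphabet letters they contain, preferring longer words on ties, so the most promising candidates float to the top for hand-picking.

```python
def rank_by_diversity(words, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Hypothetical helper: sort words by number of distinct alphabet
    letters (descending), then by length (descending), then alphabetically
    so that the output order is deterministic."""
    letters = set(alphabet)
    scored = []
    for w in words:
        distinct = set(w.lower()) & letters  # letters this word contributes
        scored.append((len(distinct), len(w), w))
    # Highest diversity first; ties broken by longer word, then by spelling.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [w for _, _, w in scored]


if __name__ == "__main__":
    sample = ["letter", "lumberjacks", "dwarf", "fox", "quiz"]
    # prints lumberjacks, dwarf, letter (one per line)
    for word in rank_by_diversity(sample)[:3]:
        print(word)
```

    This only orders the candidates; choosing words that complement each other is still left to you (or to one of the search strategies discussed further down the thread).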
Re: challenging the dictionary
by blue_cowdawg (Monsignor) on Apr 27, 2004 at 11:51 UTC

        had a idea but I need some suggestion to start it correctly.

    Post what you have then. The monks here will gladly help you with the problem, but you need to show some effort on your part.

Re: challenging the dictionary
by halley (Prior) on Apr 27, 2004 at 14:12 UTC
    Such a sentence is called a pangram.

    Find an anagram solver and give it the source "word" 'abcdefghijklmnopqrstuvwxyz'. If it can't find an anagram for that, add another 'e'. Or an 's'. Or a 't'. Or an 'a'. You could brute-force this approach, but you should find a pangram pretty quickly.
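    A toy version of that idea, assuming a lowercase word list and using Python for illustration (the names `multiword_anagram` and `pangram_with_extras` are hypothetical, not an existing anagram solver). The first function does a backtracking search for distinct words whose letters together use up a letter pool exactly; the second tries the bare alphabet, then the alphabet plus one spare letter at a time, as suggested above.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def multiword_anagram(pool, words):
    """Backtracking search for distinct dictionary words whose letters,
    taken together, use up the letter multiset `pool` exactly.
    Returns one solution as a list of words, or None if there is none."""
    pool = Counter(pool)
    # Only words that fit inside the pool can ever be part of a solution.
    cands = sorted({w for w in words if not Counter(w) - pool})

    def search(remaining, start, chosen):
        if not remaining:  # every letter of the pool has been used
            return list(chosen)
        for i in range(start, len(cands)):
            need = Counter(cands[i])
            if need - remaining:  # word needs letters we no longer have
                continue
            chosen.append(cands[i])
            found = search(remaining - need, i + 1, chosen)
            if found is not None:
                return found
            chosen.pop()  # undo and try the next candidate
        return None

    return search(pool, 0, [])


def pangram_with_extras(words, alphabet=ALPHABET, extras="esta"):
    """Try the bare alphabet first, then the alphabet plus one spare
    letter, in the spirit of 'add another e, or an s, or a t, or an a'."""
    for extra in [""] + list(extras):
        found = multiword_anagram(alphabet + extra, words)
        if found is not None:
            return extra, found
    return None
```

    With a real 26-letter alphabet and a full dictionary this plain backtracking can take a long time; it is meant to show the shape of the search, not to be fast.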

    --
    [ e d @ h a l l e y . c c ]

Re: challenging the dictionary
by blokhead (Monsignor) on Apr 27, 2004 at 16:05 UTC
    Good luck: this is exactly the NP-complete Minimum Set Cover problem (which makes me hope it's not homework). The universe is the set of letters, and the family of subsets is the dictionary of words (each word taken as the set of letters it contains). You want to minimize the number of words needed to use all the letters in the alphabet; in other words, the number of chosen subsets such that each element of the universe is contained in at least one of them.
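    To make the set-cover framing concrete, here is a minimal exact solver sketched in Python (the name `min_cover_bruteforce` is hypothetical). Each word becomes the set of alphabet letters it contains, and combinations of 1, 2, 3, ... words are tried until one covers the whole universe, so the first hit is a minimum cover.

```python
from itertools import combinations


def min_cover_bruteforce(words, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Exact minimum set cover by brute force. Exponential in the worst
    case, which is expected for an NP-hard optimization problem."""
    universe = set(alphabet)
    # Map each distinct letter set to one representative word;
    # words with identical letter sets are interchangeable.
    subsets = {}
    for w in words:
        s = frozenset(w.lower()) & universe
        if s:
            subsets.setdefault(s, w)
    if set().union(*subsets) != universe:
        return None  # some letter appears in no word, so no cover exists
    items = list(subsets.items())
    # Smallest k first, so the first cover found is a minimum one.
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if set().union(*(s for s, _ in combo)) == universe:
                return [w for _, w in combo]
    return None
```

    Collapsing duplicate letter sets helps a little, but the number of combinations still explodes with dictionary size, which is the intractability mentioned below.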

    You won't be able to do significantly better than brute force unless an approximation algorithm is acceptable for your needs. And with a huge dictionary, the brute-force running time is going to be intractable.
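    For reference, the textbook approximation for set cover is the greedy rule: repeatedly take the word that covers the most still-uncovered letters. A short Python sketch, assuming a lowercase word list (`greedy_cover` is a hypothetical name). It is fast but not guaranteed minimal; its cover is at most about ln(n) + 1 times the optimum, where n is the alphabet size.

```python
def greedy_cover(words, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Greedy set-cover approximation: pick the word adding the most new
    letters until the alphabet is covered. Returns None if impossible."""
    universe = set(alphabet)
    letter_sets = {w: set(w.lower()) & universe for w in words}
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Word adding the most new letters; ties go to the earliest word.
        best = max(letter_sets,
                   key=lambda w: len(letter_sets[w] & uncovered),
                   default=None)
        if best is None or not letter_sets[best] & uncovered:
            return None  # the remaining letters appear in no word
        chosen.append(best)
        uncovered -= letter_sets[best]
    return chosen
```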

    blokhead

      blokhead,
      This seems to be a relatively fast approximation algorithm (64K words in about 5 seconds):

      The algorithm is quite simple. Start with the rarest letter not yet covered and, among the words containing that letter, pick the one with the most unique letters not found so far. Wash, rinse, repeat.
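      A Python reconstruction of that description, for illustration only: it is not the code referred to above, and the name `rarest_letter_cover`, the use of "number of words containing the letter" as the rarity measure, and the alphabetical tie-breaking are all assumptions.

```python
from collections import Counter


def rarest_letter_cover(words, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Rarest-letter-first heuristic: take the rarest uncovered letter, then
    among words containing it choose the one adding the most new letters."""
    universe = set(alphabet)
    letter_sets = {w: set(w.lower()) & universe for w in words}
    # Rarity = how many distinct words contain the letter (assumed measure).
    freq = Counter(ch for s in letter_sets.values() for ch in s)
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Rarest uncovered letter; ties broken alphabetically.
        rare = min(uncovered, key=lambda ch: (freq[ch], ch))
        if freq[rare] == 0:
            return None  # this letter appears in no word at all
        # Among words containing it, the one covering most uncovered letters.
        best = max((w for w in letter_sets if rare in letter_sets[w]),
                   key=lambda w: len(letter_sets[w] & uncovered))
        chosen.append(best)
        uncovered -= letter_sets[best]
    return chosen
```

      Forcing the rarest letter first means letters like 'q' or 'z', which only a few words can supply, are dealt with while there is still freedom to pick a word that also covers many other letters.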

      Cheers - L~R

      blokhead,
          Good luck,...

      Thanks, I am sure I will need a bit of that.

          But with a huge dictionary, the running time is going to be intractable.

      Well, fortunately for humans, alphabets are relatively small, and very long words that do not repeat letters are uncommon. The number of words you need to consider from a huge dictionary (~65K words) is quite manageable. See "How many words does it take?" for an example.

      Cheers - L~R

Re: challenging the dictionary
by Limbic~Region (Chancellor) on Oct 25, 2006 at 14:29 UTC

Node Type: perlquestion [id://348444]
Approved by mce