|Problems? Is your data what you think it is?|
Challenge: Design A Better Hangman Algorithmby Limbic~Region (Chancellor)
|on Jul 15, 2009 at 17:59 UTC||Need Help??|
Limbic~Region has asked for the
wisdom of the Perl Monks concerning the following question:
In Hangman Assistant, Lawliet posted an interesting problem. The unstated objective is to maximize the number of games won while minimizing the number of incorrect guesses. To summarize my participation in that thread, I found that with a comprehensive pruning algorithm, you could achieve great results (better than binary search) by choosing the letter that had the highest probability of being in the mystery word. Out of approximately 60K words, this approach won 96.28% of the time and only averaged 1.74 wrong guesses when winning. A nominal improvement can be made by breaking ties by looking at which letter has the highest overall density amongst candidate words (96.31% and 1.73 average wrong guesses).
I also postulated that it should be possible to do even better by taking into consideration other factors. I started to do this by examining just the words that the algorithm failed to win against. I am finding little time to work on this but since this may appeal to more people, I am presenting it as a challenge. Rather than starting from scratch, here is the data that you can assume has already been calculated for you to render your decision for "best letter to choose".
From this, you can easily duplicate my algorithm by determining which letter to choose based on which word appears in the most candidate words (highest probability of being right) and break ties by using the highest overall density. You should however be able to do so much more. There are a couple of main approaches that I can think of. First, a 1 size fits all algorithm that weighs various different pieces of data to produce an overall score. The advantage here is that you should be able to tune these weights without touching code to produce the best results for a given "dictionary". The second approach would be to have multiple algorithms already fine tuned for a given set of conditions and dispatch to that algorithm based on initial conditions. For instance, it may be ok to take a risk of guessing wrong if the remaining wrong guesses is still high or perhaps utilize the "simple" algorithm already presented unless the probabilities and candidates remaining (if right or wrong) is beyond some threshold.
Even though both of those approaches leave implementations wide open, don't assume those are the only two ideas to be had. Feel free to come up with whatever you think is best. Heck, you may even want to have a pre-process fine tuning algorithm based on the dictionary that allows weights and/or algorithms to be chosen by the data.
Cheers - L~R