PerlMonks  

Re: Hangman Assistant

by blokhead (Monsignor)
on Jul 12, 2009 at 03:47 UTC ( #779302 )


in reply to Hangman Assistant

No comments on your code, but wouldn't the optimal strategy be to pick not the most common letter, but the letter that is closest to appearing in exactly half of the possible remaining words? That way you eliminate ~1/2 of the candidates in each turn.

At least in your example, a letter never appears in more than half of the candidates, so the most frequent letter and the closest-to-half-half letter coincide. But imagine if you found out that the actual word contained Q. Then for sure the next most common letter will be a U; but guessing U and getting it right will not give you much new information. Aiming for half-half lets your correct and incorrect guesses both contribute information.
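The idea can be sketched as follows (a Python sketch, not from the thread; the function and variable names are my own):

```python
from collections import Counter

# Sketch of the half-half heuristic described above: pick the unguessed
# letter whose word count among the remaining candidates is closest to
# half, so either outcome of the guess eliminates roughly half the list.
def closest_to_half(candidates, guessed):
    counts = Counter()
    for word in candidates:
        for letter in set(word):       # count words containing the letter,
            if letter not in guessed:  # not raw occurrences
                counts[letter] += 1
    half = len(candidates) / 2
    return min(counts, key=lambda letter: abs(counts[letter] - half))
```

With candidates ["queue", "quest", "query", "quiet"] and "q"/"u" already guessed, "e" appears in all four words and is a poor splitter; the function prefers "t", which appears in two of the four.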

blokhead


Re^2: Hangman Assistant
by Lawliet (Curate) on Jul 12, 2009 at 05:21 UTC

    You make a good point, and I made the following change to my program:

    # Find how close each letter is to half of the total word possibilities
    # to ensure maximum gain every guess after being sorted
    foreach my $occur (keys %alphabet) {
        $alphabet{$occur} = abs($#narrowed/2 - abs($alphabet{$occur} - $#narrowed + 1));
    }
    say $_ foreach @narrowed; # Word list
    say $#narrowed + 1;
    # Sort ascendingly; whichever letter is closest to 0 will eliminate the most words
    say sort { $alphabet{$a} <=> $alphabet{$b} } keys %alphabet;

    However, as I play, I notice that although it eliminates a lot of words very quickly, once it gets down to a small number of words it becomes useless: it tells me to guess letters that are not in any of the words, and saves letters that are in all of the words for last.

    Surely, when it comes to this point, the user can easily guess on his own, but that is not really the point. I want the program to be able to find the individual word in a small number of guesses. Perhaps I should use your method when there are more than, say, 10 possibilities, and mine from there on out.

    Example illustrating my point:

    I am kind of speaking to myself here, so this node is just publishing my own mental thoughts. Feel free to comment on or object to them.

    I don't mind occasionally having to reinvent a wheel; I don't even mind using someone's reinvented wheel occasionally. But it helps a lot if it is symmetric, contains no fewer than ten sides, and has the axle centered. I do tire of trapezoidal wheels with offset axles. --Joseph Newcomer

      it becomes useless, telling me to guess letters that are not in any of the words

      That would indicate a bug in the implementation, not a problem in the approach as you claim.

Re^2: Hangman Assistant
by Limbic~Region (Chancellor) on Jul 13, 2009 at 00:41 UTC
    blokhead,
    I disagree with your strategy, based on my limited knowledge of the game hangman. It is my understanding that the game continues until you have either revealed the word or made too many incorrect guesses. I think the best strategy, then, is to guess the letter that appears in the most candidate words. I too haven't looked at the OP's code, but my strategy doesn't necessarily pick the most popular letter, just the one that appears in the most words. If you are correct, you get new information (the position of that letter) and are no closer to losing the game. If you are wrong, you eliminate the most possible wrong answers. I will code this strategy up to see how it does in a follow-on post.
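    A minimal sketch of this letter-in-most-words selection (Python, names my own, not L~R's actual code):

```python
from collections import Counter

# Pick the unguessed letter that is present in the greatest number of
# candidate words (word count, not total occurrence count).
def most_words_letter(candidates, guessed):
    counts = Counter(
        letter
        for word in candidates
        for letter in set(word)
        if letter not in guessed
    )
    return counts.most_common(1)[0][0]
```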

    Cheers - L~R

      Those were my thoughts too. However, if the user picked an extremely common word, with common letters (which makes it a common word), then shaving off 50% each time is more helpful than eliminating the 10% that don't have the recommended, most common letter.

      Of course this is all theoretical and should be tested before changes are made (which I failed to do initially).


        Lawliet,
        then shaving off 50% each time is more helpful than eliminating the 10% that don't have the recommended, most common letter

        That's not the way it works. At least not how I understand the game. So let's say you have a really common word with common letters and you pick the most common letter amongst the possible candidates. If you pick a letter that is correct, you gain new information (the position of that letter) and you don't get penalized for a wrong guess. As long as you keep guessing correct letters you can go on forever. Additionally, you still prune your candidate list because even though lots of words have the same letter - they don't all have them in the same position. If the letter is wrong, well then you purge the majority of the words in your candidate list.

        You haven't convinced me my approach isn't superior to blokhead's binary search. It can easily be tested - just modify the code for each algorithm to output "win <total_guesses> <wrong_guesses>" or "lose" rather than the verbose output. Then write a wrapper script that tests both algorithms against 10_000 randomly selected words. You can then gather statistics on which algorithm produced the most wins and, for the wins, the average number of guesses required and the average number of wrong guesses used. Oh, and they should both "lose" after the same number of wrong guesses - my code defaults to 7 but is configurable.

        Cheers - L~R

Re^2: Hangman Assistant
by JavaFan (Canon) on Jul 13, 2009 at 14:42 UTC
    The goal of hangman isn't to minimize the number of guesses (if it was, your approach would make sense), but to minimize the number of wrong guesses. Or, to be more specific, have no more than a set number of incorrect guesses.

    That actually means that the hardest hangman games are the ones where you have to guess short words. Given that /usr/share/dict/words has 23 three-letter words ending in "at" (only "aat", "iat" and "uat" are missing), luck alone determines whether you "win" guessing a word like "mat", "cat" or "hat".
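    JavaFan's count is easy to check against any dictionary file (a sketch; the word-list path and its contents are whatever your system provides):

```python
import string

# Find which of the 26 possible ?at words are present in a word list,
# and which are missing.
def at_words(wordlist):
    found = {w for w in wordlist if len(w) == 3 and w.endswith("at")}
    missing = {c + "at" for c in string.ascii_lowercase} - found
    return found, missing

# e.g. with open("/usr/share/dict/words") as f:
#          found, missing = at_words(f.read().split())
```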

Re^2: Hangman Assistant
by Limbic~Region (Chancellor) on Jul 13, 2009 at 18:05 UTC
    blokhead,
    Some more thoughts:

    Your strategy is effectively a binary search. Assuming a perfect distribution (it is always possible to find a letter that is in exactly half of the remaining words), you should be able to guess (on average) up to twice as many times as you are allowed wrong guesses before losing the game. Of course, the last guess still has a 50% chance of being wrong so I believe your strategy is guaranteed to work when the total number of initial candidates is 2^(2G - 1) or less, where G represents the number of allowed wrong guesses. For instance, when G = 7 then you are guaranteed (100%) to win when the initial number of candidates is 8192 or less and 50% when it is up to 16,384. For eclectic, there are 9638 initial words in my word list so only a 50% chance of success.
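    The arithmetic behind that bound, as a quick check (my sketch, restating the figures above):

```python
# L~R's bound: with G allowed wrong guesses and a perfect half/half split
# on every guess, roughly half of your guesses are wrong, so you survive
# about 2G - 1 guesses with certainty and can therefore guarantee a win
# for up to 2**(2G - 1) starting candidates.
def guaranteed_candidates(wrong_guesses_allowed):
    return 2 ** (2 * wrong_guesses_allowed - 1)

print(guaranteed_candidates(7))       # 8192 candidates guaranteed at G = 7
print(2 * guaranteed_candidates(7))   # 16384: only a 50% chance
```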

    My strategy is optimized not to guess wrong. After only 5 guesses (2 right and 3 wrong) it had narrowed the search space down to exactly 1 word. This is because I prune by position, not just by presence of a letter, so even successful guesses on a popular letter can still effectively decrease the search space. Your approach would be improved by this strategy as well. I don't think the opportunities stop there.
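    Pruning by position rather than by mere presence can be sketched like this (Python, my own names; not L~R's code):

```python
# Keep only candidates consistent with the outcome of guessing `letter`:
# `positions` is the set of indices where the letter was revealed
# (an empty set means the guess was wrong and the letter is absent).
def prune(candidates, letter, positions):
    positions = set(positions)
    return [
        word for word in candidates
        if {i for i, c in enumerate(word) if c == letter} == positions
    ]
```

    Even a correct guess splits the list: pruning ["queue", "quest"] on "e" revealed at positions {2, 4} keeps only "queue", while "e" at position {2} alone keeps only "quest".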

    In a private /msg with Lawliet, the idea of finding the solution in the least number of guesses was proposed. I indicated that my strategy would change - but to what? A binary search is already optimal for that. There needs to be some balance, then, between improving your odds of guessing wrong beyond 50/50 while still effectively reducing your search space each time. This way you can survive long enough to win but not guess ad nauseam.

    I am kicking around some ideas where you still look for a very popular letter (say in 70% of the remaining words) but by position would still split that 70% in half if you guessed right. The result then would be that you would guess wrong 30% of the time and remove 70% of your search space or guess right 70% of the time but reduce your search space by 65%. Does this sound viable to you?

    Cheers - L~R

      That makes sense. I hadn't considered also taking into account the positions of the letters when you guess a correct letter. So in my example, when the word has a Q, and you guess U, you can still potentially get some useful information if many of the candidate words have U appearing in different places (i.e., you could distinguish QUEUE from QUEST). In this case, it would be "best" (if minimizing total # of guesses) to try to choose a letter whose absence / presence at all positions will partition the candidates into a large number of sets, each with size as small as possible.

      Update: expanding with an example: Suppose the word to be guessed matches S T _ _ _. Then suppose we are considering E for our next guess. All of the candidate words will then fall into one of these 8 classifications:

      S T _ _ _   (no E in the word)
      S T _ _ E   (E in last position ONLY)
      S T _ E _   (etc.)
      S T _ E E
      S T E _ _
      S T E _ E
      S T E E _
      S T E E E
      So we have 8 buckets, and we put all of the candidate words into the appropriate bucket. Suppose the bucket with most words has n words in it. Then in the worst case, after guessing E, we will have n remaining candidates. So you can take n to be the worst-case score of guessing E. Now compute this score for every letter, and take the letter with lowest score.

      Note that there might be other ways to score each possible next-letter guess. The number of non-empty buckets comes to mind as an "average case" measure (to be maximized). Again, this is all assuming we're minimizing the total number of guesses. That way, all of the possible outcomes (i.e., the guessed letter appears or doesn't appear in the word) are treated the same. To minimize the number of wrong guesses, you would have to treat the "doesn't appear in the word" outcome differently and weight things in some better way.
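      The bucket scoring above might look like this in code (a Python sketch of blokhead's minimax description; all names are mine):

```python
from collections import defaultdict

# Score a candidate letter by the worst case: partition the words by the
# exact set of positions at which the letter occurs, and report the size
# of the largest bucket (the most candidates that could remain).
def worst_case_score(candidates, letter):
    buckets = defaultdict(int)
    for word in candidates:
        pattern = tuple(i for i, c in enumerate(word) if c == letter)
        buckets[pattern] += 1
    return max(buckets.values())

# Choose the unguessed letter with the lowest worst-case score.
def best_guess(candidates, guessed):
    unguessed = {c for word in candidates for c in word} - guessed
    return min(unguessed, key=lambda l: worst_case_score(candidates, l))
```

      For candidates ["steel", "steep", "stone", "store", "stump"], guessing "e" yields buckets {(2, 3): 2, (4,): 2, (): 1}, so its worst-case score is 2.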

      blokhead
