Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: Hangman Assistant

by Limbic~Region (Chancellor)
on Jul 14, 2009 at 04:00 UTC ( #779783=note: print w/ replies, xml ) Need Help??


in reply to Hangman Assistant

Lawliet,
I outlined a new approach here that may be worth pursuing if your goal is to minimize the total number of guesses. It may seem counter intuitive that there is a better approach than a binary search but I outlined an example where you have better than 50% chance of guessing correctly and a 100% chance of pruning more than 50% of the remaining candidates. I was starting to work on it when I realized I had missed an opportunity for pruning in my original. This now guesses 'eclectic' with only 2 incorrect guesses using a dictionary of 60,388 words (9,638 of them being the same length). Here is that modified code:

#!/usr/bin/perl use strict; use warnings; use Getopt::Std; my %opt; get_args(\%opt); my $dict = load_dictionary($opt{d}, $opt{w}); my %guessed; my $curr_guess = join '', map {'*'} 1 .. length($opt{w}); while (1) { last if $curr_guess eq $opt{w} || ! $opt{g}; print "Current value:\t$curr_guess\n"; my $best_letter = find_best_letter($dict, \%guessed); die "$opt{w} not in dictionary" if ! $best_letter; print "Best letter to choose: $best_letter\n"; my $result = index($opt{w}, $best_letter) == -1 ? 'wrong' : 'right +'; if (index($opt{w}, $best_letter) == -1) { print "You guessed wrong\n"; --$opt{g}; prune_dict_bad_guess($dict, $best_letter); } else { print "You guessed right\n"; $curr_guess = update_current($curr_guess, $opt{w}, $best_lette +r); prune_dict_correct_guess($dict, $curr_guess, $best_letter); } $guessed{$best_letter} = undef; } print "\n$curr_guess\n"; sub prune_dict_correct_guess { my ($dict, $curr, $let) = @_; my (%right, %wrong); for (0 .. length($curr) - 1) { my $chr = substr($curr, $_, 1); $chr eq '*' ? ($wrong{$_} = $let) : ($right{$_} = $chr); } for my $word (keys %$dict) { for my $pos (0 .. length($word) - 1) { if ($right{$pos} && substr($word, $pos, 1) ne $right{$pos} +) { delete $dict->{$word}; last; } if ($wrong{$pos} && substr($word, $pos, 1) eq $wrong{$pos} +) { delete $dict->{$word}; last; } } } } sub update_current { my ($src, $tgt, $let) = @_; for (0 .. length($tgt) - 1) { substr($src, $_, 1, $let) if substr($tgt, $_, 1) eq $let; } return $src; } sub prune_dict_bad_guess { my ($dict, $letter) = @_; for my $word (keys %$dict) { delete $dict->{$word} if index($word, $letter) != -1; } } sub find_best_letter { my ($dict, $guessed) = @_; my %alpha; for my $word (keys %$dict) { my %uniq = map {$_ => undef} split //, $word; $alpha{$_}++ for keys %uniq; } delete @alpha{keys %$guessed}; # Would be better as water mark algorithm my @best = sort {$alpha{$b} <=> $alpha{$a}} keys %alpha; return $best[0]; } sub get_args { my ($opt) = @_; my $Usage = qq{Usage: $0 [options] -h : This help message -d : The (d)ictionary file Default: 'words.txt' in the current working directory -g : The number of (g)uesses Default: 7 -w : The (w)ord to be guessed } . "\n"; getopts('hd:g:w:', $opt) or die $Usage; die $Usage if $opt->{h}; die $Usage if ! $opt->{w} || $opt->{w} =~ /[^a-zA-Z] +/; $opt->{d} = 'words.txt' if ! defined $opt->{d}; $opt->{g} = 7 if ! defined $opt->{g}; $opt->{w} = lc($opt->{w}); } sub load_dictionary { my ($file, $word) = @_; my $desired_length = length($word); my %dict; open(my $fh, '<', $file) or die "Unable to open '$file' for readin +g: $!"; while (<$fh>) { tr/a-zA-Z//cd; next if length($_) != $desired_length; $dict{lc($_)} = undef; } return \%dict; }

As a result of the code above, I didn't bother finishing the weighted solution that considers probability of guessing correct and percentage of words pruned (for right or wrong). If you are interested, I can give you my code up till then. Why did I lose interest?

length total won lost percent_won ave_wrong_guess_from_ +wins 4 2360 1495 865 63.35 4.07 5 4479 3732 747 83.32 3.67 6 6954 6550 404 94.19 3.07 7 9222 9031 191 97.93 2.45 8 9639 9623 16 99.83 1.74 9 8687 8687 0 100.00 1.19 10 6999 6999 0 100.00 0.83 11 4884 4884 0 100.00 0.58 12 3135 3135 0 100.00 0.44 13 1800 1800 0 100.00 0.30 14 861 861 0 100.00 0.19 15 413 413 0 100.00 0.16 16 165 165 0 100.00 0.10 17 83 83 0 100.00 0.11 18 25 25 0 100.00 0.12 19 9 9 0 100.00 0.00 20 6 6 0 100.00 0.17 21 1 1 0 100.00 0.00 tot 59722 57499 2223 96.28 1.74

Cheers - L~R


Comment on Re: Hangman Assistant
Select or Download Code
Re^2: Hangman Assistant
by Lawliet (Curate) on Jul 14, 2009 at 05:02 UTC

    Oohh, definitely interesting results! It seems you are still using the 'find most common letter' method of finding the next best letter though. Is the only thing you changed the way you prune the wordlist?

    A few tweaks here and there, the insertion of perl capabilities into a contact lens, and I think we may have a viable cheating method at our disposal!

    I don't mind occasionally having to reinvent a wheel; I don't even mind using someone's reinvented wheel occasionally. But it helps a lot if it is symmetric, contains no fewer than ten sides, and has the axle centered. I do tire of trapezoidal wheels with offset axles. --Joseph Newcomer

      Lawliet,
      Let me back into the pruning opportunity by explaining where I was going with the other (now abandoned) method. In my discussion with blokhead, I explained that is should both be possible to guarantee a better than 50/50 chance of guessing correct as well as pruning the remaining candidates by more than 50%. I was setting out to do just that.

      Now let's assume we were going with 'eclectic'. There were 9,638 words in my dictionary with a length of 8. The letter that appeared in the most of those words was the letter 'e' at 6,666. Note that I didn't count total occurrences of 'e' but only 1 per word. This equated to 69%. Now what I set out to do was map the different ways the letter 'e' appeared across those 6,666 words. In the word 'eclectic' it appears in positions 1 and 4 where as in 'envelope' it appears in positions 1, 4 and 8. After guessing and seeing which positions were filled in, I could even eliminate words with letter 'e' even if they shared the common position because they didn't share all positions. That last part (all positions) was the piece I hadn't considered in my previous exercise. So here is the mind blowing part. The most common set of positions across the 6,666 words with the letter 'e' still had less than 1,600 possible words. This means that by selecting the letter 'e' (69% chance of being right) I will reduce the candidate list from 9,638 to less than 1,600 (and probably a lot further). It seemed pointless then to come up with some weights for determining letter based on probability of being correct and degree to which the candidate list is reduced because the "dumb" method was still doing a superb job.

      I do have one last final revision to make. I choose the letter that appears in the most candidate words but I don't break ties. Currently it is the result of sort. I plan to add total count as a secondary tie breaking condition to see if that improves results. I should post something later today.

      Cheers - L~R

        I could even eliminate words with letter 'e' even if they shared the common position because they didn't share all positions.

        Ah, that is what I have left to implement. The method I initially used is the exact same as the one you use besides the 'all positions' part, which is why yours works better.

        I don't mind occasionally having to reinvent a wheel; I don't even mind using someone's reinvented wheel occasionally. But it helps a lot if it is symmetric, contains no fewer than ten sides, and has the axle centered. I do tire of trapezoidal wheels with offset axles. --Joseph Newcomer

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://779783]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (16)
As of 2014-07-10 17:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (214 votes), past polls