Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Play and win the word morph game with the help of Perl :)

by Ieronim (Friar)
on Jun 28, 2006 at 19:09 UTC ( #558123=CUFP: print w/ replies, xml ) Need Help??

There is a popular word game caled "Word Morph". In this game you need to go from one word to another by modifying one letter in each word to form a new word. for example (head to foot):
HEAD bead beat boat boot FOOT
One of my English-speaking friends told me about this game. As my English is quite bad, I could not play this game fairly; but i wrote a perl script to solve the problem :)

This command-line tool finds a shortest way from one word to another using the given dictionary. I used the 2of12 dictonary from the 12Dicts project (http://wordlist.sourceforge.net), but any newline-character-delimited wordlist of any language can be used.

I don't know how 'cool' is this usage, but i mean that it is quite interesting :)

It is of course not ideal, so I am open for any suggestions :)

#!/usr/bin/perl #ver 1.02 use warnings; use strict; my $dict = '2of12.txt'; die <<HELP unless @ARGV == 2; usage: transform.pl <word1> <word2> The program finds a way from one word to other, like this: % transform.pl love shit love-lose-lost-loot-soot-shot-shit HELP my ($left, $right) = @ARGV[0,1]; for ($left, $right) { $_ = lc; } die "the length of given words is not equal!\n" if length($left) != le +ngth $right; open DICT, $dict or die "Cannot open dictionary $dict: $!"; my @words; while (<DICT>) { chomp; push @words, $_ if length == length $left; } eval { my @ways = ([transform($left, $right, \@words)], [reverse transfor +m($right, $left, \@words)]); if (@{$ways[0]} != @{$ways[1]}) { printway( @{$ways[0]} > @{$ways[1]} ? $ways[0] : $ways[1] ); } elsif (grep {$ways[0]->[$_] ne $ways[1]->[$_]} (0..(scalar(@{$ways +[0]}) - 1) )) { printway($ways[0]); printway($ways[1]); } else {printway($ways[0])} 1; } or print $@; sub transform { my $left = shift; my $right = shift; my @words = @{+shift}; my (@left, %left, @right, %right); # @left and @right- arrays + containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, f +ie] ...) # %left and %right - indic +es containing word offsets in arrays @left and @right $left[0] = [$left]; $right[0] = [$right]; $left{$left} = 0; $right{$right} = 0; my $leftstart = 0; my $rightstart = 0; my @way; my (%leftstarts, %rightstarts); SEARCH: for (;;) { my @left_ids = $leftstart..$#left; + # choose array of indices of new words $leftstart = $#left; die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$left +start}++ >2; # finish search if the way could not be found for my $id (@left_ids) { + # come through all new words my @prefix = @{$left[$id]}; my @patterns = wordpattern(pop @prefix); + # build patterns to find related words: foo -> (/^.oo$/,/^f.o$/, /^ +fo.$/) push @prefix, $id; foreach my $word (@words) { next if $left{$word}; + # skip words which are already in the tree if (scalar grep {$word =~ /$_/} @patterns) { + # if matched... push @left, [@prefix, $word]; $left{$word} = $#left; + # add new word to array and index #print join " ", @{$left[-1]}, "\n"; #debugging if ( defined(my $r_id = $right{$word}) ) { + # and check if the word appears in right index. if yes... my @end = reverse(print_rel($r_id, \@right)); shift @end; @way = (print_rel($#left, \@left), @end); + # build the way between the words last SEARCH; + # and finish the search } } } } my @right_ids = $rightstart..$#right; + # all the same :) the tree is build from both ends to speed up the +process $rightstart = $#right; die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$ri +ghtstart}++ > 2; for my $id (@right_ids) { # build right relational table my @prefix = @{$right[$id]}; my @patterns = wordpattern(pop @prefix); push @prefix, $id; foreach my $word (@words) { next if $right{$word}; if (scalar grep {$word =~ /$_/} @patterns) { push @right, [@prefix, $word]; $right{$word} = $#right; # print join " ", @{$right[-1]}, "\n"; #debugging if ( defined(my $l_id = $left{$word}) ) { my @end = reverse print_rel($#right, \@right); shift @end; @way = (print_rel($l_id, \@left), @end); last SEARCH; } } } } } return @way; } sub wordpattern { my $word = shift; my @patterns; for my $i (0..(length($word)-1)) { substr((my $pat = $word), $i, 1, '.'); push @patterns, qr/^$pat$/; } return @patterns; } sub print_rel { my $id = shift; my $ary = shift; my @line; my @rel = @{$ary->[$id]}; push @line, (pop @rel); foreach my $ref_id (reverse @rel) { unshift @line, $ary->[$ref_id]->[-1]; } return wantarray ? @line : join "\n", @line, ""; } sub printway { my @way = @{+shift}; print join "-", @way; print "\n"; }

UPDATE:
The dicussion with Limbic~Region lead to an improved variant of my script at first time it is called it builds a helper data structure based on the given dictionary, stores it using Storable on the hard disk and loads its to memory at the next calls. So the first execution of the script lasts a very long time (about 5-8 minutes on a usual machine), but the next calls take less than half a second.

Its implementation uses modules Storable and Text::LevenshteinXS while my first variant was completely standalone. But I mean the faster variant is better, so I post it here.

#!/usr/bin/perl #ver 2.00 use warnings; use strict; use Storable; use Text::LevenshteinXS 'distance'; my $dict = '2of12.txt'; die <<HELP unless @ARGV == 2; usage: transform.pl <word1> <word2> The program finds a way from one word to other, like this: % transform.pl love shit love-lose-lost-loot-soot-shot-shit HELP my ($left, $right) = @ARGV[0,1]; for ($left, $right) { $_ = lc; } die "the length of given words is not equal!\n" if length($left) != le +ngth $right; my $db = -e 'dictionary.db' ? retrieve('dictionary.db') : build_db(); my $len = length $left; foreach my $word ($left, $right) { if (!$db->{$len}{$word}) { foreach my $test (keys %{$db->{$len}}) { if (distance($word, $test) == 1) { push @{$db->{$len}{$word}}, $test; push @{$db->{$len}{$test}}, $word; } } } } my $list = $db->{length($left)}; eval { printway([transform($left, $right, $list)]); 1; } or print $@; sub transform { my $left = shift; my $right = shift; my $list = shift; my (@left, %left, @right, %right); # @left and @right- arrays + containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, f +ie] ...) # %left and %right - indic +es containing word offsets in arrays @left and @right $left[0] = [$left]; $right[0] = [$right]; $left{$left} = 0; $right{$right} = 0; my $leftstart = 0; my $rightstart = 0; my @way; my (%leftstarts, %rightstarts); SEARCH: for (;;) { my @left_ids = $leftstart..$#left; + # choose array of indices of new words $leftstart = $#left; die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$left +start}++ >2; # finish search if the way could not be found for my $id (@left_ids) { + # come through all new words my @prefix = @{$left[$id]}; my $searched = pop @prefix; push @prefix, $id; foreach my $word (@{$list->{$searched}}) { next if $left{$word}; + # skip words which are already in the tree push @left, [@prefix, $word]; $left{$word} = $#left; # +add new word to array and index #print join " ", @{$left[-1]}, "\n"; #debugging if ( defined(my $r_id = $right{$word}) ) { # +and check if the word appears in right index. if yes... my @end = reverse(print_rel($r_id, \@right)); shift @end; @way = (print_rel($#left, \@left), @end); # +build the way between the words last SEARCH; # +and finish the search } } } my @right_ids = $rightstart..$#right; + # all the same :) the tree is build from both ends to speed up the +process $rightstart = $#right; die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$ri +ghtstart}++ > 2; for my $id (@right_ids) { # build right relational table my @prefix = @{$right[$id]}; my $searched = pop @prefix; push @prefix, $id; foreach my $word (@{$list->{$searched}}) { next if $right{$word}; push @right, [@prefix, $word]; $right{$word} = $#right; # print join " ", @{$right[-1]}, "\n"; #debugging if ( defined(my $l_id = $left{$word}) ) { my @end = reverse print_rel($#right, \@right); shift @end; @way = (print_rel($l_id, \@left), @end); last SEARCH; } } } } return @way; } sub print_rel { my $id = shift; my $ary = shift; my @line; my @rel = @{$ary->[$id]}; push @line, (pop @rel); foreach my $ref_id (reverse @rel) { unshift @line, $ary->[$ref_id]->[-1]; } return wantarray ? @line : join "\n", @line, ""; } sub printway { my @way = @{+shift}; print join "-", @way; print "\n"; } sub build_db { #thanks to Limbic~Region, http://p +erlmonks.org/index.pl?node_id=180961 open (my $dict, '<', '2of12.txt') or die "Unable to open '2of12.tx +t' for reading: $!"; my ($db, %data); while (<$dict>) { chomp; push @{$data{length()}}, $_; } for my $len (keys %data) { my $end = $#{$data{$len}}; for my $i (0 .. $end - 1) { my $word = $data{$len}[$i]; for my $j ($i + 1 .. $end) { my $test = $data{$len}[$j]; if (distance($word, $test) == 1) { push @{$db->{$len}{$word}}, $test; push @{$db->{$len}{$test}}, $word; } } } } store $db, 'dictionary.db'; return retrieve('dictionary.db'); }
Any further enchancements are welcome :)

Comment on Play and win the word morph game with the help of Perl :)
Select or Download Code
Re: Play and win the word morph game with the help of Perl :)
by Limbic~Region (Chancellor) on Jun 28, 2006 at 20:35 UTC
    Ieronim,
    I first posted untested code because I was about to leave work and DBM::Deep takes forever to build the database. I have subsequently updated it to use a recursive BFS, optimized the build_db() routine, and verified it works correctly. I hope you enjoy though it is much slower than I hoped due to requiring a BFS to find the shortest path.
    #!/usr/bin/perl use strict; use warnings; use DBM::Deep; use Text::LevenshteinXS 'distance'; my $db = DBM::Deep->new('dict.db'); build_db($db) if ! scalar keys %$db; my ($src, $tgt) = @ARGV; # Error handle defined, length, exist in $db and distance() > 1 my $len = length($tgt); my $list = $db->{$len}; my $path = find_path($src, $tgt, $list); print "$path\n"; sub find_path { my ($src, $tgt, $list, $seen, $work) = @_; @$work = map {key => $_ => path => "$src->$_"}, @{$list->{$src}} i +f ! defined $work; my $next = []; for (@$work) { my ($word, $path) = @{$_}{qw/key path/}; next if $seen->{$word}++; return $path if $word eq $tgt; push @$next, map {key => $_, path => "$path->$_"}, @{$list->{$ +word}}; } return find_path($src, $tgt, $list, $seen, $next) if @$next; return 'path not found'; } sub build_db { my $db = shift @_; open (my $dict, '<', '2of12.txt') or die "Unable to open '2of12.tx +t' for reading: $!"; my %data; while (<$dict>) { chomp; push @{$data{length()}}, $_; } for my $len (keys %data) { my $end = $#{$data{$len}}; for my $i (0 .. $end - 1) { my $word = $data{$len}[$i]; for my $j ($i + 1 .. $end) { my $test = $data{$len}[$j]; if (distance($word, $test) == 1) { push @{$db->{$len}{$word}}, $test; push @{$db->{$len}{$test}}, $word; } } } } }
    You also have a bug: try 'foot' and 'fool'

    Cheers - L~R

      I learned some new tricks from your code :)

      But i think that my way is better ;)
      The main advantage of my solution over yours is the possibility of work with given words not included in the dictionary: e.g. the script can find find a path from 'phil' to 'kate'.
      And the second advantage is speed my script works only with data stored in memory, and your uses the hard disk very much. So, as i have measured, my transform function is about 3 times faster than your findpath.

      Bug fixed :)

      Thank you :)

        Ieronim,
        I am glad you learned something from my code. It was intended to be concise while also being straight forward. DBM::Deep is not currently designed to be fast as it is written in pure perl. If you want speed - try the version below which uses Storable. I believe it outperforms yours though I have not Benchmarked it.

        I admit that I have not had a chance to disect your code but I am interested in how you build bridges for words not in the dictionary. If it is only the endpoints, then modifying my code to do the same would be trivial. I am going to spend some time disecting your code and may update the version below if all you are handling is end-points.

        Cheers - L~R

Re: Play and win the word morph game with the help of Perl :)
by jwkrahn (Monsignor) on Jun 28, 2006 at 21:01 UTC
    die <<HELP unless @ARGV == 2; usage: transform.pl <word1> <word2> The program finds a way from one word to other, like this: % transform.pl love shit love-lose-lost-loot-soot-shot-shit HELP my $left = shift(@ARGV) || 'love'; my $right = shift(@ARGV) || 'shit';
    Because you die if there are not two arguments on the command line there is no way that $left and $right will ever be assigned the values 'love' and 'shit'.
    for ($left, $right) { $_ = lc; tr/A-Za-z//cd; }
    You want to keep A-Z characters in a lc()ed string? What about non-ASCII lower case letters?
      Thanks.
      I modified the code according to your recommendations i removed useless default values for $left and $right (theoreticaly the user can enter transform.pl 'test' '', but in this case he will get a VERY unexpected result :)) and removed the tr/// string.
        I never said that you should remove the tr/// operator, I was just wondering if you understood exactly what it was doing, or not doing? For instance, after $_ = lc; there should be no upper case letters in $_ so including A-Z in the list makes little sense.
Re: Play and win the word morph game with the help of Perl :)
by jwkrahn (Monsignor) on Jun 29, 2006 at 09:02 UTC
    Here are a couple of minor optimisations:

    1)  In the transform sub you copy the contents of the @words array unnecessarily. Change:

    50 my @words = @{+shift}; 72 foreach my $word (@words) { 95 foreach my $word (@words) {
    To:
    50 my $words = shift; 72 foreach my $word (@$words) { 95 foreach my $word (@$words) {
    2)  The four argument form of substr is more efficient then assigning to the three argument form of substr. Change:
    118 substr((my $pat = $word), $i, 1) = '.';
    To:
    118 substr my $pat = $word, $i, 1, '.';

    HTH

      (2) accepted, (1) - not accepted.

      (1) makes no measurable differece of speed, as the array is copied only once; i left my variant because it simply looks cleaner for me :)

Re: Play and win the word morph game with the help of Perl :)
by Ieronim (Friar) on Jun 29, 2006 at 15:51 UTC
    I was asked for the explanation of how my algorithm works. I found in Wikipedia that it's called bidirectional search but when I wrote the script i didn't know its name :) The idea is better explained in WP than I can do it here :)

    I run two simultaneous searches: one forward from the source word, and one backward from the target word, and stop when the two trees meet.

      Ieronim,
      This is the very idea (I didn't know it had a name either) I had this morning before coming into work. After converting DBM::Deep over to Storable though, I found all the speed boost I needed and didn't explore it. I know you said in a /msg that you will try to play with a variant using the precompiled datastructure - so will I.

      Cheers - L~R

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://558123]
Approved by Paladin
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (15)
As of 2014-07-29 13:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (217 votes), past polls