There is a popular word game caled " Word Morph". In this game you need to go from one word to another by modifying one letter in each word to form a new word. for example (head to foot):
HEAD
bead
beat
boat
boot
FOOT
One of my English-speaking friends told me about this game. As my English is quite bad, I could not play this game fairly; but i wrote a perl script to solve the problem :)
This command-line tool finds a shortest way from one word to another using the given dictionary. I used the 2of12 dictonary from the 12Dicts project (http://wordlist.sourceforge.net), but any newline-character-delimited wordlist of any language can be used.
I don't know how 'cool' is this usage, but i mean that it is quite interesting :)
It is of course not ideal, so I am open for any suggestions :)
#!/usr/bin/perl
#ver 1.02
use warnings;
use strict;
my $dict = '2of12.txt';
die <<HELP unless @ARGV == 2;
usage: transform.pl <word1> <word2>
The program finds a way from one word to other, like this:
% transform.pl love shit
love-lose-lost-loot-soot-shot-shit
HELP
my ($left, $right) = @ARGV[0,1];
for ($left, $right) {
$_ = lc;
}
die "the length of given words is not equal!\n" if length($left) != le
+ngth $right;
open DICT, $dict or die "Cannot open dictionary $dict: $!";
my @words;
while (<DICT>) {
chomp;
push @words, $_ if length == length $left;
}
eval {
my @ways = ([transform($left, $right, \@words)], [reverse transfor
+m($right, $left, \@words)]);
if (@{$ways[0]} != @{$ways[1]}) {
printway( @{$ways[0]} > @{$ways[1]} ? $ways[0] : $ways[1] );
}
elsif (grep {$ways[0]->[$_] ne $ways[1]->[$_]} (0..(scalar(@{$ways
+[0]}) - 1) )) {
printway($ways[0]);
printway($ways[1]);
}
else {printway($ways[0])}
1;
} or print $@;
sub transform {
my $left = shift;
my $right = shift;
my @words = @{+shift};
my (@left, %left, @right, %right); # @left and @right- arrays
+ containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, f
+ie] ...)
# %left and %right - indic
+es containing word offsets in arrays @left and @right
$left[0] = [$left];
$right[0] = [$right];
$left{$left} = 0;
$right{$right} = 0;
my $leftstart = 0;
my $rightstart = 0;
my @way;
my (%leftstarts, %rightstarts);
SEARCH:
for (;;) {
my @left_ids = $leftstart..$#left;
+ # choose array of indices of new words
$leftstart = $#left;
die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$left
+start}++ >2; # finish search if the way could not be found
for my $id (@left_ids) {
+ # come through all new words
my @prefix = @{$left[$id]};
my @patterns = wordpattern(pop @prefix);
+ # build patterns to find related words: foo -> (/^.oo$/,/^f.o$/, /^
+fo.$/)
push @prefix, $id;
foreach my $word (@words) {
next if $left{$word};
+ # skip words which are already in the tree
if (scalar grep {$word =~ /$_/} @patterns) {
+ # if matched...
push @left, [@prefix, $word];
$left{$word} = $#left;
+ # add new word to array and index
#print join " ", @{$left[-1]}, "\n"; #debugging
if ( defined(my $r_id = $right{$word}) ) {
+ # and check if the word appears in right index. if yes...
my @end = reverse(print_rel($r_id, \@right));
shift @end;
@way = (print_rel($#left, \@left), @end);
+ # build the way between the words
last SEARCH;
+ # and finish the search
}
}
}
}
my @right_ids = $rightstart..$#right;
+ # all the same :) the tree is build from both ends to speed up the
+process
$rightstart = $#right;
die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$ri
+ghtstart}++ > 2;
for my $id (@right_ids) { # build right relational table
my @prefix = @{$right[$id]};
my @patterns = wordpattern(pop @prefix);
push @prefix, $id;
foreach my $word (@words) {
next if $right{$word};
if (scalar grep {$word =~ /$_/} @patterns) {
push @right, [@prefix, $word];
$right{$word} = $#right;
# print join " ", @{$right[-1]}, "\n"; #debugging
if ( defined(my $l_id = $left{$word}) ) {
my @end = reverse print_rel($#right, \@right);
shift @end;
@way = (print_rel($l_id, \@left), @end);
last SEARCH;
}
}
}
}
}
return @way;
}
sub wordpattern {
my $word = shift;
my @patterns;
for my $i (0..(length($word)-1)) {
substr((my $pat = $word), $i, 1, '.');
push @patterns, qr/^$pat$/;
}
return @patterns;
}
sub print_rel {
my $id = shift;
my $ary = shift;
my @line;
my @rel = @{$ary->[$id]};
push @line, (pop @rel);
foreach my $ref_id (reverse @rel) {
unshift @line, $ary->[$ref_id]->[-1];
}
return wantarray ? @line : join "\n", @line, "";
}
sub printway {
my @way = @{+shift};
print join "-", @way;
print "\n";
}
UPDATE:
The dicussion with Limbic~Region lead to an improved variant of my script — at first time it is called it builds a helper data structure based on the given dictionary, stores it using Storable on the hard disk and loads its to memory at the next calls. So the first execution of the script lasts a very long time (about 5-8 minutes on a usual machine), but the next calls take less than half a second.
Its implementation uses modules Storable and Text::LevenshteinXS while my first variant was completely standalone. But I mean the faster variant is better, so I post it here.
#!/usr/bin/perl
#ver 2.00
use warnings;
use strict;
use Storable;
use Text::LevenshteinXS 'distance';
my $dict = '2of12.txt';
die <<HELP unless @ARGV == 2;
usage: transform.pl <word1> <word2>
The program finds a way from one word to other, like this:
% transform.pl love shit
love-lose-lost-loot-soot-shot-shit
HELP
my ($left, $right) = @ARGV[0,1];
for ($left, $right) {
$_ = lc;
}
die "the length of given words is not equal!\n" if length($left) != le
+ngth $right;
my $db = -e 'dictionary.db' ? retrieve('dictionary.db') : build_db();
my $len = length $left;
foreach my $word ($left, $right) {
if (!$db->{$len}{$word}) {
foreach my $test (keys %{$db->{$len}}) {
if (distance($word, $test) == 1) {
push @{$db->{$len}{$word}}, $test;
push @{$db->{$len}{$test}}, $word;
}
}
}
}
my $list = $db->{length($left)};
eval {
printway([transform($left, $right, $list)]);
1;
} or print $@;
sub transform {
my $left = shift;
my $right = shift;
my $list = shift;
my (@left, %left, @right, %right); # @left and @right- arrays
+ containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, f
+ie] ...)
# %left and %right - indic
+es containing word offsets in arrays @left and @right
$left[0] = [$left];
$right[0] = [$right];
$left{$left} = 0;
$right{$right} = 0;
my $leftstart = 0;
my $rightstart = 0;
my @way;
my (%leftstarts, %rightstarts);
SEARCH:
for (;;) {
my @left_ids = $leftstart..$#left;
+ # choose array of indices of new words
$leftstart = $#left;
die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$left
+start}++ >2; # finish search if the way could not be found
for my $id (@left_ids) {
+ # come through all new words
my @prefix = @{$left[$id]};
my $searched = pop @prefix;
push @prefix, $id;
foreach my $word (@{$list->{$searched}}) {
next if $left{$word};
+ # skip words which are already in the tree
push @left, [@prefix, $word];
$left{$word} = $#left; #
+add new word to array and index
#print join " ", @{$left[-1]}, "\n"; #debugging
if ( defined(my $r_id = $right{$word}) ) { #
+and check if the word appears in right index. if yes...
my @end = reverse(print_rel($r_id, \@right));
shift @end;
@way = (print_rel($#left, \@left), @end); #
+build the way between the words
last SEARCH; #
+and finish the search
}
}
}
my @right_ids = $rightstart..$#right;
+ # all the same :) the tree is build from both ends to speed up the
+process
$rightstart = $#right;
die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$ri
+ghtstart}++ > 2;
for my $id (@right_ids) { # build right relational table
my @prefix = @{$right[$id]};
my $searched = pop @prefix;
push @prefix, $id;
foreach my $word (@{$list->{$searched}}) {
next if $right{$word};
push @right, [@prefix, $word];
$right{$word} = $#right;
# print join " ", @{$right[-1]}, "\n"; #debugging
if ( defined(my $l_id = $left{$word}) ) {
my @end = reverse print_rel($#right, \@right);
shift @end;
@way = (print_rel($l_id, \@left), @end);
last SEARCH;
}
}
}
}
return @way;
}
sub print_rel {
my $id = shift;
my $ary = shift;
my @line;
my @rel = @{$ary->[$id]};
push @line, (pop @rel);
foreach my $ref_id (reverse @rel) {
unshift @line, $ary->[$ref_id]->[-1];
}
return wantarray ? @line : join "\n", @line, "";
}
sub printway {
my @way = @{+shift};
print join "-", @way;
print "\n";
}
sub build_db { #thanks to Limbic~Region, http://p
+erlmonks.org/index.pl?node_id=180961
open (my $dict, '<', '2of12.txt') or die "Unable to open '2of12.tx
+t' for reading: $!";
my ($db, %data);
while (<$dict>) {
chomp;
push @{$data{length()}}, $_;
}
for my $len (keys %data) {
my $end = $#{$data{$len}};
for my $i (0 .. $end - 1) {
my $word = $data{$len}[$i];
for my $j ($i + 1 .. $end) {
my $test = $data{$len}[$j];
if (distance($word, $test) == 1) {
push @{$db->{$len}{$word}}, $test;
push @{$db->{$len}{$test}}, $word;
}
}
}
}
store $db, 'dictionary.db';
return retrieve('dictionary.db');
}
Any further enchancements are welcome :)
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|
|