note
Ieronim
I rewrote my script using the pre-caching of word relationships; saying simply, i rewrote it using your 'dictionary.db' datastructure. I'll post the updated variant to my node [id://558123]
Here is the benchmarking script comparing speed of your <code>find_path</code> and my modified <code>transform</code> functions:
<readmore>
<code>
#!/usr/bin/perl
#ver 2.00
use warnings;
use strict;
use Storable;
use Text::LevenshteinXS 'distance';
use Benchmark qw':all';
my $dict = '2of12.txt';
die <<HELP unless @ARGV == 2;
usage: transform.pl <word1> <word2>
The program finds a way from one word to other, like this:
% transform.pl love shit
love-lose-lost-loot-soot-shot-shit
HELP
our ($left, $right) = @ARGV[0,1];
for ($left, $right) {
$_ = lc;
}
die "the length of given words is not equal!\n" if length($left) != length $right;
my $db = -e 'dictionary.db' ? retrieve('dictionary.db') : build_db();
our $list = $db->{length($left)};
eval {
printway([transform($left, $right, $list)]);
print find_path($left, $right, $list);
print "\n\n";
1;
} or print $@;
cmpthese( 100, {
'find_path' => 'my $path = find_path($left, $right, $list)',
'transform' => 'my $path = transform($left, $right, $list)',
} );
sub find_path {
my ($src, $tgt, $list, $seen, $work) = @_;
@$work = map {key => $_ => path => "$src->$_"}, @{$list->{$src}} if ! defined $work;
my $next = [];
for (@$work) {
my ($word, $path) = @{$_}{qw/key path/};
next if $seen->{$word}++;
return $path if $word eq $tgt;
push @$next, map {key => $_, path => "$path->$_"}, @{$list->{$word}};
}
return find_path($src, $tgt, $list, $seen, $next) if @$next;
return 'path not found';
}
sub transform {
my $left = shift;
my $right = shift;
my %list = %{+shift};
my (@left, %left, @right, %right); # @left and @right- arrays containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, fie] ...)
# %left and %right - indices containing word offsets in arrays @left and @right
$left[0] = [$left];
$right[0] = [$right];
$left{$left} = 0;
$right{$right} = 0;
my $leftstart = 0;
my $rightstart = 0;
my @way;
my (%leftstarts, %rightstarts);
SEARCH:
for (;;) {
my @left_ids = $leftstart..$#left; # choose array of indices of new words
$leftstart = $#left;
die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$leftstart}++ >2; # finish search if the way could not be found
for my $id (@left_ids) { # come through all new words
my @prefix = @{$left[$id]};
my $searched = pop @prefix;
push @prefix, $id;
foreach my $word (@{$list{$searched}}) {
next if $left{$word}; # skip words which are already in the tree
push @left, [@prefix, $word];
$left{$word} = $#left; # add new word to array and index
#print join " ", @{$left[-1]}, "\n"; #debugging
if ( defined(my $r_id = $right{$word}) ) { # and check if the word appears in right index. if yes...
my @end = reverse(print_rel($r_id, \@right));
shift @end;
@way = (print_rel($#left, \@left), @end); # build the way between the words
last SEARCH; # and finish the search
}
}
}
my @right_ids = $rightstart..$#right; # all the same :) the tree is build from both ends to speed up the process
$rightstart = $#right;
die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$rightstart}++ > 2;
for my $id (@right_ids) { # build right relational table
my @prefix = @{$right[$id]};
my $searched = pop @prefix;
push @prefix, $id;
foreach my $word (@{$list{$searched}}) {
next if $right{$word};
push @right, [@prefix, $word];
$right{$word} = $#right;
# print join " ", @{$right[-1]}, "\n"; #debugging
if ( defined(my $l_id = $left{$word}) ) {
my @end = reverse print_rel($#right, \@right);
shift @end;
@way = (print_rel($l_id, \@left), @end);
last SEARCH;
}
}
}
}
return @way;
}
sub print_rel {
my $id = shift;
my $ary = shift;
my @line;
my @rel = @{$ary->[$id]};
push @line, (pop @rel);
foreach my $ref_id (reverse @rel) {
unshift @line, $ary->[$ref_id]->[-1];
}
return wantarray ? @line : join "\n", @line, "";
}
sub printway {
my @way = @{+shift};
print join "-", @way;
print "\n";
}
sub build_db {
open (my $dict, '<', '2of12.txt') or die "Unable to open '2of12.txt' for reading: $!";
my ($db, %data);
while (<$dict>) {
chomp;
push @{$data{length()}}, $_;
}
for my $len (keys %data) {
my $end = $#{$data{$len}};
for my $i (0 .. $end - 1) {
my $word = $data{$len}[$i];
for my $j ($i + 1 .. $end) {
my $test = $data{$len}[$j];
if (distance($word, $test) == 1) {
push @{$db->{$len}{$word}}, $test;
push @{$db->{$len}{$test}}, $word;
}
}
}
}
store $db, 'dictionary.db';
return retrieve('dictionary.db');
}
</code>
</readmore>
And here is the result of it's execution:
<code>
love-lose-lost-loot-soot-shot-shit
love->lore->sore->sort->soot->shot->shit
Rate find_path transform
find_path 6.76/s -- -94%
transform 108/s 1505% --
</code>
This result illustrates that [wp://bidirectional search] is in general much faster than [wp://breadth-first search]. But your function looks much prettier and takes only 12 lines of code :)
558342
558342