One idea I had this morning but did not try out was searching the tree in both directions. Ieronim apparently had the same idea and indicated this is called a bi-directional search. I didn't find the implementation too difficult:
#!/usr/bin/perl
use strict;
use warnings;
use Storable;
my ($src, $tgt) = @ARGV;
die "Usage: $0 <src> <tgt>" if ! defined $src || ! defined $tgt;
die "The <src> and <tgt> must be same length" if length($src) != lengt
+h($tgt);
my $db = retrieve('dictionary.db');
my $path = find_path($src, $tgt, $db->{length($tgt)});
print "$path\n";
sub find_path {
my ($src, $tgt, $list, $search) = @_;
for my $pos (qw/src tgt/) {
my $dir = $pos eq 'src' ? $src : $tgt;
my $opp = $pos eq 'src' ? 'tgt' : 'src';
if (! defined $search->{$pos}{work}) {
for (@{$list->{$dir}}) {
push @{$search->{$pos}{work}}, {key => $_, path => "$d
+ir-$_"};
$search->{$pos}{term}{$_} = $search->{$pos}{work}[-1];
}
$search->{$pos}{term}{$dir} = {key => $dir, path => $dir};
}
my ($work, $next) = ($search->{$pos}{work}, []);
while (@$work) {
my $node = shift @$work;
my ($word, $path) = @{$node}{qw/key path/};
next if $search->{$pos}{seen}{$word}++;
if ($search->{$opp}{term}{$word}) {
my @cur_path = split /-/, $path;
my @con_path = split /-/, $search->{$opp}{term}{$word}
+{path};
return $pos eq 'tgt'
? join '-', @con_path, @cur_path[reverse 0 .. $#cu
+r_path - 1]
: join '-', @cur_path, @con_path[reverse 0 .. $#co
+n_path - 1];
}
for (@{$list->{$word}}) {
push @$next, {key => $_, path => "$path-$_"};
$search->{$pos}{term}{$_} = $next->[-1];
}
}
$search->{$pos}{work} = $next;
}
return 'path not found' if ! @{$search->{src}{work}} || ! @{$searc
+h->{tgt}{work}};
return find_path($src, $tgt, $list, $search);
}
I have
Benchmarked the results to determine if it is indeed faster. The results are as follows:
BFS_1way 5.93/s -- -64% -93%
BFS_2way 16.5/s 179% -- -79%
Ieronim_2way 79.2/s 1235% 379% --
Switching to bi-directional more than doubled the speed but didn't catch
Ieronim once he started using the precompiled datastructure. I guess I need to figure out find out how transform() differs from my BFS.
Update (2006-07-03): After re-writing Ieronim's code I finally discovered what the difference was. In a nutshell, I pull at item off the work queue and test to see if it connects a path. If it does not, I add every item it does connect with to the work queue. Switching where I test for a connection to before they are added to the queue instead of after they are taken off solves the mystery.
This is not reflected in any of the benchmarks because I believe this dead horse sufficiently beaten.