Re: A Better Word Morph Builder

One idea I had this morning but did not try out was searching the tree in both directions. Ieronim apparently had the same idea and indicated this is called a bi-directional search. I didn't find the implementation too difficult:

#!/usr/bin/perl
use strict;
use warnings;
use Storable;

my ($src, $tgt) = @ARGV;
die "Usage: $0 <src> <tgt>" if ! defined $src || ! defined $tgt;
die "The <src> and <tgt> must be same length" if length($src) != lengt
+h($tgt);

my $db = retrieve('dictionary.db');
my $path = find_path($src, $tgt, $db->{length($tgt)});
print "$path\n";

sub find_path {
    my ($src, $tgt, $list, $search) = @_;

    for my $pos (qw/src tgt/) {
        my $dir = $pos eq 'src' ? $src  : $tgt;
        my $opp = $pos eq 'src' ? 'tgt' : 'src';

        if (! defined $search->{$pos}{work}) {
            for (@{$list->{$dir}}) {
                push @{$search->{$pos}{work}}, {key => $_, path => "$d
+ir-$_"};
                $search->{$pos}{term}{$_} = $search->{$pos}{work}[-1];
            }
            $search->{$pos}{term}{$dir} = {key => $dir, path => $dir};
        }

        my ($work, $next) = ($search->{$pos}{work}, []);
       
        while (@$work) {
            my $node = shift @$work;
            my ($word, $path) = @{$node}{qw/key path/};
            next if $search->{$pos}{seen}{$word}++;
            if ($search->{$opp}{term}{$word}) {
                my @cur_path = split /-/, $path;
                my @con_path = split /-/, $search->{$opp}{term}{$word}
+{path};
                return $pos eq 'tgt'
                    ? join '-', @con_path, @cur_path[reverse 0 .. $#cu
+r_path - 1]
                    : join '-', @cur_path, @con_path[reverse 0 .. $#co
+n_path - 1];
            }
            for (@{$list->{$word}}) {
                push @$next, {key => $_, path => "$path-$_"};
                $search->{$pos}{term}{$_} = $next->[-1];
            }            
        }
        $search->{$pos}{work} = $next;
    }
    return 'path not found' if ! @{$search->{src}{work}} || ! @{$searc
+h->{tgt}{work}};
    return find_path($src, $tgt, $list, $search);
}
[download]

I have Benchmarked the results to determine if it is indeed faster. The results are as follows:

BFS_1way      5.93/s         --       -64%       -93%
BFS_2way      16.5/s       179%         --       -79%
Ieronim_2way  79.2/s      1235%       379%         --
[download]

Switching to bi-directional more than doubled the speed but didn't catch Ieronim once he started using the precompiled datastructure. I guess I need to figure out find out how transform() differs from my BFS.

Update (2006-07-03): After re-writing Ieronim's code I finally discovered what the difference was. In a nutshell, I pull at item off the work queue and test to see if it connects a path. If it does not, I add every item it does connect with to the work queue. Switching where I test for a connection to before they are added to the queue instead of after they are taken off solves the mystery.
This is not reflected in any of the benchmarks because I believe this dead horse sufficiently beaten.

Cheers - L~R

Comment on Re: A Better Word Morph Builder Select or Download Code

Replies are listed 'Best First'.
Re^2: A Better Word Morph Builder by Ieronim (Friar) on Jun 29, 2006 at 18:51 UTC
i modified transform() subroutine acording to your recommendation — i moved from dereferecing $list hashref to using it as a reference. The speed near doubled :) New benchmark results: `Rate find_path find_path2 transform find_path 6.52/s -- -72% -96% find_path2 23.7/s 263% -- -87% transform 183/s 2702% 673% --` [download]	[reply] [d/l]
Re^3: A Better Word Morph Builder by Limbic~Region (Chancellor) on Jun 30, 2006 at 01:12 UTC
Ieronim, I give. While I understand your admittedly fast code - it is extremely hard to follow. The cleanest code I could come up with using a bi-directional BFS is as fast as your transform() before following my recommendation. That is to say, it is about half as fast as your current version: use constant SRC => 0; use constant TGT => 1; sub find_path2 { my ($src, $tgt, $list) = @_; my (@src_work, @tgt_work, %path); for my $dir (SRC, TGT) { my ($word, $work) = $dir == SRC ? ($src, \@src_work) : ($tgt, +\@tgt_work); $path{$word}[$dir] = -1; for (@{$list->{$word}}) { push @$work, $_; $path{$_}[$dir] = $word; } } while (1) { for my $dir (SRC, TGT) { my @next; my $work = $dir == SRC ? \@src_work : \@tgt_work; for my $word (@$work) { return build_path(\%path, $word) if $path{$word}[abs($ +dir - 1)]; for (@{$list->{$word}}) { next if $path{$_}[$dir]; push @next, $_; $path{$_}[$dir] = $word; } } @$work = @next; } return 'Path not found' if ! @src_work && ! @tgt_work; } } sub build_path { my ($tree, $node) = @_; my $path = "-$node"; for my $dir (SRC, TGT) { my $word = $tree->{$node}[$dir]; while ($word ne '-1') { $path = $dir == SRC ? "-$word$path" : "$path-$word"; $word = $tree->{$word}[$dir]; } } return substr($path, 1); } [download] Update (2006-07-03): Simply moving `return build_path(\%path, $word) if $path{$word}[abs($dir - 1)];` to the inner most for loop and s/word/_/ brings the performance much closer. See the update in this node for an explanation why. Cheers - L~R	[reply] [d/l] [select]
Re^3: A Better Word Morph Builder by Limbic~Region (Chancellor) on Jun 30, 2006 at 18:48 UTC
Ieronim, I have finished benchmarking the routines. I didn't include any from this thread or this one but you are welcome to. I did not penalize any routine for processing the dictionary. Additionally, if there were a number of small variations for a routine, I only included the fastest. `Rate limbic__2 solo____1 limbic__3 limbic__4 ieronim_1 lim +bic__1 limbic__2 6.48/s -- -45% -62% -90% -95% + -96% solo____1 11.8/s 82% -- -31% -82% -92% + -92% limbic__3 17.3/s 166% 46% -- -74% -88% + -89% limbic__4 67.5/s 941% 471% 291% -- -53% + -55% ieronim_1 143/s 2104% 1109% 728% 112% -- + -5% limbic__1 150/s 2218% 1171% 771% 123% 5% + --` [download] limbic__1..4 from nodes 1, 2, 3, 4 respectively ieronim_1 from this node (updated) solo____1 from this node Please note that the winner is just my re-write of Ieronim's code with a few extra bells and whistles. Since the benchmark is extremely long, I have put it in spoiler tags as well as readmore tags: Read more... (17 kB) Update (2006-07-03): These benchmarks do not reflect the realization I made concerning when to test if a path connection has been made that I noted elsewhere in this thread. Cheers - L~R	[reply] [d/l] [select]
Re^4: A Better Word Morph Builder by Ieronim (Friar) on Jun 30, 2006 at 19:10 UTC
The results of the same benchmark (!!) on my machine are here: `Rate limbic__2 solo____1 limbic__3 limbic__4 limbic__1 ier +onim_1 limbic__2 5.64/s -- -63% -77% -94% -97% + -97% solo____1 15.3/s 172% -- -37% -83% -92% + -92% limbic__3 24.2/s 329% 58% -- -74% -87% + -88% limbic__4 91.7/s 1528% 499% 279% -- -52% + -54% limbic__1 190/s 3267% 1140% 684% 107% -- + -5% ieronim_1 200/s 3449% 1207% 727% 118% 5% + --` [download] I can't explain the difference :))	[reply] [d/l]
Re^5: A Better Word Morph Builder by Solo (Deacon) on Jun 30, 2006 at 22:06 UTC
Re^3: A Better Word Morph Builder by Limbic~Region (Chancellor) on Jun 30, 2006 at 17:32 UTC
Ieronim, I have taken the best parts of all the versions and re-written them in what I believe to be relatively clean code. I have not yet benchmarked this but I expect it to be very close to your current best. Here is a list of features: Command line argument parsing Ability to handle non-dictionary words (configurable) Ability to handle absense of Text::LevenshteinXS (automatic) Ability to use different compiled databases (configurable) Ability to create new compiled databases (configurable) Speed and non-duplicated code Read more... (6 kB) Cheers - L~R	[reply] [d/l]


laziness, impatience, and hubris
	PerlMonks