Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: A Better Word Morph Builder

by Ieronim (Friar)
on Jun 29, 2006 at 16:46 UTC ( #558375=note: print w/ replies, xml ) Need Help??


in reply to A Better Word Morph Builder

I rewrote my script using the pre-caching of word relationships; saying simply, i rewrote it using your 'dictionary.db' datastructure. I'll post the updated variant to my node Play and win the word morph game with the help of Perl :) Here is the benchmarking script comparing speed of your find_path and my modified transform functions:

#!/usr/bin/perl #ver 2.00 use warnings; use strict; use Storable; use Text::LevenshteinXS 'distance'; use Benchmark qw':all'; my $dict = '2of12.txt'; die <<HELP unless @ARGV == 2; usage: transform.pl <word1> <word2> The program finds a way from one word to other, like this: % transform.pl love shit love-lose-lost-loot-soot-shot-shit HELP our ($left, $right) = @ARGV[0,1]; for ($left, $right) { $_ = lc; } die "the length of given words is not equal!\n" if length($left) != le +ngth $right; my $db = -e 'dictionary.db' ? retrieve('dictionary.db') : build_db(); our $list = $db->{length($left)}; eval { printway([transform($left, $right, $list)]); print find_path($left, $right, $list); print "\n\n"; 1; } or print $@; cmpthese( 100, { 'find_path' => 'my $path = find_path($left, $right, $list)', 'transform' => 'my $path = transform($left, $right, $list)', } ); sub find_path { my ($src, $tgt, $list, $seen, $work) = @_; @$work = map {key => $_ => path => "$src->$_"}, @{$list->{$src}} i +f ! defined $work; my $next = []; for (@$work) { my ($word, $path) = @{$_}{qw/key path/}; next if $seen->{$word}++; return $path if $word eq $tgt; push @$next, map {key => $_, path => "$path->$_"}, @{$list->{$ +word}}; } return find_path($src, $tgt, $list, $seen, $next) if @$next; return 'path not found'; } sub transform { my $left = shift; my $right = shift; my %list = %{+shift}; my (@left, %left, @right, %right); # @left and @right- arrays + containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, f +ie] ...) # %left and %right - indic +es containing word offsets in arrays @left and @right $left[0] = [$left]; $right[0] = [$right]; $left{$left} = 0; $right{$right} = 0; my $leftstart = 0; my $rightstart = 0; my @way; my (%leftstarts, %rightstarts); SEARCH: for (;;) { my @left_ids = $leftstart..$#left; + # choose array of indices of new words $leftstart = $#left; die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$left +start}++ >2; # finish search if the way could not be found for my $id (@left_ids) { + # come through all new words my @prefix = @{$left[$id]}; my $searched = pop @prefix; push @prefix, $id; foreach my $word (@{$list{$searched}}) { next if $left{$word}; + # skip words which are already in the tree push @left, [@prefix, $word]; $left{$word} = $#left; # +add new word to array and index #print join " ", @{$left[-1]}, "\n"; #debugging if ( defined(my $r_id = $right{$word}) ) { # +and check if the word appears in right index. if yes... my @end = reverse(print_rel($r_id, \@right)); shift @end; @way = (print_rel($#left, \@left), @end); # +build the way between the words last SEARCH; # +and finish the search } } } my @right_ids = $rightstart..$#right; + # all the same :) the tree is build from both ends to speed up the +process $rightstart = $#right; die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$ri +ghtstart}++ > 2; for my $id (@right_ids) { # build right relational table my @prefix = @{$right[$id]}; my $searched = pop @prefix; push @prefix, $id; foreach my $word (@{$list{$searched}}) { next if $right{$word}; push @right, [@prefix, $word]; $right{$word} = $#right; # print join " ", @{$right[-1]}, "\n"; #debugging if ( defined(my $l_id = $left{$word}) ) { my @end = reverse print_rel($#right, \@right); shift @end; @way = (print_rel($l_id, \@left), @end); last SEARCH; } } } } return @way; } sub print_rel { my $id = shift; my $ary = shift; my @line; my @rel = @{$ary->[$id]}; push @line, (pop @rel); foreach my $ref_id (reverse @rel) { unshift @line, $ary->[$ref_id]->[-1]; } return wantarray ? @line : join "\n", @line, ""; } sub printway { my @way = @{+shift}; print join "-", @way; print "\n"; } sub build_db { open (my $dict, '<', '2of12.txt') or die "Unable to open '2of12.tx +t' for reading: $!"; my ($db, %data); while (<$dict>) { chomp; push @{$data{length()}}, $_; } for my $len (keys %data) { my $end = $#{$data{$len}}; for my $i (0 .. $end - 1) { my $word = $data{$len}[$i]; for my $j ($i + 1 .. $end) { my $test = $data{$len}[$j]; if (distance($word, $test) == 1) { push @{$db->{$len}{$word}}, $test; push @{$db->{$len}{$test}}, $word; } } } } store $db, 'dictionary.db'; return retrieve('dictionary.db'); }
And here is the result of it's execution:
love-lose-lost-loot-soot-shot-shit love->lore->sore->sort->soot->shot->shit Rate find_path transform find_path 6.76/s -- -94% transform 108/s 1505% --
This result illustrates that bidirectional search is in general much faster than breadth-first search. But your function looks much prettier and takes only 12 lines of code :)


Comment on Re: A Better Word Morph Builder
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://558375]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (8)
As of 2014-08-30 12:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (293 votes), past polls