Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight

Re: A Better Word Morph Builder

by Solo (Deacon)
on Jun 29, 2006 at 19:44 UTC ( #558426=note: print w/replies, xml ) Need Help??

in reply to A Better Word Morph Builder

Well, it certainly takes a while to build the db the first time... (over 15 mins so far... maybe LevenshteinXS isn't quite right on Win?) Here's my offing that was written for clarity, rather than efficiency. Then again, maybe I'll say it's optimized for startup time, rather than solution time... who likes those long 'loading...' screens, anyway?

Update:I should have realized that I'm using the yawl-0.3.2 word list, and the time to build the db is probably affected by that!

use strict; use warnings; use List::Compare; use Benchmark qw( cmpthese ); my $lword = shift || 'love'; my $rword = shift || 'hate'; die unless length($rword) == length($lword); my $length = length($lword); my $wordlist; open(FH,'word.list') or die; while(<FH>) { chomp; undef($wordlist->{lc $_}) if length($_) == $length; } print solo($lword,$rword); #cmpthese( ... ); sub solo { my ( $rdepth,$ldepth,$rex,$lex ) = (0,0,1,1); my $left = { $_[0] => [$_[0]] }; my $right = { $_[1] => [$_[1]] }; while ( 1 ) { # compare the intersection of the leaf nodes my $lc = List::Compare->new( { lists => [ [keys %$left], [keys %$right] ], accelerated => 1, unsorted => 1, } ); my @int = $lc->get_intersection(); if ( @int ) { my $solve = shift @int; pop @{$right->{$solve}}; return join("\n", @{$left->{$solve}}, reverse @{$right->{$ +solve}}) . "\n"; } # pick a side to expand the search in if ( $lex && $rdepth > $ldepth ) { $lex = expandTree($left,++$ldepth); } elsif ( $rex ) { $rex = expandTree($right,++$rdepth); } else { return "No solution."; } } } sub expandTree { my ($tree, $depth) = @_; my ($word, $path, $expanded); while ( ($word, $path) = each %$tree ) { if ( @$path == $depth ) { for my $i (0..(length($word)-1) ) { for my $letter ( 'a'..'z' ) { my $try = $word; substr($try,$i,1) = $letter; next if exists $tree->{$try}; if ( exists $wordlist->{$try} ) { $tree->{$try} = [ @{$tree->{$word}}, $try ]; $expanded = 1; } } } } } return $expanded; }


You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.

Replies are listed 'Best First'.
Re^2: A Better Word Morph Builder
by Ieronim (Friar) on Jun 29, 2006 at 20:32 UTC
    I have benchmarked your method. If i am not mistaken, is's another example of bidirectional search, isn't it?

    Benchmark results:

    Rate find_path solo find_path2 transform find_path 5.91/s -- -62% -75% -97% solo 15.7/s 166% -- -35% -92% find_path2 24.1/s 308% 53% -- -87% transform 188/s 3084% 1097% 681% --
    Note: Your solution seems to be suitable only for basic English charset:
    for my $letter ( 'a'..'z' ) {
    And what about the Unicode? ;)
      Thanks for the benchmark! You're right, it is bi-directional. And you caught me, 'a'..'z' was a precomputed optimization based on the set of characters found in my wordlist. ;) Does 2of12 include additional characters?


      You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
        2of12 includes only [a-z'0-9-]. But as i am Russian, i wanted the algorithm to work with any language ;) And it does, but takes a HUGE amount of time to process my 1Mb russian wordlist :(
Re^2: A Better Word Morph Builder
by Limbic~Region (Chancellor) on Jun 30, 2006 at 18:59 UTC
    Thanks! I also benchmarked your solution. You can see how you did here.

    Cheers - L~R

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://558426]
[1nickt]: Yaerox probably best to whip up an SSCCE demonstrating the failure in what you have now and post to SoPW as a question.
[Corion]: Yaerox: That's a somewhat hard problem. Encode solves the conversion part, but for guessing what encoding a file is in, that's the hard part
[Corion]: Yaerox: There is Encode::Guess, but that needs a limited set of inputs, and it also cannot handle multiple single-byte encodings
[Corion]: If you have a BOM, that's a really easy way to recognize UTF-8. Otherwise, you can try to decode a file from UTF-8, and if that works OK and doesn't crash, most likely the file was valid UTF-8
[Corion]: But as "ansi" (Latin-1?) is a single-byte encoding, any file is a valid ANSI file

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (10)
As of 2017-03-28 13:13 GMT
Find Nodes?
    Voting Booth?
    Should Pluto Get Its Planethood Back?

    Results (332 votes). Check out past polls.