Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re^2: A Better Word Morph Builder

by Ieronim (Friar)
on Jun 29, 2006 at 18:51 UTC ( #558413=note: print w/ replies, xml ) Need Help??


in reply to Re: A Better Word Morph Builder
in thread A Better Word Morph Builder

i modified transform() subroutine acording to your recommendation i moved from dereferecing $list hashref to using it as a reference. The speed near doubled :)

New benchmark results:

Rate find_path find_path2 transform find_path 6.52/s -- -72% -96% find_path2 23.7/s 263% -- -87% transform 183/s 2702% 673% --


Comment on Re^2: A Better Word Morph Builder
Download Code
Re^3: A Better Word Morph Builder
by Limbic~Region (Chancellor) on Jun 30, 2006 at 01:12 UTC
    Ieronim,
    I give. While I understand your admittedly fast code - it is extremely hard to follow. The cleanest code I could come up with using a bi-directional BFS is as fast as your transform() before following my recommendation. That is to say, it is about half as fast as your current version:
    use constant SRC => 0; use constant TGT => 1; sub find_path2 { my ($src, $tgt, $list) = @_; my (@src_work, @tgt_work, %path); for my $dir (SRC, TGT) { my ($word, $work) = $dir == SRC ? ($src, \@src_work) : ($tgt, +\@tgt_work); $path{$word}[$dir] = -1; for (@{$list->{$word}}) { push @$work, $_; $path{$_}[$dir] = $word; } } while (1) { for my $dir (SRC, TGT) { my @next; my $work = $dir == SRC ? \@src_work : \@tgt_work; for my $word (@$work) { return build_path(\%path, $word) if $path{$word}[abs($ +dir - 1)]; for (@{$list->{$word}}) { next if $path{$_}[$dir]; push @next, $_; $path{$_}[$dir] = $word; } } @$work = @next; } return 'Path not found' if ! @src_work && ! @tgt_work; } } sub build_path { my ($tree, $node) = @_; my $path = "-$node"; for my $dir (SRC, TGT) { my $word = $tree->{$node}[$dir]; while ($word ne '-1') { $path = $dir == SRC ? "-$word$path" : "$path-$word"; $word = $tree->{$word}[$dir]; } } return substr($path, 1); }
    Update (2006-07-03): Simply moving return build_path(\%path, $word) if $path{$word}[abs($dir - 1)]; to the inner most for loop and s/word/_/ brings the performance much closer. See the update in this node for an explanation why.

    Cheers - L~R

Re^3: A Better Word Morph Builder
by Limbic~Region (Chancellor) on Jun 30, 2006 at 17:32 UTC
    Ieronim,
    I have taken the best parts of all the versions and re-written them in what I believe to be relatively clean code. I have not yet benchmarked this but I expect it to be very close to your current best. Here is a list of features:
    • Command line argument parsing
    • Ability to handle non-dictionary words (configurable)
    • Ability to handle absense of Text::LevenshteinXS (automatic)
    • Ability to use different compiled databases (configurable)
    • Ability to create new compiled databases (configurable)
    • Speed and non-duplicated code

    Cheers - L~R

Re^3: A Better Word Morph Builder
by Limbic~Region (Chancellor) on Jun 30, 2006 at 18:48 UTC
    Ieronim,
    I have finished benchmarking the routines. I didn't include any from this thread or this one but you are welcome to. I did not penalize any routine for processing the dictionary. Additionally, if there were a number of small variations for a routine, I only included the fastest.
    Rate limbic__2 solo____1 limbic__3 limbic__4 ieronim_1 lim +bic__1 limbic__2 6.48/s -- -45% -62% -90% -95% + -96% solo____1 11.8/s 82% -- -31% -82% -92% + -92% limbic__3 17.3/s 166% 46% -- -74% -88% + -89% limbic__4 67.5/s 941% 471% 291% -- -53% + -55% ieronim_1 143/s 2104% 1109% 728% 112% -- + -5% limbic__1 150/s 2218% 1171% 771% 123% 5% + --
    Please note that the winner is just my re-write of Ieronim's code with a few extra bells and whistles. Since the benchmark is extremely long, I have put it in spoiler tags as well as readmore tags:
    Update (2006-07-03): These benchmarks do not reflect the realization I made concerning when to test if a path connection has been made that I noted elsewhere in this thread.

    Cheers - L~R

      The results of the same benchmark (!!) on my machine are here:
      Rate limbic__2 solo____1 limbic__3 limbic__4 limbic__1 ier +onim_1 limbic__2 5.64/s -- -63% -77% -94% -97% + -97% solo____1 15.3/s 172% -- -37% -83% -92% + -92% limbic__3 24.2/s 329% 58% -- -74% -87% + -88% limbic__4 91.7/s 1528% 499% 279% -- -52% + -54% limbic__1 190/s 3267% 1140% 684% 107% -- + -5% ieronim_1 200/s 3449% 1207% 727% 118% 5% + --
      I can't explain the difference :))
        I can't explain the difference :))

        Are the inputs for the benchmarks the same words in both cases?


        --Solo

        --
        You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://558413]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (12)
As of 2015-07-03 11:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (51 votes), past polls