Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

Re^2: A Better Word Morph Builder

by Ieronim (Friar)
on Jun 29, 2006 at 18:51 UTC ( #558413=note: print w/replies, xml ) Need Help??

in reply to Re: A Better Word Morph Builder
in thread A Better Word Morph Builder

i modified transform() subroutine acording to your recommendation i moved from dereferecing $list hashref to using it as a reference. The speed near doubled :)

New benchmark results:

Rate find_path find_path2 transform find_path 6.52/s -- -72% -96% find_path2 23.7/s 263% -- -87% transform 183/s 2702% 673% --

Replies are listed 'Best First'.
Re^3: A Better Word Morph Builder
by Limbic~Region (Chancellor) on Jun 30, 2006 at 01:12 UTC
    I give. While I understand your admittedly fast code - it is extremely hard to follow. The cleanest code I could come up with using a bi-directional BFS is as fast as your transform() before following my recommendation. That is to say, it is about half as fast as your current version:
    use constant SRC => 0; use constant TGT => 1; sub find_path2 { my ($src, $tgt, $list) = @_; my (@src_work, @tgt_work, %path); for my $dir (SRC, TGT) { my ($word, $work) = $dir == SRC ? ($src, \@src_work) : ($tgt, +\@tgt_work); $path{$word}[$dir] = -1; for (@{$list->{$word}}) { push @$work, $_; $path{$_}[$dir] = $word; } } while (1) { for my $dir (SRC, TGT) { my @next; my $work = $dir == SRC ? \@src_work : \@tgt_work; for my $word (@$work) { return build_path(\%path, $word) if $path{$word}[abs($ +dir - 1)]; for (@{$list->{$word}}) { next if $path{$_}[$dir]; push @next, $_; $path{$_}[$dir] = $word; } } @$work = @next; } return 'Path not found' if ! @src_work && ! @tgt_work; } } sub build_path { my ($tree, $node) = @_; my $path = "-$node"; for my $dir (SRC, TGT) { my $word = $tree->{$node}[$dir]; while ($word ne '-1') { $path = $dir == SRC ? "-$word$path" : "$path-$word"; $word = $tree->{$word}[$dir]; } } return substr($path, 1); }
    Update (2006-07-03): Simply moving return build_path(\%path, $word) if $path{$word}[abs($dir - 1)]; to the inner most for loop and s/word/_/ brings the performance much closer. See the update in this node for an explanation why.

    Cheers - L~R

Re^3: A Better Word Morph Builder
by Limbic~Region (Chancellor) on Jun 30, 2006 at 18:48 UTC
    I have finished benchmarking the routines. I didn't include any from this thread or this one but you are welcome to. I did not penalize any routine for processing the dictionary. Additionally, if there were a number of small variations for a routine, I only included the fastest.
    Rate limbic__2 solo____1 limbic__3 limbic__4 ieronim_1 lim +bic__1 limbic__2 6.48/s -- -45% -62% -90% -95% + -96% solo____1 11.8/s 82% -- -31% -82% -92% + -92% limbic__3 17.3/s 166% 46% -- -74% -88% + -89% limbic__4 67.5/s 941% 471% 291% -- -53% + -55% ieronim_1 143/s 2104% 1109% 728% 112% -- + -5% limbic__1 150/s 2218% 1171% 771% 123% 5% + --
    Please note that the winner is just my re-write of Ieronim's code with a few extra bells and whistles. Since the benchmark is extremely long, I have put it in spoiler tags as well as readmore tags:
    Update (2006-07-03): These benchmarks do not reflect the realization I made concerning when to test if a path connection has been made that I noted elsewhere in this thread.

    Cheers - L~R

      The results of the same benchmark (!!) on my machine are here:
      Rate limbic__2 solo____1 limbic__3 limbic__4 limbic__1 ier +onim_1 limbic__2 5.64/s -- -63% -77% -94% -97% + -97% solo____1 15.3/s 172% -- -37% -83% -92% + -92% limbic__3 24.2/s 329% 58% -- -74% -87% + -88% limbic__4 91.7/s 1528% 499% 279% -- -52% + -54% limbic__1 190/s 3267% 1140% 684% 107% -- + -5% ieronim_1 200/s 3449% 1207% 727% 118% 5% + --
      I can't explain the difference :))
        I can't explain the difference :))

        Are the inputs for the benchmarks the same words in both cases?


        You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
Re^3: A Better Word Morph Builder
by Limbic~Region (Chancellor) on Jun 30, 2006 at 17:32 UTC
    I have taken the best parts of all the versions and re-written them in what I believe to be relatively clean code. I have not yet benchmarked this but I expect it to be very close to your current best. Here is a list of features:
    • Command line argument parsing
    • Ability to handle non-dictionary words (configurable)
    • Ability to handle absense of Text::LevenshteinXS (automatic)
    • Ability to use different compiled databases (configurable)
    • Ability to create new compiled databases (configurable)
    • Speed and non-duplicated code

    Cheers - L~R

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://558413]
[ambrus]: erix: that one actually sucks. these days people should get rid of the old notion that TeX is the only thing you can use for decent mathematics writing, because MS Office and LibreOffice have reached the
[ambrus]: level where people can more easily write as good mathematical papers in them as the people who write bad LaTeX papers usually write.
[ambrus]: Yes, for like the first twenty of its years, TeX was basically the only system that allowed you to write decent maths papers, and C++ and PHP were programming languages that sucked, etc. But times change and people have to accept that.
Discipulus bad people + good tool < normal people + decent tool

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (12)
As of 2017-09-26 11:00 GMT
Find Nodes?
    Voting Booth?
    During the recent solar eclipse, I:

    Results (293 votes). Check out past polls.