PerlMonks  

I wanted to cut cycles in one of my programs that needs approximate string matching, and the Text::LevenshteinXS module wouldn't install (no 5.8 on the server), so I wrote my own implementation of the Levenshtein algorithm. Benchmarking against Text::Levenshtein showed a 15-35% speed gain. I was thinking of uploading this to CPAN, since I've seen many Something/FastSomething modules up there. The biggest difference in the algorithm itself (not the short-circuiting) is the min function: why people want to loop over an array when they KNOW there are only going to be 3 parameters really confounds me.
Do you think this fast module is worthy of CPAN?
package Text::FastLevenshtein;

use strict;
use Exporter;
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);

$VERSION     = '0.02';
@ISA         = qw(Exporter);
@EXPORT      = ();
@EXPORT_OK   = qw(&distance);
%EXPORT_TAGS = ();

# Minimum of exactly three numbers; no loop needed for a fixed argument count.
sub _min {
    my $min = $_[0];
    $min = $_[1] if $_[1] < $min;
    $min = $_[2] if $_[2] < $min;
    return $min;
}

# Levenshtein edit distance between two strings.
sub distance($$) {
    my $word1 = shift;
    my $word2 = shift;

    # Short circuits: identical strings, or one of the strings is empty.
    return 0 if $word1 eq $word2;
    my $len1 = length $word1;
    my $len2 = length $word2;
    return $len2 unless $len1;
    return $len1 unless $len2;

    # $d[$i][$j] holds the distance between the first $i characters
    # of $word1 and the first $j characters of $word2.
    my @d;
    $d[$_][0] = $_ for 0 .. $len1;
    $d[0][$_] = $_ for 0 .. $len2;

    for my $i (1 .. $len1) {
        my $w1 = substr($word1, $i-1, 1);
        for (1 .. $len2) {
            $d[$i][$_] = _min($d[$i-1][$_]+1,
                              $d[$i][$_-1]+1,
                              $d[$i-1][$_-1]+($w1 eq substr($word2,$_-1,1) ? 0 : 1));
        }
    }

    return $d[$len1][$len2];
}

1;

__END__
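
The claim about the min function can be checked on its own with a small sketch like the one below. It compares a fixed three-argument minimum (the same logic as `_min` in the module) against a general looping minimum using the core Benchmark module. The names `min3` and `min_loop` are made up for this illustration, and `min_loop` is only a stand-in for the looping style, not code taken from Text::Levenshtein. Timings depend on the Perl version and the machine, so no result is claimed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Fixed-arity minimum: assumes exactly three numeric arguments.
sub min3 {
    my $min = $_[0];
    $min = $_[1] if $_[1] < $min;
    $min = $_[2] if $_[2] < $min;
    return $min;
}

# General minimum that loops over however many arguments it is given.
# (Hypothetical stand-in for the looping style; not copied from any module.)
sub min_loop {
    my $min = shift;
    for my $n (@_) {
        $min = $n if $n < $min;
    }
    return $min;
}

# Both variants must agree before the timing means anything.
for my $args ([1, 2, 3], [3, 2, 1], [2, 1, 3], [5, 5, 5]) {
    die "mismatch for @$args" unless min3(@$args) == min_loop(@$args);
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    min3     => sub { min3(4, 2, 7) },
    min_loop => sub { min_loop(4, 2, 7) },
});
```

Whatever gain this shows for the min call alone says little about the module as a whole, since the table fill also does array indexing and `substr` calls; the whole-function benchmark is the number that matters.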

In reply to improved levenshtein by bageler
