note
Albannach
I make a lot of use of [cpan://Text::Levenshtein] which has an XS version that might be competitive, so here is a quick test. Fair warning: I'm far from a Benchmark guru (comments welcome), and the test set you provided might have a different distribution of types than you actually encounter, so if a particular method takes longer for a particular result, there will be some bias.
<code>
Rate rg0now ikegami levenshtein blazar BrowserUk
rg0now 8801/s -- -57% -82% -87% -87%
ikegami 20414/s 132% -- -57% -69% -70%
levenshtein 47786/s 443% 134% -- -28% -30%
blazar 66258/s 653% 225% 39% -- -4%
BrowserUk 68683/s 680% 236% 44% 4% --
</code>
The code follows. <readmore><code>
use strict;
use warnings;
use Benchmark qw(:all);
use Text::LevenshteinXS qw(distance);
sub compare_rg0now {
return 0 unless length $_[0] == length $_[1];
return 1 if $_[0] eq $_[1];
my $diff = 0;
my @l1 = split //, $_[0];
my @l2 = split //, $_[1];
for(my $i = 0; $i < scalar @l1; $i++){
$diff++ if $l1[$i] ne $l2[$i];
return 0 if $diff > 1;
}
return 1;
}
sub compare_blazar {
(my $xor=$_[0]^$_[1]) =~ tr/\0//d;
length $xor < 2;
}
sub compare_BrowserUk {
return ( length $_[0] == length $_[1] )
&& ( length( $_[0] ) - 1 <= ( $_[0] ^ $_[1] ) =~ tr[\0][\0] )
}
sub compare_ikegami {
return unless length $_[0] == length $_[1];
# Omit if expecting differences almost all the time.
return 1 if $_[0] eq $_[1];
my $limit = 1;
foreach (split('', $_[0]^$_[1])) {
next unless ord;
return 0 unless $limit--;
}
return 1;
}
sub compare_levenshtein {
return ( length $_[0] == length $_[1] ) &&
( distance($_[0], $_[1]) < 2);
}
my @data = (
[ qw( abab abab ) ],
[ qw( abab abaa ) ],
[ qw( abab qrst ) ],
[ qw( abab abba ) ],
[ qw( abab ababa ) ],
);
cmpthese(-3, {
'rg0now' => sub {for(@data) {compare_rg0now(@$_)} },
'blazar' => sub {for(@data) {compare_blazar(@$_)} },
'BrowserUk' => sub {for(@data) {compare_BrowserUk(@$_)} },
'ikegami' => sub {for(@data) {compare_ikegami(@$_)} },
'levenshtein' => sub {for(@data) {compare_levenshtein(@$_)} },
});
</code></readmore>
<!-- Node text goes above. Div tags should contain sig only -->
<div class="pmsig"><div class="pmsig-33341">
<p>--<br>
I'd like to be able to assign to an luser
</div></div>
480930
480930