Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

measuring the similarity between strings

by mayaTheCat (Scribe)
on Sep 05, 2003 at 10:10 UTC ( #289152=snippet: print w/replies, xml ) Need Help??
Description: hi monks, here comes another implementation of the Levenshtein Distance; a distance metric which measures the similarities of strings.

Info and some other implementations can be found at Levenshtein distance: calculating similarity of strings.

At the moment I need to compare two sets of strings where each set consists of approximately 2000 strings, which means approximataley 3-4 million measurements. Thus I wrote this code, in which I tried to minimize the memory usage, to make it more efficient.

sub ldist {
    my @s = split //, shift;
    my @t = split //, shift;

    return scalar @t if scalar @s == 0;
    return scalar @s if scalar @t == 0;

    my (@prevColumn, @currColumn);

    @prevColumn = 0..scalar(@t);

    for my $s (0..$#s) {
        @currColumn = ( $s + 1 );
        for my $t (0..$#t) {
            push @currColumn, min(
                $currColumn[$t] + 1
                , $prevColumn[$t+1] + 1
                , $prevColumn[$t] + ($s[$s] eq $t[$t] ? 0 : 1)
        @prevColumn = @currColumn;

    pop @currColumn

sub min {
    my $min = shift;

    for (@_) {
        $min = $_ if $_ < $min;


Replies are listed 'Best First'.
Re: measuring the similarity between strings
by dree (Monsignor) on Sep 05, 2003 at 12:12 UTC
    To make more efficient you can also try Text::LevenshteinXS that is 1000% faster than plain Text::Levenshtein.
Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: snippet [id://289152]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (9)
As of 2019-07-22 09:34 GMT
Find Nodes?
    Voting Booth?
    If you were the first to set foot on the Moon, what would be your epigram?

    Results (12 votes). Check out past polls.