Re: finding longest common substring

Building on japhy's regex, this seems to work and it's fairly simple, though I haven't tested its efficiency. It returns undef if there is no common substring.

Caveat: The strings mustn't contain nulls, or newlines either, since `.` doesn't match a newline without /s.

#! perl -slw
use strict;

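sub lcs{
    # Join all the strings with a null separator (hence the caveat above).
    my $strings = join "\0", @_;
    my $lcs;
    for my $n ( 1 .. length $strings ) {
        # Capture $n chars, then require the same text after each separator.
        my $re = "(.{$n})" . '.*\0.*\1' x ( @_ - 1 );
        # A common substring of length $n implies one of length $n - 1,
        # so the first failure ends the search.
        last unless $strings =~ $re;
        $lcs = $1;
    }
    return $lcs;
}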

my @a = <DATA>;
chomp @a;

print "lcs: ", lcs( @a );

__DATA__
The quick brown fox jump over the lazy dog
The quick brown fox jumps over the lazy
jumps over the lazy dog The quick brown fox
quick brown fox jumps over the lazy dog

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Re: Re: finding longest common substring by sauoq (Abbot) on Nov 20, 2003 at 00:44 UTC
While significantly faster than the OP's, I believe it suffers from some of the same scalability issues. That's one thing I hate about regular expressions: determining the complexity of an algorithm built on them tends to be very difficult.

Anyway, that also suffers from another problem. Try it with

`my @a = ('a' . 'X' x 32768, 'a' . 'O' x 32768);`

and you'll get an error:

`Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/ +(.{ <-- HERE 32767}).\0.\1/`

-sauoq "My two cents aren't worth a dime.";
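One possible way around the quantifier limit, offered only as a sketch: split the single `.{$n}` into several smaller quantified pieces so that none exceeds the limit. The helper name `dot_run`, its `$max` parameter, and the wrapper `lcs_chunked` are all made up for illustration; the default of 32766 is taken from the error message above and may differ on other Perl builds. This does nothing for the scalability concern, it only avoids the compile error.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern fragment that matches exactly $n
# characters using pieces of at most $max, so no single quantifier
# exceeds the limit reported in the error message.
sub dot_run {
    my ( $n, $max ) = @_;
    $max ||= 32766;                 # assumed limit, from the error text
    my $full = int( $n / $max );    # number of full-size pieces
    my $rest = $n % $max;           # leftover characters, if any
    return ".{$max}" x $full . ( $rest ? ".{$rest}" : '' );
}

# The same loop as the original lcs(), with the capture built by dot_run().
sub lcs_chunked {
    my $strings = join "\0", @_;
    my $lcs;
    for my $n ( 1 .. length $strings ) {
        my $re = '(' . dot_run($n) . ')' . '.*\0.*\1' x ( @_ - 1 );
        last unless $strings =~ $re;
        $lcs = $1;
    }
    return $lcs;
}
```

For example, `dot_run(32767)` yields `.{32766}.{1}`, which compiles where `.{32767}` did not. The caveats about nulls and newlines in the input still apply.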
Re: Re: Re: finding longest common substring by revdiablo (Prior) on Nov 20, 2003 at 03:30 UTC
"While significantly faster than the OP's"

Actually, simply using `index` instead of `m//` in the grep makes my algorithm a bit faster than BrowserUk's regex. Granted, there's still a scalability problem here, but with the small tweak suggested by CombatSquirrel, it's much faster than the original, and works fine on data that is representative of what I'm actually using this for.

Here is the benchmark code I used: [Read more... (2 kB)]

And here are the results I got: [Read more... (842 Bytes)]
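For readers who want to see the general shape of an `index`-in-a-grep approach, here is a minimal sketch. It is not revdiablo's actual code (that is collapsed above and not reproduced here), and the name `lcs_index` is invented for illustration. It tries every substring of the shortest input, longest first, and returns the first one that `index` finds in all of the strings.

```perl
use strict;
use warnings;

# Hypothetical sketch: longest common substring via index(), no regex.
# Returns undef (in scalar context) if there is no common substring.
sub lcs_index {
    my @strings = @_;
    return unless @strings;

    # Any common substring must fit inside the shortest string,
    # so candidates are drawn from it.
    my ($shortest) = sort { length($a) <=> length($b) } @strings;
    my $len = length $shortest;

    for my $n ( reverse 1 .. $len ) {          # longest candidates first
        for my $start ( 0 .. $len - $n ) {
            my $candidate = substr $shortest, $start, $n;
            # Keep the candidate only if no string lacks it.
            return $candidate
                unless grep { index( $_, $candidate ) < 0 } @strings;
        }
    }
    return;
}

my @lines = (
    'The quick brown fox jump over the lazy dog',
    'The quick brown fox jumps over the lazy',
    'jumps over the lazy dog The quick brown fox',
    'quick brown fox jumps over the lazy dog',
);
print "lcs: ", lcs_index(@lines), "\n";    # prints "lcs: quick brown fox"
```

Because it never builds a regex, this version has no quantifier limit and copes with nulls and newlines in the input. It still scales poorly, since the number of candidates grows with the square of the shortest string's length.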
Re: Re: Re: finding longest common substring by BrowserUk (Patriarch) on Nov 20, 2003 at 01:13 UTC
I guess a length limit of 32k for the longest common substring could be a concern for some applications -- but nothing I regularly have to deal with. Do you have an alternative?

