|
rkg has asked for the
wisdom of the Perl Monks concerning the following question:
Hi --
I am looking for a module to solve (or appoximately solve) the longest common substring problem.
I found two modules on CPAN:
String-LCSS-0.10 and
String-Ediff-0.01.
The first didn't download ("File not found") from CPAN; the second required a C compiler and I don't have one available (Activestate perl).
Any have any suggestions?
Thanks
R
Re: Longest Common Substring by nothingmuch (Priest) on Apr 09, 2003 at 15:25 UTC |
Algorithm::Diff is probably of use. It's pure perl, and nearly every other module i found relies on it.
-nuffin zz zZ Z Z #!perl | [reply] |
|
| [reply] |
Re: Longest Common Substring by glwtta (Hermit) on Apr 09, 2003 at 15:36 UTC |
String::LCSS certainly seems to be what you want. And I don't seem to be having any trouble getting it from CPAN. | [reply] |
Re: Longest Common Substring by BrowserUk (Saint) on Apr 09, 2003 at 16:24 UTC |
Probably not the most efficient implementation (and maybe incomplete), but if you can't get the modules to work, you could wrap this up as a subroutine.
#! perl -slw
use strict;
use Data::Dumper;
my $needle = 'the lazy cat';
my $haystack = 'the quick brown fox jumps over the lazy dog';
my @matches;
for my $start (0..length $needle) {
for my $len ($start+1 .. length $needle) {
my $substr = substr($needle, $start, $len);
push @matches, $haystack =~ m[($substr)]g;
}
}
my ($len, $longest) = 0;
length > $len and ($longest, $len) = ($_, length) for @matches;
print "The longest common substring between\n", $needle, "\nand\n", $h
+aystack, "\nis '$longest'";
__END__
C:\test>249239.pl
The longest common substring between
the lazy cat
and
the quick brown fox jumps over the lazy dog
is 'the lazy '
C:\test>
Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.
| [reply] [d/l] |
|
BrowserUK's solution doesn't quite work I tried the following and did not get back what was expected ( " the quick brown fox "):my $needle = 'i pushed the lazy dog into a creek, the quick brown fox
+told me to';
my $haystack = 'the quick brown fox jumps over the lazy dog';
and got back 'the lazy dog' (it works if you reverse the $needle and the $haystack.
But to his credit it seems to work better than String-LCSS-0.10, and he did mention that it might be incomplete (run it both ways, get the longest string and print that, I believe should give you the most right answer, but my tests might also be incomplete). Where which running the following code: I would use Algorithm-Diff using this part of the documentation as a starting point. Note: The Author of String-LCSS-0.10 has been notified. -enlil
| [reply] [d/l] [select] |
Re: Longest Common Substring by BrowserUk (Saint) on Apr 09, 2003 at 20:13 UTC |
Enlil's right, my code was a bit slovenly. I believe this is both more correct and more efficient, but I can't guarentee it doesn't fail on some edge case or another.
#! perl -slw
use strict;
sub lcss (\$\$) {
my ($needle, $haystack) = @_;
($needle, $haystack) = ($haystack, $needle)
if length $$needle > length $$haystack;
my ($longest_c, $longest) = 0;
for my $start (0..length $$needle) {
for my $len ( reverse $start+1 .. length $$needle) {
my $substr = substr($$needle, $start, $len);
length $1 > $longest_c and ($longest_c, $longest) = (lengt
+h $1, $1)
while $$haystack =~ m[($substr)]g;
}
}
return $longest;
}
my $needle = 'the lazy cat';
my $haystack = 'the quick brown fox jumps over the lazy dog';
print "The longest common substring between\n"
, $needle, "\nand\n"
, $haystack, "\nis '"
, lcss($needle, $haystack)
, "'";
$needle = 'i pushed the lazy dog into a creek, the quick brown fox tol
+d me to';
$haystack = 'the quick brown fox jumps over the lazy dog';
print "The longest common substring between\n"
, $needle, "\nand\n"
, $haystack, "\nis '"
, lcss($needle, $haystack)
, "'";
__END__
C:\test>249239.pl
The longest common substring between
the lazy cat
and
the quick brown fox jumps over the lazy dog
is 'the lazy '
The longest common substring between
i pushed the lazy dog into a creek, the quick brown fox told me to
and
the quick brown fox jumps over the lazy dog
is 'the quick brown fox '
Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.
| [reply] [d/l] |
Re: Longest Common Substring by PodMaster (Abbot) on Apr 10, 2003 at 00:53 UTC |
I don't know if it was you , but somebody has requested String-Ediff and I have put it up on my repository.
If it was more than a single function it would've been impossible for me to do since the guy who wrote it used swig, and well, swig on win32 hasn't been working out for me. I ended up writing an XS port (pretty easily I might add), and the source can be found at http://crazyinsomniac.perlmonk.org/perl/files/String-Ediff-0.01.tar.gz.
MJD says you
can't just make shit up and expect the computer to know what you mean, retardo!
I run a Win32 PPM
repository for perl 5.6x+5.8x. I take requests.
** The Third rule of perl club is a statement of fact: pod is sexy.
|
| [reply] |
LCS != LCSS by rkg (Hermit) on Apr 11, 2003 at 12:38 UTC |
Hi, all. Thanks for the help. After spending some time with Algorithm::Diff, I've realized that Alg::Diff's LCS(Longest Common SubSequence) is not the same as String-LCSS (Longest Common Substring). Just wanted to share that (perhaps obvious) observation. LCSS does what I need, and is working fine for me...thanks!
R | [reply] |
Back to
Seekers of Perl Wisdom
|
|