Hi PerlMonks,
I am a beginner in perl. Perl is the first computer language I am learning. I am interested to get
the longest as well as all the common
substrings from a set of sequences. I have searched almost all the threads to get
the script for this purpose. I have come across a script c.pl given below,
which can produce the longest common substring between two sequences but not from all
the sequences. I have also read that LCSS in CPAN can do this task and I have gone through the subroutine
but could not make out how to use it for getting the desird result.
Moreover, I am more
interested in using subroutine in script (because I understand it easily) rather than using modules or
algorithms at the beginning of script because I donot know in which directory of perl (within C drive)
I should put the modules or algorithms to get to work. I have also tried several scripts but in vain. One
of the scripts that I
have tried is try11.pl (given below) but the cmd asks for CSS subroutine. May I
request perlmonks to go through the script c.pl and to suggest me for changes in the code to get
the desired results?
Where shall I get the detailed simple text for beginners (avoiding technical terms of computer
science) for using the modules and algorithms of CPAN in perl script (with examples)?
The c.pl goes like:
#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGs (LCS):
use warnings;
use strict;
sub lc_substr {
my ($str1, $str2) = @_;
my $l_length = 0; # length of longest common substring
my $len1 = length $str1;
my $len2 = length $str2;
my @char1 = (undef, split(//, $str1)); # $str1 as array of chars, in
+dexed from 1
my @char2 = (undef, split(//, $str2)); # $str2 as array of chars, in
+dexed from 1
my @lc_suffix; # "longest common suffix" table
my @substrings; # list of common substrings of length $l_length
for my $n1 ( 1 .. $len1 ) {
for my $n2 ( 1 .. $len2 ) {
if ($char1[$n1] eq $char2[$n2]) {
# We have found a matching character. Is this the first matchi
+ng character, or a
# continuation of previous matching characters? If the former,
+ then the length of
# the previous matching portion is undefined; set to zero.
$lc_suffix[$n1-1][$n2-1] ||= 0;
# In either case, declare the match to be one character longer
+ than the match of
# characters preceding this character.
$lc_suffix[$n1][$n2] = $lc_suffix[$n1-1][$n2-1] + 1;
# If the resulting substring is longer than our previously rec
+orded max length ...
if ($lc_suffix[$n1][$n2] > $l_length) {
# ... we record its length as our new max length ...
$l_length = $lc_suffix[$n1][$n2];
# ... and clear our result list of shorter substrings.
@substrings = ();
}
# If this substring is equal to our longest ...
if ($lc_suffix[$n1][$n2] == $l_length) {
# ... add it to our list of solutions.
push @substrings, substr($str1, ($n1-$l_length), $l_length);
}
}
}
}
return @substrings;
}
my @result1=lc_substr qw(ABABC BABCA ABCBA);
my $result1=join('',@result1);
my $leng1=length($result1);
print"\n The longest common substring :\n";
print "\n@result1: Length=$leng1 letters\n";
print"\n Other common substrings in order of decreasing lengths are:\n
+";
my @result2="?";
I have got the following results:
The longest common substring :
BABC: Length=4 letters
Other common substrings in order of decreasing lengths are:??
The expected results should look like:
The longest common substring :
ABC; Length=3
Other common substrings in order of decreasing lengths are:
AB: Length=2
BC: Length=2
BA: Length=2
I have tried the script try11.pl given below. But the cmd asks
for CSS subroutine which I could not find in cpan. Here goes the try11.pl
#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGS (sorted) from a set of given sequences:
use strict;
use warnings;
sub LCS { # Line 5
my ($ctx, $a, $b) = @_;
my ($amin, $amax, $bmin, $bmax) = (0, $#$a, 0, $#$b);
while ($amin <= $amax and $bmin <= $bmax and $a->[$amin] eq $b->[$
+bmin]) {
$amin++;
$bmin++;
}
while ($amin <= $amax and $bmin <= $bmax and $a->[$amax] eq $b->[$
+bmax]) {
$amax--;
$bmax--;
} # Line 15
my $h = $ctx->line_map(@$b[$bmin..$bmax]); # line numbers are off
+by $bmin
return $amin + _core_loop($ctx, $a, $amin, $amax, $h) + ($#$a - $a
+max)
unless wantarray;
my @lcs = _core_loop($ctx,$a,$amin,$amax,$h);
if ($bmin > 0) { # Line 20
$_->[1] += $bmin for @lcs; # correct line numbers
}
map([$_ => $_], 0 .. ($amin-1)),
@lcs,
map([$_ => ++$bmax], ($amax+1) .. $#$a);
}
sub a {
my $match = CSS(@_); # line 28
if ( ref $_[0] eq 'ARRAY' ) {
@$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,scalar(@$_
+)]}@$match
}
else { # Line 32
@$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,length($_)
+]}@$match
}
return $match;
}
## Data Input & Results: # Line 37
print"\nThe longest common substrings in decreasing order of lengths:\
+n";
my $result1=a qw(ABABC BABCA ABCBA);
my $leng1=$result1; # Line 40
print"\n$result1; Length=$leng1\n\n";
exit;
Results of cmd for try11.pl:
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\x>cd desktop
C:\Users\x\Desktop>try1.pl
The longest common substrings in decreasing order of lengths:
Undefined subroutine &main::CSS called at C:\Users\DR-SUPRIYO\Desktop\
+try1.pl line 28.
C:\Users\x\Desktop>