Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

How can I get the longest and all other common substrings from a set of strings?

by supriyoch_2008 (Scribe)
on Sep 21, 2012 at 07:46 UTC ( #994837=perlquestion: print w/ replies, xml ) Need Help??
supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,

I am a beginner in perl. Perl is the first computer language I am learning. I am interested to get the longest as well as all the common substrings from a set of sequences. I have searched almost all the threads to get the script for this purpose. I have come across a script c.pl given below, which can produce the longest common substring between two sequences but not from all the sequences. I have also read that LCSS in CPAN can do this task and I have gone through the subroutine but could not make out how to use it for getting the desird result.

Moreover, I am more interested in using subroutine in script (because I understand it easily) rather than using modules or algorithms at the beginning of script because I donot know in which directory of perl (within C drive) I should put the modules or algorithms to get to work. I have also tried several scripts but in vain. One of the scripts that I have tried is try11.pl (given below) but the cmd asks for CSS subroutine. May I request perlmonks to go through the script c.pl and to suggest me for changes in the code to get the desired results?

Where shall I get the detailed simple text for beginners (avoiding technical terms of computer science) for using the modules and algorithms of CPAN in perl script (with examples)?

The c.pl goes like:

#!/usr/bin/perl ## LONGEST COMMON SUBSTRINGs (LCS): use warnings; use strict; sub lc_substr { my ($str1, $str2) = @_; my $l_length = 0; # length of longest common substring my $len1 = length $str1; my $len2 = length $str2; my @char1 = (undef, split(//, $str1)); # $str1 as array of chars, in +dexed from 1 my @char2 = (undef, split(//, $str2)); # $str2 as array of chars, in +dexed from 1 my @lc_suffix; # "longest common suffix" table my @substrings; # list of common substrings of length $l_length for my $n1 ( 1 .. $len1 ) { for my $n2 ( 1 .. $len2 ) { if ($char1[$n1] eq $char2[$n2]) { # We have found a matching character. Is this the first matchi +ng character, or a # continuation of previous matching characters? If the former, + then the length of # the previous matching portion is undefined; set to zero. $lc_suffix[$n1-1][$n2-1] ||= 0; # In either case, declare the match to be one character longer + than the match of # characters preceding this character. $lc_suffix[$n1][$n2] = $lc_suffix[$n1-1][$n2-1] + 1; # If the resulting substring is longer than our previously rec +orded max length ... if ($lc_suffix[$n1][$n2] > $l_length) { # ... we record its length as our new max length ... $l_length = $lc_suffix[$n1][$n2]; # ... and clear our result list of shorter substrings. @substrings = (); } # If this substring is equal to our longest ... if ($lc_suffix[$n1][$n2] == $l_length) { # ... add it to our list of solutions. push @substrings, substr($str1, ($n1-$l_length), $l_length); } } } } return @substrings; } my @result1=lc_substr qw(ABABC BABCA ABCBA); my $result1=join('',@result1); my $leng1=length($result1); print"\n The longest common substring :\n"; print "\n@result1: Length=$leng1 letters\n"; print"\n Other common substrings in order of decreasing lengths are:\n +"; my @result2="?";

I have got the following results:

The longest common substring : BABC: Length=4 letters Other common substrings in order of decreasing lengths are:??

The expected results should look like:

The longest common substring : ABC; Length=3 Other common substrings in order of decreasing lengths are: AB: Length=2 BC: Length=2 BA: Length=2

I have tried the script try11.pl given below. But the cmd asks for CSS subroutine which I could not find in cpan. Here goes the try11.pl

#!/usr/bin/perl ## LONGEST COMMON SUBSTRINGS (sorted) from a set of given sequences: use strict; use warnings; sub LCS { # Line 5 my ($ctx, $a, $b) = @_; my ($amin, $amax, $bmin, $bmax) = (0, $#$a, 0, $#$b); while ($amin <= $amax and $bmin <= $bmax and $a->[$amin] eq $b->[$ +bmin]) { $amin++; $bmin++; } while ($amin <= $amax and $bmin <= $bmax and $a->[$amax] eq $b->[$ +bmax]) { $amax--; $bmax--; } # Line 15 my $h = $ctx->line_map(@$b[$bmin..$bmax]); # line numbers are off +by $bmin return $amin + _core_loop($ctx, $a, $amin, $amax, $h) + ($#$a - $a +max) unless wantarray; my @lcs = _core_loop($ctx,$a,$amin,$amax,$h); if ($bmin > 0) { # Line 20 $_->[1] += $bmin for @lcs; # correct line numbers } map([$_ => $_], 0 .. ($amin-1)), @lcs, map([$_ => ++$bmax], ($amax+1) .. $#$a); } sub a { my $match = CSS(@_); # line 28 if ( ref $_[0] eq 'ARRAY' ) { @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,scalar(@$_ +)]}@$match } else { # Line 32 @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,length($_) +]}@$match } return $match; } ## Data Input & Results: # Line 37 print"\nThe longest common substrings in decreasing order of lengths:\ +n"; my $result1=a qw(ABABC BABCA ABCBA); my $leng1=$result1; # Line 40 print"\n$result1; Length=$leng1\n\n"; exit;

Results of cmd for try11.pl:

Microsoft Windows [Version 6.1.7600] Copyright (c) 2009 Microsoft Corporation. All rights reserved. C:\Users\x>cd desktop C:\Users\x\Desktop>try1.pl The longest common substrings in decreasing order of lengths: Undefined subroutine &main::CSS called at C:\Users\DR-SUPRIYO\Desktop\ +try1.pl line 28. C:\Users\x\Desktop>

Comment on How can I get the longest and all other common substrings from a set of strings?
Select or Download Code
Re: How can I get the longest and all other common substrings from a set of strings?
by prashantktyagi (Scribe) on Sep 21, 2012 at 08:17 UTC
    I would suggest you to learn perl first then go for writing programs.
    No one will here read your 100 lines of code and fix it.Please go through http://learn.perl.org first.
Re: How can I get the longest and all other common substrings from a set of strings?
by marto (Chancellor) on Sep 21, 2012 at 09:02 UTC

    Re read this and tell me what you don't understand or agree with. If you download code from somewhere, run it without attempting to understand it and get error messages, at least take the time to investigate these errors yourself. Stop copying modules manually (Where am I going wrong in the code for creating the first GUI using hello.pl?) and learn to install them properly.

    If you have no interest in learning to use the tools you have chosen (seemingly perl) to accomplish the task, select a different set of tools or change career.

Re: How can I get the longest and all other common substrings from a set of strings?
by Anonymous Monk on Sep 21, 2012 at 09:15 UTC

    I am a beginner in perl.

    This has been going on, post after post, for nine months. the honeymoon is over. No more walls of broken code. No more pretending to want to learn. We know effort when we see it, and we know when we're being used as a crutch.

Re: How can I get the longest and all other common substrings from a set of strings?
by Anonymous Monk on Sep 21, 2012 at 09:25 UTC
    Hey Kid,

    The people are here like a bullshit, keeps on talking bullshit rather than to give a solution..............a proper program is always build up of thousands lines of code, so this is just hundred.

    Kid, do one thing post only that part which you think is a trouble, to find out a trouble try to debug your program first using "perl -d yourscriptname" , do a thorough debug along with print and data dumper too if required.

    Don't mind the nuisance talks, only focus on your improvement. Regards, JACK

      Jack,

      Thanks for your constructive suggestions. I shall focus on improvement.

      With Regards

Re: How can I get the longest and all other common substrings from a set of strings?
by thundergnat (Deacon) on Sep 21, 2012 at 18:28 UTC

    Here you go.

    @s = qw(ABABC BABCA ABCBA); sub l{length@_[0]};for$i(0..$#s){$l=l$s[$i];for$s(0..$l-1){for$o(1..$l +-$s) {$t=substr$s[$i],$s,$o;if(!$i||exists$c{$t}){$c{$t}|=1<<$i}}}};print j +oin$/, 'Common subsequences:',@s,'-'x30,sort{l($b)<=>l$a}grep{$c{$_}==(1<<@s) +-1}keys%c;
    Common subsequences:
    ABABC
    BABCA
    ABCBA
    ------------------------------
    ABC
    BC
    BA
    AB
    C
    A
    B
    
    

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://994837]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2014-09-22 07:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (182 votes), past polls