PerlMonks

### Re: Pattern Searching

by marquezc329 (Scribe) on Nov 10, 2012 at 07:31 UTC (node #1003233)

Hello and welcome, aseee.

You posted a rather large amount of code for a broad question. If you can show the expected output and tell us exactly where you are having trouble, it will be much easier to provide a helpful answer. Also, I'd suggest having a look through perlintro and perlstyle. Cleaning up your code a bit will go a long way towards helping us help you. Not to mention, a strong understanding of the basics will better equip you to answer your own questions in the future.

**UPDATE**

Since you didn't point out exactly where your problem lies, I'm going to guess it is most likely in the maze of single-character variable names and unformatted loops. I would suggest cleaning up your algorithm subs; the added clarity, combined with a slow review, may yield the solution to your problem. Also, the Knuth-Morris-Pratt algorithm is covered on page 370 of Mastering Algorithms with Perl by Jarkko Hietaniemi, John Macdonald, and Jon Orwant. There is a link to view this material online in the node "Knuth-Morris-Pratt Vs. Perl". I suggest you contemplate rewriting your code using this material as a guide, and implore you to research more thoroughly before asking for help. Finding your own answers is infinitely more rewarding (I myself learned what the KMP algorithm is and how it works by researching your question tonight ;).
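For readers new to KMP, here is a minimal sketch of the algorithm in Perl. It is not the book's code and not the original poster's knuth_morris_pratt; the sub names kmp_failure_table and kmp_search are hypothetical, and this version returns 0-based start offsets of every (possibly overlapping) match instead of filling a global array. The failure table is what lets the search avoid re-reading characters of the text after a mismatch.

```perl
use strict;
use warnings;

# Hypothetical helper: $fail[$i] is the length of the longest proper
# prefix of the pattern's first $i+1 characters that is also a suffix of them.
sub kmp_failure_table {
    my @p    = split //, shift;
    my @fail = (0) x @p;
    my $k    = 0;
    for my $i (1 .. $#p) {
        # fall back through shorter borders until the next char can extend one
        $k = $fail[$k - 1] while $k > 0 && $p[$i] ne $p[$k];
        $k++ if $p[$i] eq $p[$k];
        $fail[$i] = $k;
    }
    return @fail;
}

# Hypothetical helper: return the 0-based start offsets of every
# occurrence of $pattern in $text, including overlapping ones.
sub kmp_search {
    my ($text, $pattern) = @_;
    my @t = split //, $text;
    my @p = split //, $pattern;
    return () unless @p;
    my @fail = kmp_failure_table($pattern);
    my @starts;
    my $k = 0;    # number of pattern characters matched so far
    for my $i (0 .. $#t) {
        $k = $fail[$k - 1] while $k > 0 && $t[$i] ne $p[$k];
        $k++ if $t[$i] eq $p[$k];
        if ($k == @p) {                  # full match ending at $i
            push @starts, $i - $#p;
            $k = $fail[$k - 1];          # keep going to find overlaps
        }
    }
    return @starts;
}

print join(' ', kmp_search('ABABABA', 'ABA')), "\n";    # prints 0 2 4
```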

I took a stab at cleaning up some of your code. Some notes to keep in mind:

Incremented for loops, e.g. `for ($i = 0; $i < 10; $i++) { ... }`, are easily changed to `for (0..9) { ... }` (or `for my $i (0..9) { ... }` if you want a named loop variable instead of `$_`).
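A small sketch showing that the two loop styles visit the same values, plus the same range idiom applied to array indices (the variable names are illustrative only):

```perl
use strict;
use warnings;

# C-style loop: explicit counter, condition and increment.
my @c_style;
for (my $i = 0; $i < 10; $i++) {
    push @c_style, $i;
}

# Range loop: $i takes each value from 0 to 9 in turn.
my @range;
for my $i (0 .. 9) {
    push @range, $i;
}

print "@c_style\n";    # prints 0 1 2 3 4 5 6 7 8 9
print "@range\n";      # prints 0 1 2 3 4 5 6 7 8 9

# The same idiom walks array indices: $#bases is the last index.
my @bases = split //, 'GAATTC';
my @pairs;
for my $k (0 .. $#bases) {
    push @pairs, "$k:$bases[$k]";
}
print "@pairs\n";      # prints 0:G 1:A 2:A 3:T 4:T 5:C
```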

Whitespace is key to clarity. Try to keep a consistent indentation style, e.g.:

```
for (0..9) {
    print "line1\n";
    print "line2\n";
    if ($cond =~ /pattern/i) {
        # do something
    }
}
```

Here is a revised (UNTESTED) sample of your opening for loop.

```
my @tt = ($T_one, $Ta, $rr);

foreach my $string (@tt) {
    # @loc and @text are globals from your original code; reset them per string
    @loc  = ();
    @text = ();

    my $strLength = length($string);
    my @stringArr = split //, $string;

    print "<br>";    # note: "<\br>" would print a backspace, not a tag
    print "Length of Gene Sequence Array:    $strLength\n";

    foreach my $pattern (@patterns) {
        print knuth_morris_pratt($string, $pattern), "\n";
    }

    @loc = sort { $a <=> $b } @loc;

    print "<br>";
    print "@loc";
    print "<br>";

    my $i = 0;    # index of the next match position in @loc
    foreach my $k (0 .. $strLength - 1) {
        print $i;    # debugging aid: show the current @loc index
        if (defined $loc[$i] && $k == $loc[$i]) {
            print qq{<span style="background-color:red;">$text[$k]</span>};
            $i++;
        } else {
            print $text[$k];
        }
    }

    print "<br>";
}
```
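One way to sidestep the index bookkeeping and any state shared between iterations is to give the highlighting its own sub that takes the sequence and the match positions as arguments and keeps everything in lexicals. This is a sketch, not a diagnosis: the original knuth_morris_pratt sub and how it fills @loc and @text are not shown in this thread, so the name highlight_positions and its interface are hypothetical. Using a hash of positions also removes the need to sort @loc and walk it with a separate counter.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the characters of $seq at the given 0-based
# positions in a red <span>, leaving the rest as plain text.
# All state is lexical, so every call starts fresh.
sub highlight_positions {
    my ($seq, @positions) = @_;
    my %hit   = map { ($_ => 1) } @positions;    # set of positions to mark
    my @chars = split //, $seq;
    my $html  = '';
    for my $k (0 .. $#chars) {
        $html .= $hit{$k}
            ? qq{<span style="background-color:red;">$chars[$k]</span>}
            : $chars[$k];
    }
    return $html;
}

# Identical input on each pass gives identical output.
for my $seq ('GAATTCAA', 'GAATTCAA') {
    print highlight_positions($seq, 0 .. 5), "<br>\n";
}
```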

### Re^2: Pattern Searching
by aseee (Novice) on Nov 10, 2012 at 10:09 UTC

Well, thanks for your reply. This code implements the KMP algorithm. A pattern and the string are passed to the KMP function, knuth_morris_pratt, which returns the locations of the pattern; these are stored in the @loc array. The string is then displayed with a colored background where the patterns occur. The code for this is:

```
my $ii = 1;
for (my $k = 1; $k < $lt; $k++) {
    if ($k == $loc[$ii]) {
        print "<span style=background-color:red;>" . $text[$k] . "</span>";
        $ii++;
    }
}
```

The process is repeated for every string stored in the array. This code works in the first iteration of the for loop, but the 2nd iteration does not show the same result, even though the same string is stored at the next index of the array. The output of this code is:

```
<r>length of gene sequence array 66
0 1 2 3 4 5 14 15 16 17 18 32 33 34 35 36 37 48 49 50 51 52 60 61 62 63 64 65
GAATTCCCWGGGAATTCCCWGGGAATTC
<r>length of gene sequence array 66
0 1 2 3 4 5 14 15 16 17 18 32 33 34 35 36 37 48 49 50 51 52 60 61 62 63 64 65
G
<r>length of gene sequence array 66
0 1 2 3 4 5 14 15 16 17 18 32 33 34 35 36 37 48 49 50 51 52 60 61 62 63 64 65
G
```

What I want is:

```
<r>length of gene sequence array 66
0 1 2 3 4 5 14 15 16 17 18 32 33 34 35 36 37 48 49 50 51 52 60 61 62 63 64 65
GAATTCCCWGGGAATTCCCWGGGAATTC
<r>length of gene sequence array 66
0 1 2 3 4 5 14 15 16 17 18 32 33 34 35 36 37 48 49 50 51 52 60 61 62 63 64 65
GAATTCCCWGGGAATTCCCWGGGAATTC
<r>length of gene sequence array 66
0 1 2 3 4 5 14 15 16 17 18 32 33 34 35 36 37 48 49 50 51 52 60 61 62 63 64 65
GAATTCCCWGGGAATTCCCWGGGAATTC
```

The numbers show the positions of the patterns, which are the same for all three strings, but the patterns are not displayed for the last two strings. The problem is in displaying the patterns in the 2nd and 3rd iterations of the main for loop.

### Re^2: Pattern Searching
by space_monk (Chaplain) on Nov 10, 2012 at 10:09 UTC

Just a note to congratulate you for putting the research into your answer. I upvoted it, but thought that doing only that wasn't enough, given the effort you put into giving a comprehensive reply.

A Monk aims to give answers to those who have none, and to learn from those who know more.
### Re^2: Pattern Searching
by aseee (Novice) on Nov 10, 2012 at 10:47 UTC

I appreciate your efforts to solve the problem. Your code has the same problem: it works for the 1st iteration of the foreach loop over $string, but then shows only the first letter of the patterns for the rest of the strings.

I figured as much. The snippet wasn't intended to debug, merely to give an example rewrite implementing some of the stylistic points I mentioned.

I'd suggest taking a look at perldebtut. Using the Perl debugger to step through your code may help you visualize the flow of your program through each control structure, and figure out where it may be exiting prematurely on the 2nd and 3rd iterations.
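As a small illustration of stepping through only the iteration that misbehaves, the sketch below sets $DB::single, a variable the Perl debugger reads (see perldebug), so that running it as `perl -d script.pl` and typing `c` pauses at the start of the second pass. From there, `n` steps to the next statement and `x \@chars` dumps the array once it has been filled. The data and variable names are stand-ins, not the original program.

```perl
use strict;
use warnings;

# Stand-in data: three identical sequences, as in the thread.
my @sequences = ('GAATTC', 'GAATTC', 'GAATTC');
my @lengths;

for my $n (0 .. $#sequences) {
    {
        # Under perl -d, setting $DB::single to 1 makes the debugger stop
        # at the next statement; without -d it has no visible effect.
        # 'once' silences the "used only once: possible typo" warning.
        no warnings 'once';
        $DB::single = 1 if $n == 1;    # pause at the start of the 2nd pass
    }
    my @chars = split //, $sequences[$n];
    push @lengths, scalar @chars;
    print "iteration $n: ", scalar(@chars), " chars\n";
}
```

Run without `-d`, it simply prints one line per iteration.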
