Re: Progressive pattern matching

by tfrayner (Curate)
on Oct 15, 2001 at 18:52 UTC (#118889)

in reply to Progressive pattern matching

This problem happens to be of interest to me as well. I think the following code does what you're getting at. It's a little crude, but the best I can do at this point is:
#!/usr/bin/perl
use strict;
use warnings;

my $seq     = "ASPTFHKLDTPRLAKLJHHDFSDA";
my @pattern = ("ST", "P", "RK", "ILVF");

# array of refs to arrays of redundant
# residues within the pattern
my @patternarray;
for (my $e = 0; $e <= $#pattern; $e++) {
    my @elementarray = split(//, $pattern[$e]);
    $patternarray[$e] = \@elementarray;
}

my $found;
my $lastmatchpos;

LOOP: until ($found) {

    # deal with the first residue match as a special case;
    # join the residues so no spaces end up in the character class
    my $first = join '', @{ $patternarray[0] };
    $seq =~ /([$first])/gc
        or die("Sequence does not contain requested motif.\n");
    $found        = $1;
    $lastmatchpos = pos($seq);

    # all the other residues in the pattern
    for (my $e = 1; $e <= $#patternarray; $e++) {
        my $class = join '', @{ $patternarray[$e] };
        if ($seq =~ /\G([$class])/gc) {
            $found .= $1;
        }
        else {
            # reset matching algorithm: resume the search just
            # after the last first-residue match
            pos($seq) = $lastmatchpos;
            $found = '';
            next LOOP;
        }
    }
}

print("$found at $lastmatchpos\n");

This only matches the first occurrence of a motif in a given sequence. It should be possible to extend this to return all the matches with a little work. For use of the m/\G.../gc idiom, see perldoc perlop.
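As a rough illustration of the "all the matches" extension, here is a minimal sketch. It swaps the \G stepping for a single regex built from one character class per motif element, wrapped in a lookahead so that overlapping occurrences are reported too. The sub name find_all_motifs and its return format are my own assumptions, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return every full
# occurrence of the motif as [matched_text, 1-based start position].
sub find_all_motifs {
    my ($seq, @pattern) = @_;

    # Turn each element such as "ST" into a character class "[ST]".
    my $re_src = join '', map { '[' . quotemeta($_) . ']' } @pattern;

    my @hits;
    # The lookahead captures the motif without consuming it, so the
    # next search starts one residue further on and overlapping
    # occurrences are found as well.
    while ($seq =~ /(?=($re_src))/g) {
        push @hits, [ $1, pos($seq) + 1 ];
    }
    return @hits;
}

my $seq     = "ASPTFHKLDTPRLAKLJHHDFSDA";
my @pattern = ("ST", "P", "RK", "ILVF");

for my $hit (find_all_motifs($seq, @pattern)) {
    my ($text, $start) = @$hit;
    print "$text at $start\n";
}
```

For the example sequence this reports the single full match, TPRL, starting at residue 10.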

Hope this helps,

Update: Sorry, I just re-read the original request and one of the nuances escaped me. To get the script to just print out the most it can match after having matched the initial residue, I think you can just change the else clause to:

}else{ next LOOP; }

Update to the update: To deal with the case where the first x residues in the motif don't match the target sequence, I think you should be able to do something like wrapping the LOOP block in another for loop to iterate over motif residues while looking for an initial match.

Hmmm. I'm still not sure I've quite got what you're looking for. At what point do you call a match significant? I.e. do you want target sequences matching only 4 motif residues or more, for example? Or will just a single matching residue do (which I doubt, but which is of course the easiest case)?
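Whatever the answer turns out to be, the threshold can simply be a parameter. The sketch below is one possible reading of the problem, not a tested solution: it tries every motif residue as a starting point, extends each match residue by residue for as long as it holds, and keeps only runs of at least $min_len residues. The sub name partial_matches, the $min_len parameter and the return format are all assumptions of mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for every motif offset and every sequence
# position, extend the match as far as it will go and keep runs of at
# least $min_len residues.
# Returns [text, 1-based sequence start, 1-based motif start].
sub partial_matches {
    my ($seq, $min_len, @pattern) = @_;

    # One anchored single-residue regex per motif element.
    my @classes = map { my $c = quotemeta; qr/\A[${c}]\z/ } @pattern;
    my @res = split //, $seq;
    my @hits;

    for my $m (0 .. $#classes) {
        for my $s (0 .. $#res) {
            # Skip a run that merely continues a longer run starting
            # one residue earlier in both the motif and the sequence.
            next if $m > 0 && $s > 0 && $res[$s - 1] =~ $classes[$m - 1];

            # Extend while both motif and sequence have residues left
            # and the current residue fits the current motif element.
            my $len = 0;
            $len++ while $m + $len <= $#classes
                      && $s + $len <= $#res
                      && $res[$s + $len] =~ $classes[$m + $len];

            push @hits, [ join('', @res[$s .. $s + $len - 1]), $s + 1, $m + 1 ]
                if $len >= $min_len;
        }
    }
    return @hits;
}

my $seq     = "ASPTFHKLDTPRLAKLJHHDFSDA";
my @pattern = ("ST", "P", "RK", "ILVF");

# Lower the second argument to see shorter partial matches as well.
for my $hit (partial_matches($seq, 3, @pattern)) {
    my ($text, $seq_start, $motif_start) = @$hit;
    print "$text at $seq_start (motif position $motif_start)\n";
}
```

This compares every motif offset against every sequence position, which is fine for a short motif but would want something smarter for long motifs or large sequence sets.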

Replies are listed 'Best First'.
Re: Re: Progressive pattern matching
by Anonymous Monk on Oct 16, 2001 at 09:12 UTC
    Thanks, guys, for all the help. It will take some time to go through it all, being a bit of a newbie and all.
    The answer to your last question, tfrayner, is that I would like a minimum of 3 matching residues, and hopefully more, of course.

    I was avoiding these more biological type details so that they would not confuse the issue for some.

    You basically have it right. Imagine the user taking his/her "motif" (let's say 10 a.a.) and searching it against one or more protein sequences. The program must match any part of the initial input, and as much of it as possible, to the protein sequence(s). All cases where there is any type of match must be printed out.
    Hope this clarifies more of what I am trying to do. I notice that you are a post am part of a research team at the Clinical Genomics Centre in Toronto, Canada.

    Thanks again for your help, it is greatly appreciated!
