### Re: Progressive pattern matching

by Masem (Monsignor)
 on Oct 14, 2001 at 05:41 UTC ( #118726=note: print w/replies, xml ) Need Help??

in reply to Progressive pattern matching

Here's a different approach; find the longest left substring of a string that is in another string.
```my \$test = "asdflkjeroiuasdflkjabcdeawerij2lkjasdf";
my \$string = join '', (a..z);
my \$match = 0;
my \$string = "\$string ";  # add one junk char...
do {
\$string = substr( \$string, 0, -2 );
\$match = \$test =~ \$string;
} while ( !\$match || !\$string );
print \$match ? "success with \$string!\n" : "failure\n";

-----------------------------------------------------
Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
It's not what you know, but knowing how to find it if you don't know that's important

Replies are listed 'Best First'.
Re: Re: Progressive pattern matching
by Anonymous Monk on Oct 14, 2001 at 08:26 UTC
Sorry guys...thanks for your code...I am still trying to see if it is applicable...done some initial testing with it and I think that I might not have explained myself well enough.
In one of my user prompts I ask for a string (it is actually a short protein sequence that they can put in)...lets say they input 20 letters (they represent the proteins sequence). I want to take these 20 letters and initiate a search into the file of their choice...
What I would like to happen is for the program to take the first letter (1/20) and search with it...if this letter is found in their file, the program will add onto the first letter, the second letter (2/20), and now search with both 1&2/20. If these two letters are found in the file in succession then the third letter will be added to the search string and so on. What may happen in reality is that it won't be until letter 8/20 before there is an initial match...letters 9, 10, 11, 12 when added one at a time will eventually make a search string of 8,9,10,11,12/20. Letter 13 will not match and then the program must end.
Here is my latest modification...
```while (<MOTIF>) {@motif = <MOTIF>};

for (\$i=0;\$i<=length(\$blocks);\$i++) {
\$pattern = substr(\$blocks, 0, \$i);
if (/\$pattern/)
{push(@array, \$&);
not sure...
dr_jgbn

To clarify what I think that you want, let me construct some examples :

Input string :
GATTACA
File :
ATTACGATTACAAA
GATT
ZZGATTZZ
asdghckasdlkj
TTACA
Output :

```On line 1 :
GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA,
GATT,ATTA,TTA,TAC,GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A
On line 2 :
GATT,GAT,ATT,TT,GA,AT,T,G,A
On line 3 :
GATTA,GATT,ATTA,TTA,GAT,ATT,TT,TA,GA,AT,T,G,A
On line 4 :
GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA,
GATT,ATTA,TTA,TAC,
GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A

To achieve this, you want to find the longest substring of the input string that is found on a line of the file, for the various substrings that match until the end of the last character of the search string. To show you a first approach which is surely suboptimal, look at the following code which tries a brute force approach :

```use strict;

my \$searchString = "GATTACA";
my %subStrings = {};
my @subStrings = ();

sub populate {
# Fills the hash subStrings with all "allowed" substrings
# of the argument. Duplicates are avoided by
# filling a hash instead of an array.
my (\$string) = @_;

return if \$string eq "";

my \$line = "";

foreach (split "", \$string) {
\$line .= \$_;
\$subStrings{\$line} = "1";
};

populate( substr( \$string, 1 ));
};

populate( \$searchString );

# We are only interested in the keys of our hash,
# longest matches first :
@subStrings = reverse
sort {
length(\$a) <=> length(\$b) # Sort by string length
|| \$a cmp \$b                 # and then by string content
} keys %subStrings;

# We read the file line by line :
my ( \$line, \$substring );
while (\$line = <DATA>) {
my @MatchedSubstrings = ();
foreach \$substring (@subStrings) {
if (\$line =~ /\$substring/) {
push @MatchedSubstrings, \$substring;
};
};
if (\$#MatchedSubstrings != -1) {
print "On line \$. : ", join(",", @MatchedSubstrings ),"\n";
};
};

__DATA__
AGATTACAAA
ZZGATTZZ
GATTAZZ
GATGATTACAZZ
asdfgh
gattaca

Note that there already are many Perl modules for Bioinformatics, a search of the CPAN (http://www.cpan.org) should give you interesting results, as should a Google search for Perl and DNA I guess.

```perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ;    # The
\$d = new HTTP::Daemon and fork and getprint \$d->url and exit;#spider
(\$c = \$d->accept())->get_request(); \$c->send_response( new   #in the
HTTP::Response(200,\$_,\$_,qq(Just another Perl hacker\n))); ' #  web
I don't know whether it has the precise methods required, but see bioperl.org for the Bio::Perl homepage. I would have checked myself, but I was too busy reinventing the wheel (maybe), below :-)

Tim

Here's my solution to what I understood you wanted (using your definitions of \$blocks and @motifs:

```my @results;
while (\$blocks) {
for (my \$len = 1;\$len <= length \$blocks;\$len++) {
my \$search = substr (\$blocks, 0, \$len);
push @results, grep (/\$search/, @motif);
}
#if there is a match, we're done
last if @results;
#mo match at starting position - try from next pos
\$blocks = substr (\$blocks, 1);
}
Does that do what you wanted, or did I misunderstand you?

pike

Create A New User
Node Status?
node history
Node Type: note [id://118726]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (3)
As of 2020-06-05 07:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Do you really want to know if there is extraterrestrial life?

Results (35 votes). Check out past polls.

Notices?