Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Re: Progressive pattern matching

by Anonymous Monk
on Oct 14, 2001 at 08:26 UTC ( #118736=note: print w/replies, xml ) Need Help??


in reply to Re: Progressive pattern matching
in thread Progressive pattern matching

Sorry guys...thanks for your code...I am still trying to see if it is applicable...done some initial testing with it and I think that I might not have explained myself well enough.
In one of my user prompts I ask for a string (it is actually a short protein sequence that they can put in)...lets say they input 20 letters (they represent the proteins sequence). I want to take these 20 letters and initiate a search into the file of their choice...
What I would like to happen is for the program to take the first letter (1/20) and search with it...if this letter is found in their file, the program will add onto the first letter, the second letter (2/20), and now search with both 1&2/20. If these two letters are found in the file in succession then the third letter will be added to the search string and so on. What may happen in reality is that it won't be until letter 8/20 before there is an initial match...letters 9, 10, 11, 12 when added one at a time will eventually make a search string of 8,9,10,11,12/20. Letter 13 will not match and then the program must end.
Here is my latest modification...
while (<MOTIF>) {@motif = <MOTIF>}; for ($i=0;$i<=length($blocks);$i++) { $pattern = substr($blocks, 0, $i); if (/$pattern/) {push(@array, $&);
not sure...
Again, thank-you for your help,
dr_jgbn

Replies are listed 'Best First'.
Re: Re: Re: Progressive pattern matching
by Corion (Pope) on Oct 14, 2001 at 12:12 UTC

    To clarify what I think that you want, let me construct some examples :

    Input string :
    GATTACA
    File :
    ATTACGATTACAAA
    GATT
    ZZGATTZZ
    asdghckasdlkj
    TTACA
    Output :

    On line 1 : GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA, GATT,ATTA,TTA,TAC,GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A On line 2 : GATT,GAT,ATT,TT,GA,AT,T,G,A On line 3 : GATTA,GATT,ATTA,TTA,GAT,ATT,TT,TA,GA,AT,T,G,A On line 4 : GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA, GATT,ATTA,TTA,TAC, GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A

    To achieve this, you want to find the longest substring of the input string that is found on a line of the file, for the various substrings that match until the end of the last character of the search string. To show you a first approach which is surely suboptimal, look at the following code which tries a brute force approach :

    use strict; my $searchString = "GATTACA"; my %subStrings = {}; my @subStrings = (); sub populate { # Fills the hash subStrings with all "allowed" substrings # of the argument. Duplicates are avoided by # filling a hash instead of an array. my ($string) = @_; return if $string eq ""; my $line = ""; foreach (split "", $string) { $line .= $_; #print "Added $line\n"; $subStrings{$line} = "1"; }; populate( substr( $string, 1 )); }; populate( $searchString ); # We are only interested in the keys of our hash, # longest matches first : @subStrings = reverse sort { length($a) <=> length($b) # Sort by string length || $a cmp $b # and then by string content } keys %subStrings; # We read the file line by line : my ( $line, $substring ); while ($line = <DATA>) { my @MatchedSubstrings = (); foreach $substring (@subStrings) { if ($line =~ /$substring/) { push @MatchedSubstrings, $substring; }; }; if ($#MatchedSubstrings != -1) { print "On line $. : ", join(",", @MatchedSubstrings ),"\n"; }; }; __DATA__ AGATTACAAA ZZGATTZZ GATTAZZ GATGATTACAZZ asdfgh gattaca

    Note that there already are many Perl modules for Bioinformatics, a search of the CPAN (http://www.cpan.org) should give you interesting results, as should a Google search for Perl and DNA I guess.

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
      I don't know whether it has the precise methods required, but see bioperl.org for the Bio::Perl homepage. I would have checked myself, but I was too busy reinventing the wheel (maybe), below :-)

      Tim

Re: Re: Re: Progressive pattern matching
by pike (Monk) on Oct 15, 2001 at 13:42 UTC
    Here's my solution to what I understood you wanted (using your definitions of $blocks and @motifs:

    my @results; while ($blocks) { for (my $len = 1;$len <= length $blocks;$len++) { my $search = substr ($blocks, 0, $len); push @results, grep (/$search/, @motif); } #if there is a match, we're done last if @results; #mo match at starting position - try from next pos $blocks = substr ($blocks, 1); }
    Does that do what you wanted, or did I misunderstand you?

    pike

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://118736]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (5)
As of 2020-05-31 11:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    If programming languages were movie genres, Perl would be:















    Results (173 votes). Check out past polls.

    Notices?