PerlMonks  

Re: Re: Re: Progressive pattern matching

by Corion (Pope)
on Oct 14, 2001 at 12:12 UTC ( #118742=note )


in reply to Re: Re: Progressive pattern matching
in thread Progressive pattern matching

To clarify what I think you want, let me construct an example :

Input string :
GATTACA
File :
ATTACGATTACAAA
GATT
ZZGATTZZ
asdghckasdlkj
TTACA
Output (this matches the __DATA__ lines of the code below rather than the file above) :

On line 1 : GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA,GATT,ATTA,TTA,TAC,GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A
On line 2 : GATT,GAT,ATT,TT,GA,AT,T,G,A
On line 3 : GATTA,GATT,ATTA,TTA,GAT,ATT,TT,TA,GA,AT,T,G,A
On line 4 : GATTACA,GATTAC,ATTACA,TTACA,GATTA,ATTAC,TTAC,TACA,GATT,ATTA,TTA,TAC,GAT,ATT,ACA,TT,TA,GA,CA,AT,AC,T,G,C,A

To achieve this, you want to find, for each line of the file, every substring of the input string that occurs on that line, listed longest first. As a first approach, which is surely suboptimal, look at the following code, which uses brute force :

use strict;

my $searchString = "GATTACA";
my %subStrings = ();
my @subStrings = ();

sub populate {
  # Fills the hash subStrings with all "allowed" substrings
  # of the argument. Duplicates are avoided by
  # filling a hash instead of an array.
  my ($string) = @_;
  return if $string eq "";

  my $line = "";
  foreach (split "", $string) {
    $line .= $_;
    #print "Added $line\n";
    $subStrings{$line} = "1";
  };
  populate( substr( $string, 1 ));
};

populate( $searchString );

# We are only interested in the keys of our hash,
# longest matches first :
@subStrings = reverse sort {
    length($a) <=> length($b) # Sort by string length
    || $a cmp $b              # and then by string content
  } keys %subStrings;

# We read the file line by line :
my ( $line, $substring );
while ($line = <DATA>) {
  my @MatchedSubstrings = ();
  foreach $substring (@subStrings) {
    # \Q...\E quotes any regex metacharacters in the substring
    if ($line =~ /\Q$substring\E/) {
      push @MatchedSubstrings, $substring;
    };
  };
  if ($#MatchedSubstrings != -1) {
    print "On line $. : ", join(",", @MatchedSubstrings ),"\n";
  };
};
__DATA__
AGATTACAAA
ZZGATTZZ
GATTAZZ
GATGATTACAZZ
asdfgh
gattaca
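If you only need literal matches, the same brute force idea can be sketched without the regex engine. The following is just one possible variant, not a tuned solution: the names all_substrings and matches_in_line are made up for this illustration, and the three test lines are borrowed from the __DATA__ section above. It collects the substrings with two nested loops instead of recursion and uses index() for the search.

```perl
use strict;
use warnings;

# Collect every distinct substring of $string, longest first.
# Ties are broken in reverse alphabetical order, which gives the
# same ordering as the "reverse sort" in the code above.
# (all_substrings is a hypothetical helper name.)
sub all_substrings {
    my ($string) = @_;
    my %seen;
    for my $start (0 .. length($string) - 1) {
        for my $len (1 .. length($string) - $start) {
            $seen{ substr($string, $start, $len) } = 1;
        }
    }
    return sort { length($b) <=> length($a) || $b cmp $a } keys %seen;
}

# Return the candidates that occur literally in $line.
# index() does a plain string search, so nothing needs escaping.
# (matches_in_line is a hypothetical helper name.)
sub matches_in_line {
    my ($line, @candidates) = @_;
    return grep { index($line, $_) >= 0 } @candidates;
}

my @subs = all_substrings("GATTACA");
for my $line ("AGATTACAAA", "ZZGATTZZ", "asdfgh") {
    my @hits = matches_in_line($line, @subs);
    next unless @hits;
    # The list is sorted longest first, so $hits[0] is the longest match.
    print "$line : longest = $hits[0], all = ", join(",", @hits), "\n";
}
```

Because the candidate list is already sorted longest first, the first hit on a line is also the longest common substring for that line, so you can stop at the first match if that is all you are after.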

Note that there are already many Perl modules for bioinformatics; a search of CPAN (http://www.cpan.org) should give you interesting results, as should a Google search for Perl and DNA, I guess.

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

Replies are listed 'Best First'.
Re: Re: Re: Re: Progressive pattern matching
by tfrayner (Curate) on Oct 15, 2001 at 19:35 UTC
    I don't know whether it has the precise methods required, but see bioperl.org for the Bio::Perl homepage. I would have checked myself, but I was too busy reinventing the wheel (maybe), below :-)

    Tim
