
Re^2: Capture Lookahead

by Cristoforo (Deacon)
on Jul 26, 2005 at 02:57 UTC (#478060)


in reply to Re: Capture Lookahead
in thread Capture Lookahead

Yes, and thanks! I made a slight adjustment. The real reason for generating these substrings is to look for palindromes, so I need to check every possible substring of the fasta string (here 800 chars). This code doesn't check for palindromes yet, but that is trivial to add; I just needed to get the munging part right. (Running the program as is created an 86 MB file, which I won't be doing again; I'll just print the palindromes instead.) It does take some time, though, and with a larger fasta string it may take a while just to test every substring.

Thanks, everyone. I'll be working at this for a while now. I'm doing this because I saw that Mathematica does a palindrome check in pretty terse terms (someone had a link to Mathematica here in the Monks a few days ago).

Chris

#!/usr/bin/perl
use strict;
use warnings;

my $str = do { local $/; <DATA> };
$str =~ s/\s+//g;

my $len = length $str;
do {
    printf "%3d : %s\n", $_, substr( $str, $_, $len )
        for 0 .. ( length( $str ) - $len );
} while ( --$len > 3 );
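Since the post says the palindrome check is trivial to add, here is a minimal sketch of one way it might look, keeping the same longest-first substring loop. The helper name `palindromic_substrings` and the sample string are made up for illustration; they are not from the thread, and the fasta data itself is not reproduced here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [offset, substring]
# pairs for every palindromic substring of $str that is at least $min
# characters long, longest first.
sub palindromic_substrings {
    my ( $str, $min ) = @_;
    my @found;
    for ( my $len = length $str; $len >= $min; $len-- ) {
        for my $off ( 0 .. length($str) - $len ) {
            my $sub = substr $str, $off, $len;
            # scalar reverse flips the characters of the string
            push @found, [ $off, $sub ] if $sub eq scalar reverse($sub);
        }
    }
    return @found;
}

# Made-up sample string, not the 800-char fasta string from the post.
printf "%3d : %s\n", @$_ for palindromic_substrings( 'TTAGGGATT', 4 );
```

Only the palindromes are collected, so the output stays small even though every substring is still tested. The nested palindromes TAGGGAT and AGGGA are reported alongside the full string.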


Re^3: Capture Lookahead
by fishbot_v2 (Chaplain) on Jul 26, 2005 at 11:46 UTC

    If you want to find palindromes, why not do it with a regex directly?

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = do { local $/; <DATA> };
    $str =~ s/\s+//g;

    while ( $str =~ m/( (..+) .? (??{ reverse $2 }) )/xgc ) {
        print pos( $str ) . ": $1\n";
        pos( $str ) = $-[0] + 1;    # slide pos back to the left
    }

    prints:

    14: AGGGA
    21: TACAT
    25: GTTG
    55: GAAAAAAAG
    ...etc...

    I'm not sure how it would compare with the two-stage approach for speed, though. It is much faster if you make the quantifier non-greedy (..+?), but then you end up with the shortest palindrome at each position, rather than the longest.
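To illustrate the longest-versus-shortest trade-off, here is a small sketch that runs both forms of the pattern on a made-up string in which a short palindrome is nested inside a longer one starting at the same position. The sample string is an assumption for illustration, and `scalar` is added in front of `reverse` only to make the string-reversal intent explicit.

```perl
use strict;
use warnings;

# Made-up sample: ACCA is nested inside ACCAACCA, both starting at offset 0.
my $sample = 'ACCAACCA';

# Greedy: (..+) grabs as much as it can first, so the longest match wins.
my ($greedy) = $sample =~ m/( (..+)  .? (??{ scalar reverse $2 }) )/x;

# Non-greedy: (..+?) grows from two characters, so the shortest match wins.
my ($lazy)   = $sample =~ m/( (..+?) .? (??{ scalar reverse $2 }) )/x;

print "greedy:     $greedy\n";    # ACCAACCA
print "non-greedy: $lazy\n";      # ACCA
```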

    I make the assumption that you don't care about palindromes shorter than 4 characters. If you bump that upwards, things get faster.

    Update: I tested, and it looks like the substr approach is considerably faster, particularly if you do it in a single pass.
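The code behind that test was not posted, so the following is only one possible reading of a single-pass substr approach: walk over every centre once and grow outwards while the ends match. The helper name `longest_palindromes` and the sample string are hypothetical, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): for every centre (on a
# character or between two), grow outwards with substr while the ends
# match, and keep the widest palindrome if it has at least $min chars.
sub longest_palindromes {
    my ( $str, $min ) = @_;
    my $n = length $str;
    my @found;
    for my $centre ( 0 .. 2 * $n - 2 ) {
        my $left  = int( $centre / 2 );
        my $right = $left + $centre % 2;    # same index for odd-length centres
        while ( $left >= 0 && $right < $n
            && substr( $str, $left, 1 ) eq substr( $str, $right, 1 ) )
        {
            $left--;
            $right++;
        }
        # The loop overshoots by one position on each side.
        my $len = $right - $left - 1;
        push @found, [ $left + 1, substr( $str, $left + 1, $len ) ]
            if $len >= $min;
    }
    return @found;
}

# Made-up sample string, not the fasta data from the thread.
printf "%3d : %s\n", @$_ for longest_palindromes( 'TTAGGGATT', 4 );
```

Unlike testing every substring, this reports only the widest palindrome around each centre, so palindromes nested around the same centre are not listed separately.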
