Re^2: Capture Lookahead

by Cristoforo (Curate)
on Jul 26, 2005 at 02:57 UTC

in reply to Re: Capture Lookahead
in thread Capture Lookahead

Yes, and thanks! I made a slight adjustment. The real reason for generating these substrings is to look for palindromes, so I need to check every possible substring of the FASTA string (800 characters here). This code does not check for palindromes yet (that is trivial to add); I just needed to get the munging part right. Running the program as is created an 86 MB file, which I won't be doing again; I'll print only the palindromes instead. It does take some time, though, and with a larger FASTA string it may take a while just to test every substring.

Thanks, everyone. I'll be working at this for a while now. I am doing this because I saw that Mathematica does a palindrome check in pretty terse terms (someone posted a link to Mathematica here in the Monks a few days ago).


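#!/usr/bin/perl
use strict;
use warnings;

# Slurp the FASTA text (expected after __DATA__, not shown) and strip whitespace.
my $str = do { local $/; <DATA> };
$str =~ s/\s+//g;

# Print every substring, longest first, down to a length of 4.
my $len = length $str;
do {
    printf "%3d : %s\n", $_, substr( $str, $_, $len )
        for 0 .. ( length( $str ) - $len );
} while ( --$len > 3 );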
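The palindrome check mentioned above could be added along these lines. This is only a sketch: the sub name `palindromic_substrings`, the `$min` parameter and the sample string are made up for illustration, and "palindrome" is taken to mean a plain string reversal, not a reverse complement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [offset, substring] pairs for every
# palindromic substring of $str that is at least $min characters long,
# longest lengths first.
sub palindromic_substrings {
    my ( $str, $min ) = @_;
    my @found;
    for ( my $len = length $str; $len >= $min; $len-- ) {
        for my $off ( 0 .. length($str) - $len ) {
            my $sub = substr $str, $off, $len;
            # scalar reverse reverses the characters of the string
            push @found, [ $off, $sub ] if $sub eq scalar reverse($sub);
        }
    }
    return @found;
}

# Made-up sample string, not real FASTA data.
printf "%3d : %s\n", @$_ for palindromic_substrings( 'XAGGGAY', 4 );
```

Printing only the hits keeps the output small, but the loop still visits every substring, so the running time grows quickly with the length of the string.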

Re^3: Capture Lookahead
by fishbot_v2 (Chaplain) on Jul 26, 2005 at 11:46 UTC

    If you want to find palindromes, why not do it with a regex directly?

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = do { local $/; <DATA> };
    $str =~ s/\s+//g;

    while ( $str =~ m/( (..+) .? (??{ reverse $2 }) )/xgc ) {
        print pos( $str ) . ": $1\n";
        pos( $str ) = $-[0] + 1;    # slide pos back to the left
    }


    14: AGGGA
    21: TACAT
    25: GTTG
    55: GAAAAAAAG
    ...etc...

    I'm not sure how it would compare with the two-stage approach for speed, though. It is much faster if you make the quantifier non-greedy (..+?), but then you end up with the shortest palindrome at each position, rather than the longest.

    I am assuming that you don't care about palindromes shorter than 4 characters. If you raise that minimum, things get faster.
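To make the greedy versus non-greedy trade-off concrete, here is a small sketch that runs both forms of the pattern against the same string. The two sub names and the sample string are made up for illustration, `scalar` is added only to spell out the context of `reverse`, and the input is assumed to contain letters only (no regex metacharacters).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Greedy first half: prefers the longest palindrome at the match position.
sub first_palindrome_greedy {
    my ($str) = @_;
    return $str =~ m/( (..+) .? (??{ scalar reverse $2 }) )/x ? $1 : undef;
}

# Non-greedy first half: stops at the shortest first half that works.
sub first_palindrome_lazy {
    my ($str) = @_;
    return $str =~ m/( (..+?) .? (??{ scalar reverse $2 }) )/x ? $1 : undef;
}

# Made-up sample: a run of six identical characters.
my $sample = 'AAAAAA';
print "greedy: ", first_palindrome_greedy($sample), "\n";
print "lazy:   ", first_palindrome_lazy($sample),   "\n";
```

On a run of identical characters the non-greedy form returns a shorter match than the greedy one, because it accepts the first two-character half that can be mirrored instead of backtracking down from the longest candidate.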

    Update: I tested, and it looks like the substr approach is considerably faster, particularly if you do it in a single pass.
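The code that was timed is not shown, so the following is only one possible reading of a single-pass substr approach: walk the start offsets once and, at each offset, try lengths from longest to shortest, stopping at the first palindrome. The sub name `longest_palindrome_at_each_offset`, the `$min` parameter and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each start offset, return the longest
# palindrome of at least $min characters beginning there, if any,
# as [offset, substring] pairs.
sub longest_palindrome_at_each_offset {
    my ( $str, $min ) = @_;
    my $n = length $str;
    my @found;
    for my $off ( 0 .. $n - $min ) {
        for ( my $len = $n - $off; $len >= $min; $len-- ) {
            my $sub = substr $str, $off, $len;
            next unless $sub eq scalar reverse($sub);
            push @found, [ $off, $sub ];
            last;    # longest hit at this offset found; move on
        }
    }
    return @found;
}

# Made-up sample string, not real FASTA data.
printf "%3d : %s\n", @$_
    for longest_palindrome_at_each_offset( 'GAAAAAAAGT', 4 );
```

Unlike the regex version, this reports the start offset rather than the end position, and it avoids the regex engine's backtracking at the cost of building each candidate substring explicitly.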

Node Type: note [id://478060]
