Re^2: Capture Lookahead

Yes, and thanks! I made a slight adjustment. The real reason for generating these substrings is to look for palindromes, so I need to check every possible substring of the FASTA string (here 800 characters). This code does not check for palindromes yet, but that is trivial to add; I just needed to get the munging part right. Running the program as is created an 86 MB file, which I won't be doing again; I'll print only the palindromes instead. It does take some time, though, and with a larger FASTA string, testing every substring may take a while.

Thanks, everyone. I'll be working at this for a while now. The reason I am doing this is that I saw Mathematica does a palindrome check in pretty terse terms (someone posted a link to Mathematica here in the Monks a few days ago).

Chris

#!/usr/bin/perl
use strict;
use warnings;

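# Slurp the sequence from the __DATA__ section (not shown here) and strip whitespace.
my $str = do {local $/; <DATA>};
$str =~ s/\s+//g;
my $len = length $str;

# Print every substring, longest first, down to a minimum length of 4.
do {
    printf "%3d : %s\n", $_, substr( $str, $_, $len )
        for 0 .. ( length( $str ) - $len );
} while (--$len > 3);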
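
Adding the palindrome test to a loop like this might look something like the sketch below. It is only an illustration: the helper name `palindromic_substrings`, the `$min` parameter, and the short sample string are assumptions made for this example and are not part of the original program. "Palindrome" here means a string equal to its own reversal, not a reverse-complement match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns [offset, substring]
# pairs for every substring of $str that is at least $min characters long and
# reads the same forwards and backwards. Longest substrings are tested first.
sub palindromic_substrings {
    my ( $str, $min ) = @_;
    my @found;
    for ( my $len = length $str; $len >= $min; $len-- ) {
        for my $off ( 0 .. length($str) - $len ) {
            my $sub = substr( $str, $off, $len );
            # scalar reverse() reverses the characters of the string
            push @found, [ $off, $sub ] if $sub eq scalar reverse($sub);
        }
    }
    return @found;
}

# Made-up sample sequence, standing in for the real FASTA data.
printf "%3d : %s\n", @$_ for palindromic_substrings( 'GATTAG', 4 );
```

This still tests every substring, so the run time grows quickly with the length of the sequence; it only avoids writing the non-palindromes to the output.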

Re^3: Capture Lookahead
by fishbot_v2 (Chaplain) on Jul 26, 2005 at 11:46 UTC

If you want to find palindromes, why not do it with a regex directly?

#!/usr/bin/perl
use strict;
use warnings;

my $str = do {local $/; <DATA>};
$str =~ s/\s+//g;

while ( $str =~ m/( (..+) .? (??{ reverse $2 }) )/xgc )
{
   print pos( $str ) . ": $1\n";
   pos( $str ) = $-[0] + 1;      # slide pos back to the left
}

prints:

14: AGGGA
21: TACAT
25: GTTG
55: GAAAAAAAG
...etc...

I'm not sure how it would compare with the two-stage approach for speed, though. It is much faster if you make the quantifier non-greedy (..+?), but then you end up with the shortest palindrome at each position, rather than the longest.
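
To make the greedy versus non-greedy difference concrete, here is a small sketch. The sub names `first_greedy` and `first_lazy` and the sample string are made up for illustration; the patterns are the one from the post with `scalar` added in front of `reverse` to make the string reversal explicit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: each returns the first palindrome the pattern finds,
# or undef if there is none.
sub first_greedy {
    my ($s) = @_;
    # greedy ..+ tries the longest first half before backing off
    return $s =~ /( (..+) .? (??{ scalar reverse $2 }) )/x ? $1 : undef;
}

sub first_lazy {
    my ($s) = @_;
    # non-greedy ..+? tries the shortest first half (two characters) first
    return $s =~ /( (..+?) .? (??{ scalar reverse $2 }) )/x ? $1 : undef;
}

my $sample = 'AAAAGAAAA';    # made-up sample, not from the FASTA data
print 'greedy: ', first_greedy($sample), "\n";    # greedy: AAAAGAAAA
print 'lazy:   ', first_lazy($sample),   "\n";    # lazy:   AAAA
```

Both matches start at offset 0; the greedy form reports the whole string, while the non-greedy form stops at the four-character palindrome nested inside it.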

I'm assuming that you don't care about palindromes shorter than 4 characters. If you raise that minimum, things get faster.

Update: I tested, and it looks like the substr approach is considerably faster, particularly if you do it in a single pass.
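
For what it's worth, one way a substr-based search can avoid testing every substring is to grow palindromes outwards from each possible centre. This is a swapped-in technique offered as a sketch, not necessarily what was timed above; the sub name `maximal_palindromes` and the `$min` parameter are assumptions for this example. It reports only the longest palindrome around each centre, so shorter ones nested inside it are not listed separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns [offset, palindrome] pairs, one per centre,
# for palindromes of at least $min characters.
sub maximal_palindromes {
    my ( $str, $min ) = @_;
    my $n = length $str;
    my @found;
    # A centre sits on a character (odd length, even $c)
    # or between two characters (even length, odd $c).
    for my $c ( 0 .. 2 * $n - 2 ) {
        my $lo = int( $c / 2 );
        my $hi = $lo + $c % 2;
        # Expand while the characters on both sides agree.
        while ( $lo >= 0 && $hi < $n
            && substr( $str, $lo, 1 ) eq substr( $str, $hi, 1 ) )
        {
            $lo--;
            $hi++;
        }
        # The loop overshoots by one position on each side.
        my $len = $hi - $lo - 1;
        push @found, [ $lo + 1, substr( $str, $lo + 1, $len ) ]
            if $len >= $min;
    }
    return @found;
}

# Made-up sample sequence, standing in for the real FASTA data.
printf "%3d : %s\n", @$_ for maximal_palindromes( 'GATTAG', 4 );
```

Each centre costs at most one walk outwards, so the work is roughly quadratic in the sequence length in the worst case, rather than one reversal per substring.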
