http://www.perlmonks.org?node_id=478052


in reply to Capture Lookahead

Like this?

#!/usr/bin/perl
use strict;
use warnings;

my $str = do { local $/; <DATA> };
$str =~ s/\s+//g;

my $len = length $str;
while ( --$len > 780 ) {
    printf "%3d : %s\n", $_, substr( $str, $_, $len )
        for 1 .. ( length( $str ) - $len );
}


Re^2: Capture Lookahead
by Cristoforo (Curate) on Jul 26, 2005 at 02:57 UTC
    Yes, and thanks! I made a slight adjustment. The real reason for generating these substrings is to look for palindromes, so I need to check every possible substring of the FASTA string (here 800 characters). This code does not check for palindromes yet, but that is trivial to add; I just needed to get the munging part right. Running the program as is created an 86 MB file, which I won't be doing again; I'll print only the palindromes instead. It does take some time, though, and with a larger FASTA string, testing every substring may take a while.

    Thanks, everyone. I'll be working at this for a while now. The reason I am doing this is that I saw Mathematica does a palindrome check in pretty terse terms (someone posted a link to Mathematica here in the Monks a few days ago).

    Chris

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = do { local $/; <DATA> };
    $str =~ s/\s+//g;

    my $len = length $str;
    do {
        printf "%3d : %s\n", $_, substr( $str, $_, $len )
            for 0 .. ( length( $str ) - $len );
    } while ( --$len > 3 );
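
A minimal sketch of how the palindrome test mentioned above might be bolted onto the substring loop, so that only palindromes are collected rather than every substring. The sub name `find_palindromes`, the `$min` parameter, and the sample sequence are all hypothetical; none of them come from the original post, and this is not claimed to be the poster's final code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [offset, substring] pairs for every palindromic substring of
# $str that is at least $min characters long, longest lengths first.
# (Hypothetical helper: name and interface are assumptions.)
sub find_palindromes {
    my ( $str, $min ) = @_;
    my @found;
    for ( my $len = length $str; $len >= $min; $len-- ) {
        for my $off ( 0 .. length( $str ) - $len ) {
            my $sub = substr $str, $off, $len;
            # scalar reverse flips the string rather than a list
            push @found, [ $off, $sub ] if $sub eq scalar reverse $sub;
        }
    }
    return @found;
}

# Made-up sample sequence, not the FASTA data from the thread.
printf "%3d : %s\n", @$_ for find_palindromes( 'TTAGGGATACATG', 4 );
```

This still visits every substring, so the running time grows quickly with the length of the string; it only avoids writing the non-palindromes out.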

      If you want to find palindromes, why not do it with a regex directly?

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $str = do { local $/; <DATA> };
      $str =~ s/\s+//g;

      while ( $str =~ m/( (..+) .? (??{ reverse $2 }) )/xgc ) {
          print pos( $str ) . ": $1\n";
          pos( $str ) = $-[0] + 1;    # slide pos back to the left
      }

      prints:

      14: AGGGA
      21: TACAT
      25: GTTG
      55: GAAAAAAAG
      ...etc...

      I'm not sure how it would compare with the two-stage approach for speed, though. It is much faster if you make the quantifier non-greedy (..+?), but then you end up with the shortest palindrome at each position, rather than the longest.

      I make the assumption that you don't care about palindromes shorter than 4 characters. If you bump that upwards, things get faster.

      Update: I tested, and it looks like the substr approach is considerably faster, particularly if you do it in a single pass.
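
The update does not show the single-pass code that was timed, so the following is only one possible reading of "a single pass" using `substr`: walk over each possible centre once and grow outwards while the flanking characters match. The sub name `longest_palindromes`, the `$min` parameter, and the sample sequence are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For each possible centre (on a character, or between two characters),
# grow outwards while the flanking characters match, and record the
# longest palindrome of at least $min characters found at that centre.
# (Hypothetical helper: not the code that was benchmarked in the thread.)
sub longest_palindromes {
    my ( $str, $min ) = @_;
    my $n = length $str;
    my @found;
    for my $centre ( 0 .. 2 * $n - 2 ) {
        my $left  = int( $centre / 2 );
        my $right = $left + $centre % 2;
        while ( $left >= 0
            && $right < $n
            && substr( $str, $left, 1 ) eq substr( $str, $right, 1 ) )
        {
            $left--;
            $right++;
        }
        # The loop stops one step past the palindrome on each side.
        my $len = $right - $left - 1;
        push @found, [ $left + 1, substr( $str, $left + 1, $len ) ]
            if $len >= $min;
    }
    return @found;
}

# Made-up sample sequence, not the FASTA data from the thread.
printf "%3d : %s\n", @$_ for longest_palindromes( 'TTAGGGATACATG', 4 );
```

Unlike the exhaustive substring scan, this reports only the longest palindrome per centre, so shorter palindromes nested around the same centre are not listed separately.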