http://www.perlmonks.org?node_id=976378


in reply to Re^5: counting the number of 16384 pattern matches in a large DNA sequence
in thread counting the number of 16384 pattern matches in a large DNA sequence

An untested variation:
while (/([ACGT]{7,})/g) { for my $ix (0 .. length($1) - 7) { ++$index{substr($1, $ix, 7)} } }
This regular expression should process every character in the string just once, and so could be an order of magnitude faster than yours, which tries to match the look-ahead pattern at every character.

But that is just guessing... could you benchmark it?
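
In case it helps, here is a rough sketch of how the two approaches might be compared with the core Benchmark module. The look-ahead sub is only my assumption of what the parent code looks like (it is not quoted in this node), and the sub names and the random test sequence are made up for illustration, so treat the numbers as indicative at best.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed form of the look-ahead approach from the parent post
# (a guess, the original code is not quoted in this node).
sub count_lookahead {
    my ($seq) = @_;
    my %index;
    ++$index{$1} while $seq =~ /(?=([ACGT]{7}))/g;
    return \%index;
}

# The variation above: capture each run of 7 or more valid bases once,
# then slide a 7-character window over the captured run.
sub count_runs {
    my ($seq) = @_;
    my %index;
    while ($seq =~ /([ACGT]{7,})/g) {
        my $run = $1;
        for my $ix (0 .. length($run) - 7) {
            ++$index{ substr($run, $ix, 7) };
        }
    }
    return \%index;
}

# Made-up test data: random bases with an occasional 'N' breaking the runs.
my @alphabet = (qw(A C G T) x 25, 'N');
my $seq = join '', map { $alphabet[rand @alphabet] } 1 .. 200_000;

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    lookahead => sub { count_lookahead($seq) },
    runs      => sub { count_runs($seq) },
});
```

Both subs should return the same counts for the same input, so it is worth comparing the two hashes on a short string before trusting any timings.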
