Re^6: counting the number of 16384 pattern matches in a large DNA sequence

An untested variation (it operates on $_):

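# grab each maximal run of at least 7 valid bases in one match
while (/([ACGT]{7,})/g) {
  # slide a 7-character window over the captured run
  for my $ix (0 .. length($1) - 7) {
    ++$index{substr($1, $ix, 7)};
  }
}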

This regular expression should scan each character of the string roughly once, so I would expect it to be noticeably faster, perhaps by an order of magnitude, than yours, which tries to match the look-ahead pattern at every character position.

But that is just guessing... could you benchmark it?
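
For anyone who wants to try, here is a rough sketch of how the comparison could be set up with the core Benchmark module. The look-ahead version is only my assumption about what the parent post does (a zero-width /(?=([ACGT]{7}))/g match), and the sub names, the random test sequence and its size are made up for illustration, so swap in the real code and data before trusting any numbers.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Variation from this post: match each maximal run of valid bases once,
# then slide a 7-character window over the captured run.
sub count_runs {
    my ($seq) = @_;
    my %index;
    while ($seq =~ /([ACGT]{7,})/g) {
        my $run = $1;
        ++$index{ substr($run, $_, 7) } for 0 .. length($run) - 7;
    }
    return \%index;
}

# ASSUMPTION: stand-in for the look-ahead approach of the parent post.
# A zero-width match is attempted at every position of the string.
sub count_lookahead {
    my ($seq) = @_;
    my %index;
    ++$index{$1} while $seq =~ /(?=([ACGT]{7}))/g;
    return \%index;
}

# Hypothetical test data: 200_000 random bases with an occasional N.
my @bases = qw(A C G T);
my $seq = join '', map { rand() < 0.01 ? 'N' : $bases[ rand @bases ] } 1 .. 200_000;

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    runs      => sub { count_runs($seq) },
    lookahead => sub { count_lookahead($seq) },
});
```

Both subs return a hash reference of 7-mer counts, so it is easy to check that they agree on the same input before comparing timings.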
