index has already been mentioned, but I wanted to comment on this:

Only one hit is possible for each 'probe' string

It is possible that the probes have already been screened for uniqueness, but based on a frequency of 0.25 for each base (on average and subject to inter-species variation, of course) a 10-mer would occur by chance once every 1,048,576 bases (4**10). Your chromosomes are 30 million bases long, so in the absence of other information I'd expect each probe to match roughly 29 times on each chromosome (30,000,000 / 1,048,576).
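
To make that estimate easy to redo for other probe lengths, here is a small sketch of the arithmetic. The sub name expected_hits is just something I made up for illustration, and it assumes equal base frequencies and a single strand, which real genomes won't strictly obey.

```perl
use strict;
use warnings;

# Expected number of chance matches for a probe of $probe_len bases
# in a sequence of $seq_len bases, assuming each base has p = 0.25.
# (Hypothetical helper, not taken from the thread.)
sub expected_hits {
    my ( $probe_len, $seq_len ) = @_;
    my $positions = $seq_len - $probe_len + 1;    # possible start offsets
    return 0 if $positions < 1;
    return $positions / 4**$probe_len;
}

# Figures from the post: 10-mer probes, 30 million bases per chromosome.
printf "Expected chance matches: %.1f\n", expected_hits( 10, 30_000_000 );
```

Going to longer probes drops the figure quickly, since each extra base divides it by four.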

That said, note that the third parameter of the index function sets the position at which the search starts. To find all matches you'll have to call index repeatedly (advancing the start position after each hit), use m//g, or (the approach I prefer, BrowserUk++) build an index of the chromosomes before searching for your 1 million probes.
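
A minimal sketch of two of those options, untested against real data. The sub names all_hits and build_kmer_index are my own, and the hash of k-mer positions is only one guess at what "an index of the chromosomes" might look like, not necessarily what BrowserUk proposed. A hash like this would be memory-hungry for a 30 million base chromosome, so treat it as an illustration of the idea rather than something to run as-is.

```perl
use strict;
use warnings;

# Return every 0-based offset at which $probe occurs in $seq,
# using the third argument of index() to resume the search.
sub all_hits {
    my ( $seq, $probe ) = @_;
    my @hits;
    my $pos = 0;
    while ( ( $pos = index( $seq, $probe, $pos ) ) != -1 ) {
        push @hits, $pos;
        $pos++;    # step one base so overlapping matches are not skipped
    }
    return @hits;
}

# Map each k-mer in $seq to the list of offsets where it starts.
# (Hypothetical structure; assumes all probes share the length $k.)
sub build_kmer_index {
    my ( $seq, $k ) = @_;
    my %index;
    for my $i ( 0 .. length($seq) - $k ) {
        push @{ $index{ substr( $seq, $i, $k ) } }, $i;
    }
    return \%index;
}

my $seq = 'ACGTACGTAAAA';
print join( ',', all_hits( $seq, 'ACGT' ) ), "\n";    # 0,4
print join( ',', all_hits( $seq, 'AAA' ) ),  "\n";    # 8,9

my $idx = build_kmer_index( $seq, 4 );
print join( ',', @{ $idx->{ACGT} || [] } ), "\n";     # 0,4
```

With the hash built once per chromosome, each probe lookup is a single hash access instead of a scan of the whole sequence, which is where the saving for 1 million probes would come from.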


In reply to Re: pattern match, speed problem by bobf
in thread pattern match, speed problem by spring
