in reply to pattern match, speed problem

index has already been mentioned, but I wanted to comment on this:

Only one hit is possible for each 'probe' string

It is possible that the probes have already been screened for uniqueness. But assuming a frequency of 0.25 for each base (an average, subject to inter-species variation, of course), a given 10-mer would occur by chance about once every 1,048,576 bases (4**10). Your chromosomes are 30 million bases long, so in the absence of other information I'd expect each probe to match roughly 29 times on each chromosome (30,000,000 / 1,048,576 is about 28.6).
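For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch. It assumes equal, independent base frequencies, which real genomes do not have, so treat the result as an order-of-magnitude estimate only. The variable names are mine, not from the original post.

```perl
use strict;
use warnings;

my $probe_len = 10;            # length of each probe (a 10-mer)
my $chrom_len = 30_000_000;    # chromosome length given in the question

# Assumption: each base occurs with probability 0.25, independently.
my $p_match   = 0.25 ** $probe_len;             # chance of a match at one position
my $positions = $chrom_len - $probe_len + 1;    # possible start positions
my $expected  = $positions * $p_match;          # expected chance matches

printf "one chance match per %d bases\n", 4 ** $probe_len;
printf "expected chance matches per chromosome: %.1f\n", $expected;
```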

That said, note that the third parameter of index sets the start position for searching the string. To find all matches you'll have to use index iteratively, m//g, or (the approach I prefer, BrowserUk++) create an index of the chromosomes before searching for your 1 million probes.
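A minimal sketch of the three options, untested against real chromosome data. The sub names are hypothetical, and the k-mer hash is only my naive reading of "create an index of the chromosomes", not BrowserUk's actual code. A hash of every 10-mer position in a 30-million-base string will need a lot of memory, so a real implementation would probably want something more compact.

```perl
use strict;
use warnings;

# Option 1: call index repeatedly, using its third argument as the start position.
sub all_matches_index {
    my ($seq, $probe) = @_;
    my @hits;
    my $pos = 0;
    while (($pos = index($seq, $probe, $pos)) != -1) {
        push @hits, $pos;
        $pos++;    # step forward by one so overlapping matches are also found
    }
    return @hits;
}

# Option 2: m//g in a loop. The lookahead makes each match zero-width,
# so overlapping occurrences are reported too. $-[0] is the match start.
sub all_matches_regex {
    my ($seq, $probe) = @_;
    my @hits;
    while ($seq =~ /(?=\Q$probe\E)/g) {
        push @hits, $-[0];
    }
    return @hits;
}

# Option 3: build a lookup of every k-mer to its start positions once,
# then each probe lookup is a single hash access.
sub build_kmer_index {
    my ($seq, $k) = @_;
    my %index;
    for my $i (0 .. length($seq) - $k) {
        push @{ $index{ substr($seq, $i, $k) } }, $i;
    }
    return \%index;
}

my $seq   = 'ACGTACGTAC';    # toy stand-in for a chromosome
my $probe = 'ACGT';

my @by_index = all_matches_index($seq, $probe);
my @by_regex = all_matches_regex($seq, $probe);
my $kmers    = build_kmer_index($seq, length $probe);
my @by_hash  = @{ $kmers->{$probe} || [] };

print "index: @by_index\n";    # index: 0 4
print "regex: @by_regex\n";    # regex: 0 4
print "hash:  @by_hash\n";     # hash:  0 4
```

All offsets are 0-based, as returned by index. With 1 million probes, options 1 and 2 rescan the chromosome once per probe, which is why building the index once up front is attractive.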