Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: pattern match, speed problem

by bobf (Monsignor)
on Feb 21, 2008 at 05:09 UTC ( [id://669186]=note: print w/replies, xml ) Need Help??


in reply to pattern match, speed problem

index has already been mentioned, but I wanted to comment on this:

Only one hit is possible for each 'probe' string

It is possible that the probes have already been screened for uniqueness, but based on a frequency of 0.25 for each base (on average and subject to inter-species variation, of course) a 10-mer would occur by chance once every 1,048,576 bases (4**10). Your chromosomes are 30 million bases long, so in the absence of other information I'd expect each probe to match quite a few times on each chromosome.

That said, note that the third parameter of index sets the start position for searching the string. To find all matches you'll have to use index iteratively, m//g, or (the approach I prefer, BrowserUk++) create an index of the chromosomes before searching for your 1 million probes.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://669186]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-03-28 18:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found