|Problems? Is your data what you think it is?|
Range overlap (with a lot of ranges)by BioLion (Curate)
|on May 24, 2010 at 11:22 UTC||Need Help??|
BioLion has asked for the
wisdom of the Perl Monks concerning the following question:
This is a bioinformatics problem, but I think it is generally applicable, so I hope someone has some input:
The problem: Given a test range X, and N previously defined ranges, define which range (if any) X overlaps with.
Then simply lookup which Range fits my test. Alternately a lookup table / DB approach could be used to the same ends.
So my question is whether anyone has come up against a similar problem, or tested any of these solutions? I think I will go with the 'lookup' method, but I wanted to check if anyone had any input. Obviously I would like to do things properly and benchmark proper comparisons, but i am a bit pushed for time.
Thanks in advance,
Update for those interested:
Turns out the 'giant array' idea is *very* memory hungry (please don't point out that this was obvious, we all live in hope...). So I am left with storing ordered ranges and doing a binary search (Thanks Marshall). I wanted to check on the various claims for memory usage made below, so here is what I came up with:
Clearly the string method is a lot faster to pre-compute and more memory efficient, and will hopefully make up for the taime taken to subsequently parse the string. It also provides a simple 'jumping' mechanism for the binary search. SO big thanks to BrowserUk. THe method is also nicely 'tunable' in that you can make the hash bigger (use more memory, but reduce the lookup window) or vice versa depending on your needs.
I will post more once i also test the lookup method, and also salva's suggestions. Thanks all round - I may actually get this done today.
Just a something something...