Re^4: Searching array against hash
by BrowserUk (Pope) on Aug 22, 2013 at 03:18 UTC
It will help if you are looking to retrieve a subsequence from the human genome, the FASTA file of which is about 5 Gb;
I guess things have moved on. The version I have is just under 3GB and came in 25 files chr(1-22, M, X, Y).
That said, if his 3 posted sequences are representative of his 900,000, his file is a tad under 900MB.
If he can process that in "a few seconds", he could process your 5GB file in (5 and a bit) * "a few seconds".
But, and here is the point: it will take Bio::DB::Fasta at least that same (5 and a bit) * "a few seconds" to construct an index before he can start processing anything. So for a one-off process, there is a net loss.
Now the real crux. Given all the additional layers and overheads; how many times does he have to redo the process in order to obtain a net gain? (If ever.)
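To put rough numbers on that question, here is a minimal sketch of the break-even arithmetic. Every figure in it is an assumption for illustration: 3 seconds stands in for "a few seconds", the index build is taken at its lower bound of one full pass over the file, and the 1 second cost of an indexed run is a guess, not a measurement. break_even_runs is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Smallest whole number of runs n at which
#     index_cost + n * indexed_run  <  n * scan_run
# Returns undef if an indexed run is no faster than a plain scan,
# in which case the index never pays for itself.
sub break_even_runs {
    my( $index_cost, $scan_run, $indexed_run ) = @_;
    my $saving = $scan_run - $indexed_run;   # time saved per run by the index
    return undef if $saving <= 0;
    return int( $index_cost / $saving ) + 1;
}

# Hypothetical figures, for illustration only.
my $few_seconds = 3;                        # assumed scan time for the ~900MB file
my $scale       = 5000 / 900;               # 5GB relative to 900MB: the "5 and a bit"
my $scan_5gb    = $few_seconds * $scale;    # one linear pass over the 5GB file
my $index_cost  = $scan_5gb;                # lower bound: indexing costs at least one pass
my $indexed_run = 1;                        # assumed cost of one run via the index

my $n = break_even_runs( $index_cost, $scan_5gb, $indexed_run );
printf "scan: %.1fs; break-even after %s run(s)\n", $scan_5gb, $n // 'never';
```

The interesting case is the undef one: once the extra layers push the cost of an indexed run up to the cost of a plain scan, no number of repeats recovers the time spent building the index.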
Then add to that the (potential) problems with installation, and the learning curve of finding your way around the documentation for 897 modules to find the one that you want, and then learning how to use it to do what you want; and suddenly the reason becomes clear why so many bioinformaticians are looking for Lite alternatives to the Bio::Behemoth, and for simple procedures to get their work done, rather than becoming technical debt slaves to the byzantine Bio::Empire.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.