PerlMonks  

Re: Bioinformatics: Slow Parsing of a Fasta File

by erix (Vicar)
on Jul 27, 2010 at 21:43 UTC


in reply to Bioinformatics: Slow Parsing of a Fasta File

Bioperl has considerable overhead, but your numbers are extreme. Maybe you are running an old bioperl? You can check the installed version with: perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION, "\n";'

I copied your sequences into a 200+ MB file and tested with and without bioperl, on both a desktop machine and a server. The slowest run (your program on the desktop) still finished within 5 minutes. On the faster machine (the server), your program ran in just over 1 minute, while a straight Perl one-liner that only prints the header lines ran in under 2s.

(reader.pl is basically your code above).

##############
#
# 5-year old desktop machine
#
$ ls -lh Test.fasta
-rw-r--r-- 1 aardvark aardvark 234M Jul 27 23:02 Test.fasta
$ time perl -ne "print if /^>/" Test.fasta | wc -l
1572864

real    0m5.709s
user    0m4.279s
sys     0m1.233s
$ time perl ./reader.pl | wc -l
1572864

real    4m26.494s
user    4m8.936s
sys     0m1.607s

##############
#
# server
#
$ ls -lh Test.fasta
-rw-rw-r-- 1 aardvark aardvark 234M Jul 27 22:51 Test.fasta
$ time perl -ne "print if /^>/" Test.fasta | wc -l
1572864

real    0m1.774s
user    0m1.607s
sys     0m0.214s
$ time perl ./reader.pl | wc -l
1572864

real    1m9.258s
user    1m9.012s
sys     0m0.229s
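For comparison with the bioperl timings, here is a minimal sketch of what a bioperl-free FASTA reader might look like, assuming the usual layout of '>' header lines followed by one or more sequence lines. read_fasta is a hypothetical helper written for illustration; it is not reader.pl and not part of BioPerl. It avoids building a Bio::Seq object per record, which is where much of bioperl's per-record overhead goes. Note that the one-liner timed above only prints header lines, so a full reader like this does more work and should not be expected to match its 2s.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal FASTA reader (no BioPerl). Calls $on_record->($header, $seq)
# once per record; $header is the text after '>', and the sequence lines are
# concatenated into $seq.
sub read_fasta {
    my ($fh, $on_record) = @_;
    my ($header, $seq);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;               # strip LF or CRLF line endings
        next if $line eq '';                # skip blank lines
        if ($line =~ /^>(.*)/) {
            # A new header closes the previous record, if there is one.
            $on_record->($header, $seq) if defined $header;
            ($header, $seq) = ($1, '');
        }
        elsif (defined $header) {
            $seq .= $line;                  # append this sequence line
        }
    }
    # Emit the final record at end of input.
    $on_record->($header, $seq) if defined $header;
    return;
}

# Usage: read from an in-memory string; a real script would open the
# FASTA file instead, e.g. open my $in, '<', 'Test.fasta' or die ...
my $data = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my @records;
read_fasta($in, sub { push @records, [@_] });
close $in;

# Print each header with its sequence length.
printf "%s\t%d\n", $_->[0], length $_->[1] for @records;
```

Passing a callback keeps memory flat on a 200+ MB file, since only one record is held at a time.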


Re^2: Bioinformatics: Slow Parsing of a Fasta File
by Anonymous Monk on Jul 28, 2010 at 06:34 UTC
    The version I have is 1.006001. How old is that? And if it is that old, what is the best way to update it?
    I am testing this on the laptop itself, so there is no network bottleneck of any sort.
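On the "how old is that" question: modules often report a decimal version, and Perl's core version module can convert it to the dotted form used in release numbers. A small sketch under that assumption; it only decodes the number (1.006001 corresponds to 1.6.1) and says nothing about when that release came out.

```perl
use strict;
use warnings;
use version;

# Bio::SeqIO reports a decimal version such as 1.006001. The core version
# module splits the fraction into 3-digit groups (006, 001) to get v1.6.1.
my $decimal = '1.006001';
my $dotted  = version->parse($decimal)->normal;
print "$decimal is $dotted\n";
```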
