|Perl: the Markov chain saw|
Up for Critiqueby biograd (Friar)
|on Mar 22, 2002 at 20:31 UTC||Need Help??|
biograd has asked for the
wisdom of the Perl Monks concerning the following question:
I've been working on a project for my degree, and it's about time to be done with the thing. This script is just one part it. (See my bio for more info.) Anyway, I'm feel like I've never had my code looked over by someone who knows what they're doing, so I posted the db-loading program on my scratchpad, along with a sample of the gene data input. If you feel like you have some spare time, I'd appreciate some constructive criticism.
(I didn't use a generic module because I knew that wouldn't be in the spirit of the project as my advisor saw it. If you'd like to advocate your favorite module, I'll try to study it before the next time I have to do something like this. :) )
My main concern is that the program takes too long to run. (14- 17 hours for 26,000+ records.) There is indexing on several of the fields. I've heard that if I didn't index while inserting data, everything might go faster. I've also heard that the indexing makes the multiple db searches needed go faster during the program, so any gain from indexing at the end is lost from the extra search time within the script. erg. As for the script, I did pass array references to the subroutines instead of copying arrays, but what are some other practical ways that you would optimize it for speed? It blazes on the first thousand records or so, then gets slower as it goes on. I expect that to some degree, since it has to search ever-increasing tables as it progresses, but is 14-17 hours realistic? I am seeing some stuff on optimizing the many regex in Programming Perl (pp 594-9), but I'm not sure the best way to apply it. I have read that tr// is faster than s//, which I could use in a couple places.
Thanks for any comments you can make. If you have questions or need more information, I'll oblige.