in reply to Faulty Control Structures?

The first thing I would check is the fact that you have nested for-loops that may simply be taking a very long time to run. If the files providing the data for @main and @annot are small, the for-loops will finish quickly. If both are large, the innermost loop body will dominate the runtime of the script. Since range_find() adds two more nested for-loops, your runtime is on the order of O(n^4) when each loop runs over inputs of roughly size n. That is usually considered poor: doubling the input size multiplies the work by 16.
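
To make the growth concrete, here is a minimal sketch that only counts how often the innermost body runs. The sub name innermost_iterations() is hypothetical, and it assumes each of the four loops (two over @main and @annot, two inside range_find()) runs over n items; the real script's loop bounds may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's loop structure: two nested loops
# (e.g. over @main and @annot) plus two more inside range_find().
# Each loop runs $n times, so the innermost body executes $n**4 times.
sub innermost_iterations {
    my ($n) = @_;
    my $count = 0;
    for my $i (1 .. $n) {                  # e.g. over @main
        for my $j (1 .. $n) {              # e.g. over @annot
            for my $k (1 .. $n) {          # first loop inside range_find()
                for my $l (1 .. $n) {      # second loop inside range_find()
                    $count++;
                }
            }
        }
    }
    return $count;
}

# Doubling $n multiplies the count by 2**4 = 16.
for my $n (10, 20, 40) {
    printf "n = %3d: %9d innermost iterations\n", $n, innermost_iterations($n);
}
```

With only 40 items per loop the innermost body already runs 2,560,000 times, which is why real-sized input files can make the script appear to hang.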

My biggest suggestion (after the obvious algorithmic improvement) is to pre-digest your data so that repetitive checks can be sped up or eliminated. The next would be to look at using a relational database (such as MySQL or Oracle) and letting the power of relational calculus solve your problems.
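
One way to pre-digest the data is sketched below: bucket the annotations by chromosome in a hash once, sort each bucket by start coordinate, and then scan only the matching bucket for each record, stopping early once annotations start past the position. The record layouts, the point-position query, and the sub names index_annotations() and find_hits() are all hypothetical; the original script may compare ranges rather than single positions.

```perl
use strict;
use warnings;

# Hypothetical record layouts (not taken from the original script):
#   annotation: { chr => 'chr1', start => 100, end => 200, name => 'geneA' }
#   main:       { chr => 'chr1', pos => 150 }

# Pre-digest the annotations once: bucket them by chromosome and sort
# each bucket by start coordinate.
sub index_annotations {
    my (@annot) = @_;
    my %by_chr;
    push @{ $by_chr{ $_->{chr} } }, $_ for @annot;
    for my $chr (keys %by_chr) {
        $by_chr{$chr} = [ sort { $a->{start} <=> $b->{start} } @{ $by_chr{$chr} } ];
    }
    return \%by_chr;
}

# Return the names of annotations on the same chromosome that contain the
# record's position. Only that chromosome's bucket is scanned, and the scan
# stops as soon as an annotation starts past the position (bucket is sorted).
sub find_hits {
    my ($index, $rec) = @_;
    my @hits;
    for my $ann (@{ $index->{ $rec->{chr} } || [] }) {
        last if $ann->{start} > $rec->{pos};
        push @hits, $ann->{name} if $ann->{end} >= $rec->{pos};
    }
    return @hits;
}

my @annot = (
    { chr => 'chr1', start => 100, end => 200, name => 'geneA' },
    { chr => 'chr1', start => 150, end => 300, name => 'geneB' },
    { chr => 'chr2', start => 100, end => 400, name => 'geneC' },
);
my @main = (
    { chr => 'chr1', pos => 175 },
    { chr => 'chr2', pos => 50 },
    { chr => 'chr3', pos => 10 },
);

my $index = index_annotations(@annot);
for my $rec (@main) {
    my @hits = find_hits($index, $rec);
    printf "%s:%d => %s\n", $rec->{chr}, $rec->{pos}, (@hits ? join(',', @hits) : 'none');
}
```

The chromosome check now costs a single hash lookup instead of a comparison against every annotation. If individual buckets are still large, a binary search on the start coordinates or an interval tree could cut the per-record work further.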

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re^2: Faulty Control Structures?
by bioinformatics (Friar) on Jan 28, 2008 at 23:31 UTC
    You're right. I tried to add checkpoints so that I wouldn't have to crawl through all the data (e.g., making sure the chromosomes being compared match), but even then there is a lot to go through. Heck, I even tried to write a multithreaded version of this program, but it still has a few bugs. Should it really take this long, though? I don't have any benchmarks written in, so that will probably be my next step. I'll look into putting this into a database to do some of the work for me, but SQL is still a bit of a learning curve for me.
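
A simple first step toward benchmarking is to time each phase of the script with the core Time::HiRes module. The helper name timed() and the workload inside it are hypothetical placeholders; in practice you would wrap the file loading, the nested loops, and range_find() separately to see which phase dominates.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Minimal timing helper: run a code ref, print the wall-clock seconds it
# took, and pass its return value through to the caller.
sub timed {
    my ($label, $code) = @_;
    my $t0 = [gettimeofday];
    my @result = $code->();
    my $elapsed = tv_interval($t0);    # seconds since $t0, as a float
    printf "%-12s %.3f s\n", $label, $elapsed;
    return wantarray ? @result : $result[0];
}

# Hypothetical workload standing in for one phase of the real script.
my $sum = timed('inner loops', sub {
    my $s = 0;
    for my $i (1 .. 200) {
        for my $j (1 .. 200) { $s += $i * $j }
    }
    return $s;
});
print "sum = $sum\n";
```

Timing a few input sizes this way also shows whether runtime grows roughly with the fourth power of the input size, as the loop structure suggests.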