Iteration speed
by seaver (Pilgrim) on Jun 15, 2004 at 17:06 UTC
seaver has asked for the wisdom of the Perl Monks concerning the following question:
My problem is a bioinformatics problem. I'm currently running a script that is processing 13,000 files, which contain coordinates for multi-chain proteins.
It calculates the interaction between each of the residues, between each chain (if there are any interactions).
Thus, it is iterating not only through each chain pair, but through every possible residue pair (though not residues in the same chain of course) and seeing if they are close enough, and determining the kind of interaction if so.
My problem, which I fear is unavoidable because I cannot afford to miss any possible interactions, is that some of the larger files take hours, even more than a day, to process.
At this rate, it could take nearly a year to go through all the files, which is unfortunate.
Take, for example, a 6-chain protein with approximately 3,000 residues, or about 500 residues per chain. One chain pair then needs 500 x 500 = 250,000 iterations, and 6 chains give 15 possible chain pairs (avoiding repeats, e.g. AB == BA), so that's 3.75 million iterations, a quarter of a million shy of 4 million.
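The count above can be double-checked with a few lines. This is only a sketch, written in Python rather than Perl for brevity; the function name `cross_chain_pairs` and the equal chain sizes are assumptions made for illustration, not part of my script.

```python
from math import comb

def cross_chain_pairs(chain_sizes):
    """Count residue pairs whose two members sit in different chains."""
    total = sum(chain_sizes)
    all_pairs = comb(total, 2)                         # every unordered pair
    same_chain = sum(comb(n, 2) for n in chain_sizes)  # pairs inside one chain
    return all_pairs - same_chain

chains = [500] * 6                   # assumed: 6 chains of 500 residues each
print(comb(len(chains), 2))          # 15 chain pairs
print(cross_chain_pairs(chains))     # 3750000 residue pairs
```

Because the count grows with the square of the residue total, doubling the size of a protein roughly quadruples the number of comparisons.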
I just wanted to know, what are the potential bottlenecks? One such file (larger than the above example) is still being processed after 1.5 days!
The way my program runs is that, while it reads in the file, for every new residue it iterates through the list currently in memory (skipping residues in the same chain, of course) to look for new interactions. At the same time, it populates a database with the atom and residue details, and with the interactions, if any. Is this a stupid way of doing it?
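To make the loop described above concrete, here is a minimal sketch of its shape, again in Python rather than Perl. Everything named here is hypothetical: the `Residue` record, one representative coordinate per residue, and the 5.0 cutoff are stand-ins for illustration, and the database writes and the classification of the interaction type are left out.

```python
import math
from typing import NamedTuple

class Residue(NamedTuple):
    chain: str
    number: int
    x: float
    y: float
    z: float

CUTOFF = 5.0  # assumed distance cutoff, not the real value

def find_interactions(residues, cutoff=CUTOFF):
    """Compare each newly read residue against all residues read so far."""
    seen = []   # residues already in memory
    hits = []   # (earlier residue, new residue, distance)
    for new in residues:
        for old in seen:
            if old.chain == new.chain:
                continue  # skip residues in the same chain
            d = math.dist((old.x, old.y, old.z), (new.x, new.y, new.z))
            if d <= cutoff:
                hits.append((old, new, d))
        seen.append(new)
    return hits

demo = [
    Residue("A", 1, 0.0, 0.0, 0.0),
    Residue("A", 2, 1.0, 0.0, 0.0),
    Residue("B", 1, 3.0, 0.0, 0.0),
    Residue("B", 2, 50.0, 0.0, 0.0),
]
for old, new, d in find_interactions(demo):
    print(old.chain, old.number, new.chain, new.number, d)
```

Every new residue is checked against every earlier residue from another chain, so the inner loop runs once per cross-chain pair, which is where the quadratic cost comes from.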