On a qrsh node with ~18G it takes about 20 minutes. When I can't get a node with that kind of memory, it's a few hours. I've been dividing up the iterations as separate qsubs, which has been helping. Simplifying it to a Wright-Fisher model (constant population size) also greatly helps with speed, but this assumption has drawbacks in modeling the true variation seen in pandemic outbreaks, where you obviously don't have a constant population size.
I'm going to assume that qrsh and qsub are Grid Engine commands?
Whilst there is much that could be done to improve the performance of your posted code, given these statistics it seems likely that the main constraint on your program is memory usage. Once your program moves into swapping, any titivations done to save a few microseconds here and there will just be drowned in the noise of disk (virtual memory) thrashing.
My suggestion would be to modify your script to monitor the size of the %allgen hash and when it reaches a size that is likely to push the minimum size node on your GRID into swapping, split the generations of that hash into (say) four files and qsub four nodes to read those files and pick up the algorithm from that point.
So, (say) you run 20 iterations and generate 1 million mutations. You split those 1 million into 4 files and start four nodes to pick up from that point with 1/4 million candidates. When each of those nodes approaches 1 million mutations, you repeat the split. And so on.
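As a very rough sketch of that checkpoint-and-split idea (the threshold, the number of splits, the use of Storable for the checkpoint files, and the `resume_sim.sh` job script are all assumptions you'd need to tune to your own code and grid):

```perl
use strict;
use warnings;
use Storable qw( store );

# Assumed tunables -- set the threshold just below the point where
# the smallest node on your grid starts swapping.
my $MAX_KEYS = 250_000;
my $N_SPLITS = 4;

# Call this inside the simulation loop after each generation.
sub maybe_split {
    my( $allgen, $generation ) = @_;
    return if keys %$allgen <= $MAX_KEYS;

    # Deal the accumulated mutations round-robin into N buckets.
    my @buckets = map { {} } 1 .. $N_SPLITS;
    my $i = 0;
    for my $key ( keys %$allgen ) {
        $buckets[ $i++ % $N_SPLITS ]{ $key } = delete $allgen->{ $key };
    }

    # Persist each bucket and hand it to a fresh node.
    for my $n ( 0 .. $N_SPLITS - 1 ) {
        my $file = "allgen.gen$generation.part$n.sto";
        store( $buckets[ $n ], $file );
        system( 'qsub', 'resume_sim.sh', $file, $generation ) == 0
            or warn "qsub failed for $file: $?";
    }
    exit 0;   # This node's work is done; the child jobs carry on.
}
```

Here `resume_sim.sh` would be a job script that loads its checkpoint file with `Storable::retrieve` and resumes the simulation from that generation with its quarter of the candidates.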
You'll need to judge the split points in the light of your knowledge of the systems available to you. On my (currently only 2GB) system, I've never managed to run your code past 10 iterations before the process moved into swapping.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.