Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?
by Commander Salamander (Acolyte) on Jun 14, 2005 at 00:00 UTC
Commander Salamander has asked for the wisdom of the Perl Monks concerning the following question:
Hi everyone, I'm rather new to Perl, and have been trying to implement a calculation of hypergeometric distribution probabilities in some code. For those of you who aren't familiar, these statistics address the following type of question:
"If I have a box with 300 white balls and 700 black balls and I draw 100 balls from the box, what are the odds that I draw 40 or more white balls?"
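Written out for this example, with X standing for the number of white balls among the 100 drawn (the symbol X is just a label chosen here), the probability in question is a sum of ratios of binomial coefficients, each of which expands into the factorials that cause the trouble:

```latex
P(X \ge 40) \;=\; \sum_{k=40}^{100} \frac{\binom{300}{k}\,\binom{700}{100-k}}{\binom{1000}{100}},
\qquad
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
```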
I've encountered two (related) problems:
1. These calculations require very large numbers, apparently necessitating the use of a module such as BigInt or BigFloat (though I don't fully understand at what number size these modules become necessary). By playing with someone else's publicly available code and clumsily implementing this functionality, I was able to get things to work. However:
2. Especially when I'm using very large samples, calculating the probability is excruciatingly slow due to the large number of factorial calculations necessary.
I would greatly appreciate advice on which "big number" module is the most efficient for factorials, and whether there is any conventional wisdom as to how I could best tackle this problem.
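For what it's worth, one commonly suggested way around both problems is to work with logarithms of factorials instead of the factorials themselves, so no big-number module is needed at all. The following is only a minimal sketch under that assumption, not a tested library: the sub names (log_factorial, log_choose, hypergeom_pmf, hypergeom_upper_tail) are made up for illustration, and the results are ordinary floating-point approximations rather than exact values.

```perl
use strict;
use warnings;

# Cache of ln(n!) values: $logfact[$n] == ln(n!). Starts with ln(0!) = 0.
my @logfact = (0);

# Return ln(n!), extending the cache by summing logs only as far as needed.
sub log_factorial {
    my ($n) = @_;
    push @logfact, $logfact[-1] + log(scalar @logfact) while @logfact <= $n;
    return $logfact[$n];
}

# ln of the binomial coefficient "n choose k".
sub log_choose {
    my ($n, $k) = @_;
    return log_factorial($n) - log_factorial($k) - log_factorial($n - $k);
}

# P(exactly $k white) when drawing $draws balls without replacement
# from a box of $white white and $black black balls.
sub hypergeom_pmf {
    my ($k, $white, $black, $draws) = @_;
    return 0 if $k < 0 || $k > $white || $draws - $k < 0 || $draws - $k > $black;
    return exp(  log_choose($white, $k)
               + log_choose($black, $draws - $k)
               - log_choose($white + $black, $draws) );
}

# P($k_min or more white) = sum of the individual probabilities.
sub hypergeom_upper_tail {
    my ($k_min, $white, $black, $draws) = @_;
    my $k_max = $white < $draws ? $white : $draws;
    my $p = 0;
    $p += hypergeom_pmf($_, $white, $black, $draws) for $k_min .. $k_max;
    return $p;
}

# The box example: 300 white, 700 black, draw 100, 40 or more white.
printf "P(X >= 40) = %.6g\n", hypergeom_upper_tail(40, 300, 700, 100);
```

Because each ln(n!) is computed once and cached, repeated calls cost only a few additions and one exp() per term. If exact results are required rather than double-precision approximations, a big-number module would still be needed.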