PerlMonks  

Re: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?

by tall_man (Parson)
on Jun 14, 2005 at 15:19 UTC (#466599)


in reply to Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?

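You should compute the logarithm of the gamma function rather than the factorials themselves, since n! overflows a double-precision float for n > 170. There are formulas that compute log gamma very quickly and accurately; see, for example, the node Malloc with Inline::C. By the way, gamma(n+1) = n! for non-negative integer n. (Some responders said gamma(n) = n!, which is wrong.)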
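As a sketch of the identity gamma(n+1) = n! (not code from this thread), the two routines below compute ln(n!) two ways: by summing logarithms, which is exact up to rounding but costs O(n) per call, and as ln gamma(n+1) via POSIX::lgamma. The latter assumes Perl 5.22 or later, whose POSIX module exposes the C99 lgamma function. Both function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();   # assumption: Perl 5.22+, where POSIX provides C99 lgamma

# Hypothetical helper: ln(n!) as a plain sum of logs.
# Exact up to rounding, but O(n) work per call.
sub logfact_sum {
    my $n = shift;
    my $s = 0;
    $s += log($_) for 2 .. $n;
    return $s;
}

# Hypothetical helper: ln(n!) = ln(Gamma(n + 1)).
# Note the +1: Gamma(n) itself is (n - 1)!, not n!.
sub logfact_lgamma {
    my $n = shift;
    return POSIX::lgamma($n + 1);
}

# 170! still fits in a double and 171! does not,
# but ln(171!) is only roughly 711.7, so the log form never overflows here.
for my $n (0, 1, 10, 170, 171, 1000) {
    printf "%5d  %.10f  %.10f\n", $n, logfact_sum($n), logfact_lgamma($n);
}
```

The summing version is handy as a reference when testing a faster approximation such as gammln.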

Update: Here is some code (perl version of gammln is from Re^4: Challenge: Chasing Knuth's Conjecture):

    #!/usr/bin/perl -w
    use strict;

    sub logfact {
        return gammln(shift(@_) + 1.0);
    }

    sub hypergeom {
        # There are m "bad" and n "good" balls in an urn.
        # Pick N of them. The probability of i or more successful selections:
        # (m!n!N!(m+n-N)!)/(i!(n-i)!(m+i-N)!(N-i)!(m+n)!)
        my ($n, $m, $N, $i) = @_;
        my $loghyp1 = logfact($m)+logfact($n)+logfact($N)+logfact($m+$n-$N);
        my $loghyp2 = logfact($i)+logfact($n-$i)+logfact($m+$i-$N)
                    + logfact($N-$i)+logfact($m+$n);
        return exp($loghyp1 - $loghyp2);
    }

    sub gammln {
        my $xx = shift;
        my @cof = (76.18009172947146, -86.50532032941677,
                   24.01409824083091, -1.231739572450155,
                   0.12086509738661e-2, -0.5395239384953e-5);
        my $y = my $x = $xx;
        my $tmp = $x + 5.5;
        $tmp -= ($x + .5) * log($tmp);
        my $ser = 1.000000000190015;
        for my $j (0..5) {
            $ser += $cof[$j]/++$y;
        }
        return -$tmp + log(2.5066282746310005*$ser/$x);
    }

    print hypergeom(300,700,100,40),"\n";
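One way to sanity-check hypergeom is a sketch that skips the gamma approximation entirely and builds each binomial coefficient from a sum of logarithms. This is a swapped-in technique, not the Numerical Recipes approach: it is exact up to floating-point rounding but costs O(k) per coefficient. log_choose and hypergeom_pmf are hypothetical names; the argument order mirrors hypergeom($n, $m, $N, $i), and the result is the probability of exactly $i good balls, C(n,i) C(m,N-i) / C(m+n,N).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: ln C(top, k) = sum_{j=1..k} [ln(top - k + j) - ln(j)].
# Returns undef (in scalar context) when C(top, k) is zero, i.e. k < 0 or k > top.
sub log_choose {
    my ($top, $k) = @_;
    return if $k < 0 || $k > $top;
    $k = $top - $k if $k > $top - $k;    # C(top, k) == C(top, top - k); fewer terms
    my $s = 0;
    $s += log($top - $k + $_) - log($_) for 1 .. $k;
    return $s;
}

# Hypothetical cross-check: P(exactly $i good balls) when drawing $N balls
# from an urn holding $n good and $m bad ones.
sub hypergeom_pmf {
    my ($n, $m, $N, $i) = @_;
    my $good = log_choose($n, $i);
    my $bad  = log_choose($m, $N - $i);
    return 0 unless defined $good && defined $bad;
    return exp($good + $bad - log_choose($m + $n, $N));
}

print hypergeom_pmf(300, 700, 100, 40), "\n";
```

For N = 100 the O(N) log sums are cheap; the gamma-based version wins when the counts get large.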


Re: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?
by Commander Salamander (Acolyte) on Jun 14, 2005 at 15:59 UTC
    Wow, thanks to all of you for the incredible amount of advice. I'll work my way through your suggestions today.

    Thanks again!
Re^2: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?
by jmuhlich (Acolyte) on Jan 26, 2006 at 01:04 UTC
    I only just ran across this node, thanks everyone!

    For pure speed, you can inline the gammln code, unroll the loop into one big expression, and move the constants into that expression instead of referencing them indirectly through the array. I also got rid of $y for a very small speedup. This code runs about 135% faster than logfact above (i.e. over twice as fast). I renamed the function factln to be consistent with gammln.

    BTW, this code appears to originate in Numerical Recipes, but no credit was given in Re^4: Challenge: Chasing Knuth's Conjecture, referenced in the parent. The function names in that book are at most six characters long because there is also a Fortran 77 version of the book, and Fortran 77 limits identifiers to six characters. Thus "gammln" instead of "gammaln".

        sub factln {
            my $x = (shift) + 1;
            my $tmp = $x + 5.5;
            $tmp -= ($x + .5) * log($tmp);
            my $ser = 1.000000000190015
                + 76.18009172947146   / ++$x
                - 86.50532032941677   / ++$x
                + 24.01409824083091   / ++$x
                - 1.231739572450155   / ++$x
                + 0.12086509738661e-2 / ++$x
                - 0.5395239384953e-5  / ++$x;
            return log(2.5066282746310005*$ser/($x-6)) - $tmp;
        }
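To check the speed claim on your own machine, a sketch using the core Benchmark module's cmpthese could look like this. It re-declares logfact/gammln from the parent node and factln from this reply so it runs standalone, and it first asserts that the two agree, since unrolling should not change the result beyond rounding. Relative timings depend on the Perl version and hardware, so none are quoted here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Loop version (from the parent node), wrapped as logfact().
sub gammln {
    my $xx = shift;
    my @cof = (76.18009172947146, -86.50532032941677, 24.01409824083091,
               -1.231739572450155, 0.12086509738661e-2, -0.5395239384953e-5);
    my $y = my $x = $xx;
    my $tmp = $x + 5.5;
    $tmp -= ($x + .5) * log($tmp);
    my $ser = 1.000000000190015;
    $ser += $cof[$_] / ++$y for 0 .. 5;
    return -$tmp + log(2.5066282746310005 * $ser / $x);
}
sub logfact { return gammln(shift(@_) + 1.0) }

# Unrolled version (from this reply). Each ++$x feeds its own division
# before the next increment runs, so $x ends at (n + 1) + 6.
sub factln {
    my $x = (shift) + 1;
    my $tmp = $x + 5.5;
    $tmp -= ($x + .5) * log($tmp);
    my $ser = 1.000000000190015
        + 76.18009172947146   / ++$x
        - 86.50532032941677   / ++$x
        + 24.01409824083091   / ++$x
        - 1.231739572450155   / ++$x
        + 0.12086509738661e-2 / ++$x
        - 0.5395239384953e-5  / ++$x;
    return log(2.5066282746310005 * $ser / ($x - 6)) - $tmp;
}

# The two should agree to rounding error before timing them.
for my $n (0, 1, 10, 100, 1000) {
    my $diff = abs(logfact($n) - factln($n));
    die "mismatch at n=$n: $diff\n" if $diff > 1e-9;
}

# Negative count: run each sub for at least 1 CPU second.
cmpthese(-1, {
    logfact => sub { logfact(1000) },
    factln  => sub { factln(1000) },
});
```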
Re^2: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?
by Anonymous Monk on Apr 05, 2010 at 21:15 UTC
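    I could be mistaken, but I think this calculates the probability of exactly i successful selections, not of i or more successful selections as claimed above. For the probability of i or more successes (the upper tail, i.e. one minus the CDF evaluated at i-1), you need to sum over the possible numbers of successes from i upward: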
        use List::Util qw(min);
        my $hypercdf = 0;
        for (my $iref=$i; $iref < min($N,$n); $iref++) {
            $hypercdf += hypergeom($n,$m,$N,$iref);
        }
        print $hypercdf;
      You need a less-than-or-equal-to in the condition of the for loop, because min($N,$n) is itself a possible number of successes. This probably won't make a noticeable difference in most cases, as that final probability is usually very small.
          use List::Util qw(min);
          my $hypercdf = 0;
          for (my $iref=$i; $iref <= min($N,$n); $iref++) {
              $hypercdf += hypergeom($n,$m,$N,$iref);
          }
          print $hypercdf;
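Putting both replies together, a standalone sketch of the upper tail P(X >= i) might look like the following. Assumptions beyond the thread: it uses POSIX::lgamma (Perl 5.22 or later) in place of gammln, the names logfact, hypergeom_pmf and hypergeom_upper_tail are hypothetical, and it clamps the loop to the support of the distribution, max(0, N - m) through min(N, n) inclusive. Outside that range a term is zero, and the log-factorial of a negative argument is undefined (with gammln it ends in log of a negative number, which makes Perl die).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();                  # assumption: Perl 5.22+ for POSIX::lgamma
use List::Util qw(min max);

# Hypothetical helper: ln(k!) = ln(Gamma(k + 1)).
sub logfact { return POSIX::lgamma($_[0] + 1) }

# P(X = i) for n good and m bad balls, N drawn (same formula as hypergeom()).
sub hypergeom_pmf {
    my ($n, $m, $N, $i) = @_;
    my $num = logfact($m) + logfact($n) + logfact($N) + logfact($m + $n - $N);
    my $den = logfact($i) + logfact($n - $i) + logfact($m + $i - $N)
            + logfact($N - $i) + logfact($m + $n);
    return exp($num - $den);
}

# Hypothetical: P(X >= i), summing the pmf over the support only.
sub hypergeom_upper_tail {
    my ($n, $m, $N, $i) = @_;
    my $lo = max($i, 0, $N - $m);   # below N - m the pmf is zero
    my $hi = min($N, $n);           # inclusive: min(N, n) successes is possible
    my $sum = 0;
    $sum += hypergeom_pmf($n, $m, $N, $_) for $lo .. $hi;   # empty if $lo > $hi
    return $sum;
}

printf "%.6g\n", hypergeom_upper_tail(300, 700, 100, 40);
```

When i is well below the mean, summing the lower tail and subtracting it from 1 touches fewer terms; the direct sum above is kept for clarity.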
