PerlMonks  

Hypergeometric Probability Calculation

by Itatsumaki (Friar)
on Dec 04, 2003 at 23:10 UTC (#312365)

Itatsumaki has asked for the wisdom of the Perl Monks concerning the following question:

Howdy fellow monks,

I am trying to implement the hypergeometric probability in Perl. The general form of the equation is given, for instance, in the second page of this article, under the heading "Interpretation of Clusters".

Since the values can get quite large, I thought I would implement it using Math::Big and Math::BigFloat, but I seem to have a misunderstanding or a bug somewhere that I cannot track down. The code is below. The symptom is that the value of $delta in every loop iteration is 0, and the final probability value is always one. When I do some of the calculations manually I get distinctly non-zero $delta for the last few loop iterations. Can anyone see what I've done wrong?

use strict;
use Math::BigFloat;
use Math::Big;

my $G = 40; #$ARGV[0];
my $C = 25; #$ARGV[1];
my $n = 15; #$ARGV[2];
my $k = 14; #$ARGV[3];

sub choose {
    my $temp = Math::BigFloat->new('1');
    $temp = Math::Big::factorial($_[0]) / Math::Big::factorial ($_[0] - $_[1]) / Math::Big::factorial($_[1]);
    return $temp;
}

my $p = Math::BigFloat->new('1');
my $denom = Math::BigFloat->new(choose($G, $n));

for (my $i = 0; $i < $k; $i++) {
    my $val1 = $G - $C;
    my $val2 = $n - $i;
    my $delta = Math::BigFloat->new();
    $delta = choose($C, $i) * choose($val1, $val2) / $denom;
    print "$delta\n";
    $p -= $delta;
}
print "Probability estimate: $p\n";

-Tats
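
For readers without access to the linked article, the quantity the posted loop appears to compute can be written out as below. The reading of the symbols is an assumption inferred from the variable names and the loop bounds in the code (G as the population size, C as the number of marked items, n as the number of draws, k as the threshold), not taken from the article itself.

```latex
P(X \ge k) \;=\; 1 - \sum_{i=0}^{k-1}
    \frac{\binom{C}{i}\,\binom{G-C}{n-i}}{\binom{G}{n}}
```

Each term of the sum corresponds to one value of $delta in the loop, and the division by \binom{G}{n} is the division by $denom.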

Re: Hypergeometric Probability Calculation
by Abigail-II (Bishop) on Dec 04, 2003 at 23:22 UTC
    Note that Math::Big::factorial returns a Math::BigInt object, whose overloaded math operators do integer arithmetic. That means $delta could be 0 or 1 (or some other integer), but not a value between 0 and 1.

    Abigail
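
The effect described in this reply can be reproduced in isolation. The following is a minimal sketch, not the original poster's code: it uses Math::BigInt directly rather than Math::Big::factorial, and the operands 3 and 10 are arbitrary illustrative values.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigFloat;

# Illustrative operands whose true ratio lies between 0 and 1.
my $num_int = Math::BigInt->new(3);
my $den_int = Math::BigInt->new(10);

# Overloaded "/" on Math::BigInt objects performs integer division,
# so the fractional part is discarded.
my $int_ratio = $num_int / $den_int;

# The same operands as Math::BigFloat objects keep the fractional part.
my $float_ratio = Math::BigFloat->new(3) / Math::BigFloat->new(10);

print "BigInt ratio:   $int_ratio\n";
print "BigFloat ratio: $float_ratio\n";
```

The integer ratio collapses to 0, which matches the symptom of every $delta being 0 in the question.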

      Thank you: that was exactly the problem. Merely replacing Math::Big::factorial with Math::BigFloat::bfac fixed it. The code is below for posterity's sake:

      use strict;
      use Math::BigFloat;

      my $G = 11057; #$ARGV[0];
      my $C = $ARGV[0];
      my $n = $ARGV[1];
      my $k = $ARGV[2];

      sub choose {
          my $ret_val = Math::BigFloat->new();
          my $n = Math::BigFloat->new($_[0]);
          my $r = Math::BigFloat->new($_[1]);
          my $n_r = Math::BigFloat->new($_[0] - $_[1]);
          $n->bfac();
          $r->bfac();
          $n_r->bfac();
          $ret_val = $n / ($n_r * $r);
          return $ret_val;
      }

      my $p = Math::BigFloat->new('1');
      my $denom = Math::BigFloat->new(choose($G, $n));

      for (my $i = 0; $i < $k; $i++) {
          my $val1 = $G - $C;
          my $val2 = $n - $i;
          my $delta = Math::BigFloat->new();
          $delta = choose($C, $i) * choose($val1, $val2) / $denom;
          $p -= $delta;
      }
      print "Probability estimate: $p\n";

        Two points. First, I found that if I used Math::BigFloat in the choose function, I got round-off errors, which I didn't get when using Math::BigInt.

        Second, I got much better performance from the choose function if I didn't calculate three factorials, but just one, and did some of the multiplication myself. The further the second argument is from half of the first argument, the bigger the advantage. Here's a benchmark:

        #!/usr/bin/perl

        use strict;
        use warnings;

        use Math::BigFloat;
        use Math::BigInt;
        use Benchmark qw /timethese cmpthese/;

        #
        # choose (n, k) == n! / ((n - k)! * k!) == n! / (n - k)! / k!
        #               == n * (n - 1) * ... * (n - k + 1) / k!

        sub choose_fac {
            my ($n, $k) = @_;
            Math::BigInt -> new ($n) -> bfac /
            Math::BigInt -> new ($n - $k) -> bfac /
            Math::BigInt -> new ($k) -> bfac
        }

        sub choose_mul {
            my ($n, $k) = @_;
            $k = $n - $k if $k > $n - $k;  # Make the loop smaller;
                                           # we can do this because
                                           # choose (n, k) == choose (n, n - k).
            my $p = Math::BigInt -> new (1);
            my $start = $n - $k + 1;
            for (my $i = $start; $i <= $n; $i ++) {
                $p *= $i;
            }
            $p / Math::BigInt -> new ($k) -> bfac;
        }

        our (@r1, @r2);
        our @pairs = map {[/(\d+)\s+(\d+)/]} <DATA>;

        cmpthese -10 => {
            fac => '@r1 = map {choose_fac @$_} @pairs',
            mul => '@r2 = map {choose_mul @$_} @pairs',
        };

        die "Unequal" unless "@r1" eq "@r2";

        __DATA__
        1000 0
        1000 100
        1000 500
        1000 1000

            s/iter  fac  mul
        fac   2.08   -- -89%
        mul  0.226 824%   --

        Abigail
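
Taken together, the two points above suggest keeping every binomial coefficient as an exact Math::BigInt and dividing only once at the end. The following is a minimal sketch along those lines, not code from the thread: the name hypergeom_tail and the guard for out-of-range arguments are assumptions added for illustration, and the meaning of G, C, n and k is inferred from the original question.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigFloat;

# Exact binomial coefficient using the multiplication trick:
# n * (n - 1) * ... * (n - k + 1) / k!
sub choose {
    my ($n, $k) = @_;
    # Assumed convention: out-of-range arguments give 0.
    return Math::BigInt->new(0) if $k < 0 || $k > $n;
    $k = $n - $k if $k > $n - $k;      # choose(n, k) == choose(n, n - k)
    my $p = Math::BigInt->new(1);
    $p *= $_ for ($n - $k + 1) .. $n;
    return $p / Math::BigInt->new($k)->bfac;
}

# Hypothetical helper: 1 - sum_{i=0}^{k-1} C(C,i) * C(G-C, n-i) / C(G,n).
# The sum is accumulated as an exact integer; the only floating-point
# operation is the final division.
sub hypergeom_tail {
    my ($G, $C, $n, $k) = @_;
    my $sum = Math::BigInt->new(0);
    $sum += choose($C, $_) * choose($G - $C, $n - $_) for 0 .. $k - 1;
    my $num = Math::BigFloat->new($sum);
    my $den = Math::BigFloat->new(choose($G, $n));
    return 1 - $num / $den;
}

# Values from the original question.
my $p = hypergeom_tail(40, 25, 15, 14);
print "Probability estimate: $p\n";
```

Deferring the division means round-off can only enter at the last step, at the cost of carrying one large integer sum through the loop.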
