
Re: Hypergeometric Probability Calculation

by Abigail-II (Bishop)
on Dec 04, 2003 at 23:22 UTC

in reply to Hypergeometric Probability Calculation

Note that Math::Big::factorial returns a Math::BigInt object, which has overloaded math operators that do integer arithmetic. This means that $delta could be 0 or 1 (or some other integer), but never a value strictly between 0 and 1.
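To illustrate the point, here is a minimal sketch (not the original poster's code) that divides two factorials once with Math::BigInt and once with Math::BigFloat. It uses the core bfac method rather than Math::Big::factorial, on the assumption that both yield the same kind of object; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigFloat;

# Integer objects: the overloaded "/" performs integer division,
# so 3!/4! = 6/24 is truncated to 0.
my $int_ratio = Math::BigInt->new(3)->bfac / Math::BigInt->new(4)->bfac;

# Float objects: the same expression keeps its fractional part.
my $float_ratio = Math::BigFloat->new(3)->bfac / Math::BigFloat->new(4)->bfac;

print "BigInt:   $int_ratio\n";
print "BigFloat: $float_ratio\n";
```

Any term of the sum that should lie between 0 and 1 collapses to 0 in the first form, which is why the probability never moves.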


Re: Re: Hypergeometric Probability Calculation
by Itatsumaki (Friar) on Dec 04, 2003 at 23:48 UTC

    Thank you: that was exactly the problem. Merely replacing Math::Big::factorial with Math::BigFloat::bfac fixed it. The code is below for posterity's sake:

        use strict;
        use Math::BigFloat;

        my $G = 11057;    #$ARGV[0];
        my $C = $ARGV[0];
        my $n = $ARGV[1];
        my $k = $ARGV[2];

        sub choose {
            my $ret_val = Math::BigFloat->new();
            my $n   = Math::BigFloat->new($_[0]);
            my $r   = Math::BigFloat->new($_[1]);
            my $n_r = Math::BigFloat->new($_[0] - $_[1]);
            $n->bfac();
            $r->bfac();
            $n_r->bfac();
            $ret_val = $n / ($n_r * $r);
            return $ret_val;
        }

        my $p     = Math::BigFloat->new('1');
        my $denom = Math::BigFloat->new(choose($G, $n));

        for (my $i = 0; $i < $k; $i++) {
            my $val1  = $G - $C;
            my $val2  = $n - $i;
            my $delta = Math::BigFloat->new();
            $delta = choose($C, $i) * choose($val1, $val2) / $denom;
            $p -= $delta;
        }

        print "Probability estimate: $p\n";

      Two points. First, I found that when I used Math::BigFloat in the choose function, I got round-off errors that I didn't get when using Math::BigInt.

      Second, I got much better performance from the choose function when I calculated just one factorial instead of three and did some of the multiplication myself. The further the second argument is from half of the first argument, the bigger the advantage. Here's a benchmark:

          #!/usr/bin/perl

          use strict;
          use warnings;

          use Math::BigFloat;
          use Benchmark qw /timethese cmpthese/;

          #
          # choose (n, k) == n! / ((n - k)! * k!) == n! / (n - k)! / k!
          #               == n * (n - 1) * ... * (n - k + 1) / k!

          sub choose_fac {
              my ($n, $k) = @_;

              Math::BigInt -> new ($n)      -> bfac /
              Math::BigInt -> new ($n - $k) -> bfac /
              Math::BigInt -> new ($k)      -> bfac
          }

          sub choose_mul {
              my ($n, $k) = @_;

              $k = $n - $k if $k > $n - $k;  # Make the loop smaller;
                                             # we can do this because
                                             # choose (n, k) == choose (n, n - k).

              my $p = Math::BigInt -> new (1);
              my $start = $n - $k + 1;
              for (my $i = $start; $i <= $n; $i ++) {
                  $p *= $i;
              }
              $p / Math::BigInt -> new ($k) -> bfac;
          }

          our (@r1, @r2);
          our @pairs = map {[/(\d+)\s+(\d+)/]} <DATA>;

          cmpthese -10 => {
              fac => '@r1 = map {choose_fac @$_} @pairs',
              mul => '@r2 = map {choose_mul @$_} @pairs',
          };

          die "Unequal" unless "@r1" eq "@r2";

          __DATA__
          1000    0
          1000  100
          1000  500
          1000 1000

              s/iter  fac  mul
          fac   2.08   -- -89%
          mul  0.226 824%   --
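Taken together, the two points above suggest keeping every binomial coefficient in exact Math::BigInt arithmetic and converting to Math::BigFloat only for the final division. The following is a minimal sketch under that assumption, not code from the thread: the names choose_exact and hypergeom_tail are hypothetical, and choose_exact divides as it multiplies (a variation on choose_mul) so that it needs no factorial at all.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigFloat;

# Hypothetical helper: exact binomial coefficient as a Math::BigInt.
sub choose_exact {
    my ($n, $k) = @_;
    return Math::BigInt->new(0) if $k < 0 || $k > $n;
    $k = $n - $k if $k > $n - $k;    # choose(n, k) == choose(n, n - k)
    my $p = Math::BigInt->new(1);
    for my $i (1 .. $k) {
        # After this step $p == choose($n - $k + $i, $i), so the
        # integer division is always exact.
        $p = $p * ($n - $k + $i) / $i;
    }
    return $p;
}

# Hypothetical helper: P(X >= $k) when drawing $n items from a
# population of $G that contains $C marked items.
sub hypergeom_tail {
    my ($G, $C, $n, $k) = @_;
    my $denom = choose_exact($G, $n);
    my $below = Math::BigInt->new(0);    # exact count of draws with X < $k
    for my $i (0 .. $k - 1) {
        $below += choose_exact($C, $i) * choose_exact($G - $C, $n - $i);
    }
    # Convert to floating point only once, at the very end.
    return Math::BigFloat->new($denom - $below) / Math::BigFloat->new($denom);
}

print "Probability estimate: ", hypergeom_tail(10, 4, 3, 1), "\n";
```

With the small inputs in the last line, the exact answer is 1 - choose(6, 3) / choose(10, 3) = 5/6, which makes the sketch easy to check by hand before trying it on thread-sized values such as G = 11057.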


        I just peeked in Math::Big. The factorial implementation is very slow.

        You should find a significant performance improvement with:

            #
            # Usage:
            #   factorial($n) = 1*2*...*$n
            #   factorial($m, $n) = $m*($m+1)*...*$n
            #
            sub factorial { #divide and conquer
                unshift @_, 1 if 2 != @_;
                my ($m, $n) = @_;
                if ($m < $n) {
                    my $k = int($m/2 + $n/2);
                    return factorial($m, $k) * factorial($k+1, $n);
                }
                else {
                    return Math::BigInt->new($m);
                }
            }

        (Incremental improvements over that are easily achieved as well.)
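Because the two-argument form computes an arbitrary range product, it can also stand in for the multiplication loop in choose_mul from earlier in the thread. The sketch below is one hedged way to combine the two ideas; choose_dc is a hypothetical name, not something posted here, and it assumes 0 <= $k <= $n.

```perl
use strict;
use warnings;
use Math::BigInt;

# The divide-and-conquer product from the post above:
#   factorial($n)     = 1*2*...*$n
#   factorial($m, $n) = $m*($m+1)*...*$n
sub factorial {
    unshift @_, 1 if 2 != @_;
    my ($m, $n) = @_;
    if ($m < $n) {
        my $k = int($m/2 + $n/2);
        return factorial($m, $k) * factorial($k + 1, $n);
    }
    return Math::BigInt->new($m);
}

# Hypothetical helper: choose(n, k) == (n - k + 1) * ... * n / k!
sub choose_dc {
    my ($n, $k) = @_;
    $k = $n - $k if $k > $n - $k;    # choose(n, k) == choose(n, n - k)
    # Guard the empty product: factorial($n + 1, $n) would return $n + 1.
    return Math::BigInt->new(1) if $k == 0;
    return factorial($n - $k + 1, $n) / factorial($k);
}

print "choose(52, 5) = ", choose_dc(52, 5), "\n";
```

The guard for $k == 0 matters because factorial returns its first argument whenever the range is empty or has a single element.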

        I'll submit the suggestion to the maintainer.

        UPDATE: I remember the overload interface being slower on old Perls, but on my machine (5.8.0) it seems marginally faster. So I replaced:

            return Math::BigInt->new(factorial($m, $k))->bmul(factorial($k+1,$n));

        with:

            return factorial($m, $k) * factorial($k+1, $n);

        UPDATE 2: Out of curiosity, I wondered how the above Perl would compare with Ruby:

            #
            # Usage:
            #   factorial(n) = n*(n-1)*...*1
            #   factorial(n, m) = n*(n-1)*...*m
            def factorial (n, m=1)
              if m < n then
                k=(m+n)/2;
                factorial(k, m) * factorial(n, k+1);
              else
                m
              end
            end

        This ran about 10x faster than Perl. Of course a naive factorial implementation in Ruby runs several times as fast as the smart one does in Perl.

        The difference is mainly the price we pay for all of the layers needed to keep Perl's variables from autoconverting themselves inappropriately when they hold large integers. If you want to work with large integers, Perl is not the language to do it with.

        Oh wow... that benchmark isn't lying: doing it this way is radically faster. After some thought this morning, I had figured I would need to pre-calculate and store the first 15k factorials rather than do the calculation each time, but these changes help avoid that. Thank you.
