http://www.perlmonks.org?node_id=895892

Microcebus has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I wrote a script to calculate the propability to exceed a defined pips sum throwing a loaded die with x sides n times. The number of pips for each side can also be defined.
$throw_die_ntimes=5; $min_summed_pips=60; $sides_of_die=5; # propability to get a certain side @propabilities=(0.05,0.1,0.2,0.3,0.35); # number of pips for each side @values=(1,5,7,13,17); # generate possible compositions @start_array=("1","2","3","4","5"); foreach(1..$throw_die_ntimes-1) { %hash=(); foreach$composition(@start_array) { foreach(1..$sides_of_die) { $new_composition=$composition.$_; @sort_new_composition=split('',$new_composition); @sort_new_composition=sort@sort_new_composition; $new_composition=join('',@sort_new_composition); push(@new_array,$new_composition); $hash{$new_composition}++; } } @start_array=@new_array; @new_array=(); } # calculate propability of each composition $total_prop_to_exceed_minimum=0; $total_prop_to_deceed_minimum=0; foreach$composition(keys%hash) { $propability_of_composition=1; $value_of_composition=0; @composition=split('',$composition); foreach$step(@composition) { $propability_of_composition=$propability_of_composition*$propa +bilities[$step-1]; $value_of_composition+=$values[$step-1]; } $propability_of_composition=$propability_of_composition*$hash{$com +position}; # add propability to one of two total propabilities (exceed or dec +eed minimum) if($value_of_composition>=$min_summed_pips) {$total_prop_to_exceed_minimum+=$propability_of_composition;} else {$total_prop_to_deceed_minimum+=$propability_of_composition;} } print"p to exceed minimum: $total_prop_to_exceed_minimum\np to deceed +minimum: $total_prop_to_deceed_minimum\n\n"; system("pause"); exit;
This code works but gets very slow when I throw the die more then 10 times. The problem seems to be the exploding array when calculating all possible compositions. I pored over a smarter way to do this but still I do not have a clue.
Any ideas?

Replies are listed 'Best First'.
Re: Loaded die
by jethro (Monsignor) on Mar 28, 2011 at 13:00 UTC

    I can think of two faster ways:

    1) If you don't "trust" math, you just run a few thousand virtual test runs with your dice and note the average result. This will be accuracte enough for almost any use case, for example to calculate your average income with these dice ;-)

    2) Use math. A good book on stochastics should have the formulas that tell you the exact number if you give the average result of the die (at least if the values are not too different and smaller than the target sum). I think the relevant calculation you need has to do with a gaussian curve.

Re: Loaded die
by BrowserUk (Patriarch) on Mar 28, 2011 at 14:09 UTC

    Rather than brute force iteration, you could always simulate:

    #! perl -slw use strict; use Data::Dump qw[ pp ]; use List::Util qw[ sum ]; use Math::Random::MT qw[ rand ]; my $throws = 5; my $sides = 5; my @probs = ( 0.05, 0.1, 0.2, 0.3, 0.35 ); my @values = ( 1, 5, 7, 13, 17 ); my @lookup = map{ ( $values[ $_ ] ) x ( $probs[ $_ ] * 100 ); } 0 .. $#values; our $MIN //= 60; our $RUNS //= 1e5; my $moreThanMin = 0; for ( 1 .. $RUNS ) { my $total = sum map{ $lookup[ rand( scalar @lookup ) ] } 1 .. $thr +ows; ++$moreThanMin if $total > $MIN; } printf "Probability based on $RUNS simulations: %.3f%%", $moreThanMin / $RUNS * 100; __END__ c:\test>895892 -RUNS=1e6 Probability based on 1e6 simulations: 48.752%

    It won't save time on small die, but on the larger ones you can decide how much time you want to spend to get closer and closer to the answer.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      You are right. I thought about this option but in my special case, I have to calculate possibilities like 1/10^100. Hence I would have to do more than 10^100 simulations...
        I have to calculate possibilities like 1/10^100.

        Hm. Odds that small are so close to zero as to be meaningless.

        There are estimated to be 10^80 atoms in the universe. If one of them had gone missing (or an extra one had materialised into existance) at some point after matter crystallised out of the big bang, the affect the absence or presence of that one atom would have had on the form of the universe would be undetectable at any scale.

        And the affect that a 1/10^100 change in probability will have (on anything) is 100 million trillion times less significant.

        To try and put that into perspective. If a casino dice had 1/10^100 bias, it would undetectable by any mechanism. I would guess that every single dice ever produced has been, is, and always will be biased billions of times more significantly. A single missing electron from the face of the dice will bias it far more significantly than that.

        Indeed, the bias of every dice ever made probably varies by bllions of times more than that, second by second, due to moisture absorption, evaporation or radioactive decay.

        If every human being that ever existed, threw every dice that ever existed, once every microsecond, for their entire lives and you managed to record every single throw, you would not be able to detect if such a rare event had occurred or not.

        So what is the point of knowing such probabilities? Even knowing that they are not zero is useless information.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Loaded die
by ELISHEVA (Prior) on Mar 28, 2011 at 13:09 UTC

    Well, for starters in might have something to do with the way you are creating your hash %hash. Each time you cycle through the outer loop @start_array grows in size (by a factor of 5) so each throw of the die results in every more iterations in the inner two loops. Here is what the count looks like:

    throw of die: 1: iterations in inner two loops: 25 total iterations: 25 throw of die: 2: iterations in inner two loops: 125 total iterations: 150 throw of die: 3: iterations in inner two loops: 625 total iterations: 775 throw of die: 4: iterations in inner two loops: 3125 total iterations: 3900 throw of die: 5: iterations in inner two loops: 15625 total iterations: 19525 throw of die: 6: iterations in inner two loops: 78125 total iterations: 97650 throw of die: 7: iterations in inner two loops: 390625 total iterations: 488275 throw of die: 8: iterations in inner two loops: 1953125 total iterations: 2441400 ...

    Did you mean for @start_array to be growing in size this way? By dice throw 10, @start_array will have 5 to the 10th power number of elements and the 10th throw will require 5*11 iterations to complete.

      Yes, the growing size of @start_array is the problem. The array contains the same compositions many times (one time for each possible chronology) which I count when I create the hash. I think there must be a smarter way...
Re: Loaded die
by JavaFan (Canon) on Mar 28, 2011 at 13:26 UTC
    Cache the partial results:
    use 5.010; use strict; use warnings; use List::Util 'sum'; my @props = (0.05, 0.1, 0.2, 0.3); push @props, 1 - sum @props; # Make sure we sum to 1. my @THROWS = (5, 10, 25); my @values = (1, 5, 7, 13, 17, 23, 31, 43, 59, 91, 119); my %cache; sub chance; sub chance { my $throws = shift; my $target = shift; return $cache{$throws, $target} if defined $cache{$throws, $target +}; return $cache{$throws, $target} = 0 if $throws < 1 || $target < $t +hrows; if ($throws == 1) { return $cache{$throws, $target} = 0 if $target > @props; return $cache{$throws, $target} = $props[$target-1]; } return $cache{$throws, $target} = sum map {$props[$_-1] * chance $throws-1, $target-$_} 1 .. + $#props; } my $start = sum times; foreach my $throws (@THROWS) { foreach my $value (@values) { say "Chance of hitting $value in $throws throws: ", chance $th +rows, $value; } } my $end = sum times; say "Running time: ", 1000 * ($end - $start), " milli-seconds"; __END__ Chance of rolling 1 in 5 throws: 0 Chance of rolling 5 in 5 throws: 3.125e-07 Chance of rolling 7 in 5 throws: 1.875e-05 Chance of rolling 13 in 5 throws: 0.00968 Chance of rolling 17 in 5 throws: 0.031295 Chance of rolling 23 in 5 throws: 0 Chance of rolling 31 in 5 throws: 0 Chance of rolling 43 in 5 throws: 0 Chance of rolling 59 in 5 throws: 0 Chance of rolling 91 in 5 throws: 0 Chance of rolling 119 in 5 throws: 0 Chance of rolling 1 in 10 throws: 0 Chance of rolling 5 in 10 throws: 0 Chance of rolling 7 in 10 throws: 0 Chance of rolling 13 in 10 throws: 1.69921875e-10 Chance of rolling 17 in 10 throws: 1.014837890625e-07 Chance of rolling 23 in 10 throws: 5.2156428125e-05 Chance of rolling 31 in 10 throws: 0.00234761235 Chance of rolling 43 in 10 throws: 0 Chance of rolling 59 in 10 throws: 0 Chance of rolling 91 in 10 throws: 0 Chance of rolling 119 in 10 throws: 0 Chance of rolling 1 in 25 throws: 0 Chance of rolling 5 in 25 throws: 0 Chance of rolling 7 in 25 throws: 0 Chance of rolling 13 in 25 throws: 0 Chance of rolling 17 in 25 throws: 0 Chance of rolling 23 in 25 throws: 0 Chance of rolling 31 in 25 throws: 1.08633041381835938e-25 Chance of rolling 43 in 25 throws: 7.92373889330642701e-17 Chance of rolling 59 in 25 throws: 9.08413811215993471e-10 Chance of rolling 91 in 25 throws: 1.16278186292830531e-07 Chance of rolling 119 in 25 throws: 0 Running time: 90 milli-seconds
    I don't have an explaination why the chance of rolling 119 in 25 throws is 0 - it calculates any roll above 101 for 25 throws to be 0.
      I think this code does not exactly what I need. I want to calculate the propability to roll x OR MORE. And @values shall contain the pips on each side of the die (so number of elements equals number of sides of the die) rather than the summed pips I want to reach.
        What stopped you from adapting the program I posted? Anyway:
        #!/usr/bin/perl use 5.010; use strict; use warnings; use List::Util 'sum'; my @dice = ([1 => 0.05], [5 => 0.1], [7 => 0.2], [13 => 0.3], [17 => + 0.35]); my %dice = map {$$_[0], $$_[1]} @dice; my $THROWS = 5; my %cache; sub chance; sub chance { my $throws = shift; my $target = shift; return $cache{$throws, $target} if defined $cache{$throws, $target +}; return $cache{$throws, $target} = 0 if $throws < 1 || $target < $t +hrows * $dice[0][0] || $target > $t +hrows * $dice[-1][0]; return $cache{$throws, $target} = $dice{$target} || 0 if $throws = += 1; return $cache{$throws, $target} = sum map {$dice{$_} * chance $throws-1, $target-$_} keys %d +ice; } my $start = sum times; my %total; my $MIN = $THROWS * $dice[0][0]; my $MAX = $THROWS * $dice[-1][0] + 1; foreach my $value ($MIN .. $MAX) { my $chance = chance $THROWS, $value; foreach my $value ($MIN .. $value) { $total{$value} += $chance; } } foreach my $value ($MIN .. $MAX) { say "Chance of rolling $value or higher: $total{$value}"; } my $end = sum times; say "Running time: ", 1000 * ($end - $start), " milli-seconds"; __END__ Chance of rolling 5 or higher: 1 Chance of rolling 6 or higher: 0.9999996875 Chance of rolling 7 or higher: 0.9999996875 Chance of rolling 8 or higher: 0.9999996875 Chance of rolling 9 or higher: 0.9999996875 Chance of rolling 10 or higher: 0.9999965625 Chance of rolling 11 or higher: 0.9999965625 Chance of rolling 12 or higher: 0.9999903125 Chance of rolling 13 or higher: 0.9999903125 Chance of rolling 14 or higher: 0.9999778125 Chance of rolling 15 or higher: 0.9999778125 Chance of rolling 16 or higher: 0.9999278125 Chance of rolling 17 or higher: 0.9999278125 Chance of rolling 18 or higher: 0.9998434375 Chance of rolling 19 or higher: 0.9998434375 Chance of rolling 20 or higher: 0.9996934375 Chance of rolling 21 or higher: 0.9996934375 Chance of rolling 22 or higher: 0.9992825 Chance of rolling 23 or higher: 0.9992825 Chance of rolling 24 or higher: 0.9987325 Chance of rolling 25 or higher: 0.9987325 Chance of rolling 26 or higher: 0.99781 Chance of rolling 27 or higher: 0.99781 Chance of rolling 28 or higher: 0.995835 Chance of rolling 29 or higher: 0.995835 Chance of rolling 30 or higher: 0.99346 Chance of rolling 31 or higher: 0.99346 Chance of rolling 32 or higher: 0.98981 Chance of rolling 33 or higher: 0.98981 Chance of rolling 34 or higher: 0.9829225 Chance of rolling 35 or higher: 0.9829225 Chance of rolling 36 or higher: 0.9755525 Chance of rolling 37 or higher: 0.9755525 Chance of rolling 38 or higher: 0.964499375 Chance of rolling 39 or higher: 0.964499375 Chance of rolling 40 or higher: 0.946949375 Chance of rolling 41 or higher: 0.946949375 Chance of rolling 42 or higher: 0.929305625 Chance of rolling 43 or higher: 0.929305625 Chance of rolling 44 or higher: 0.903868125 Chance of rolling 45 or higher: 0.903868125 Chance of rolling 46 or higher: 0.868668125 Chance of rolling 47 or higher: 0.868668125 Chance of rolling 48 or higher: 0.836118125 Chance of rolling 49 or higher: 0.836118125 Chance of rolling 50 or higher: 0.787436875 Chance of rolling 51 or higher: 0.787436875 Chance of rolling 52 or higher: 0.733586875 Chance of rolling 53 or higher: 0.733586875 Chance of rolling 54 or higher: 0.684515 Chance of rolling 55 or higher: 0.684515 Chance of rolling 56 or higher: 0.614865 Chance of rolling 57 or higher: 0.614865 Chance of rolling 58 or higher: 0.5482525 Chance of rolling 59 or higher: 0.5482525 Chance of rolling 60 or higher: 0.4874775 Chance of rolling 61 or higher: 0.4874775 Chance of rolling 62 or higher: 0.4036525 Chance of rolling 63 or higher: 0.4036525 Chance of rolling 64 or higher: 0.3487025 Chance of rolling 65 or higher: 0.3487025 Chance of rolling 66 or higher: 0.283185 Chance of rolling 67 or higher: 0.283185 Chance of rolling 68 or higher: 0.217035 Chance of rolling 69 or higher: 0.217035 Chance of rolling 70 or higher: 0.1733834375 Chance of rolling 71 or higher: 0.1733834375 Chance of rolling 72 or higher: 0.1219334375 Chance of rolling 73 or higher: 0.1219334375 Chance of rolling 74 or higher: 0.0813553125 Chance of rolling 75 or higher: 0.0813553125 Chance of rolling 76 or higher: 0.0663490625 Chance of rolling 77 or higher: 0.0663490625 Chance of rolling 78 or higher: 0.0277615625 Chance of rolling 79 or higher: 0.0277615625 Chance of rolling 80 or higher: 0.0277615625 Chance of rolling 81 or higher: 0.0277615625 Chance of rolling 82 or higher: 0.0052521875 Chance of rolling 83 or higher: 0.0052521875 Chance of rolling 84 or higher: 0.0052521875 Chance of rolling 85 or higher: 0.0052521875 Chance of rolling 86 or higher: 0 Running time: 20 milli-seconds

        Microcebus:

        If you want to approach it analytically:

        #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $num_dice = 5; my @probabilities = ( 0.05, 0.10, 0.20, 0.30, 0.35 ); my @values = ( 1, 5, 7, 13, 17 ); my $throw_die_ntimes = 5; my @possibilities = (); my $it = combo($num_dice, 0 .. $num_dice-1); while (my $t = &$it()) { $possibilities[ttl($t)] += prob($t); } my $cum=0; for my $ttl (0 .. $#possibilities) { next if ! defined $possibilities[$ttl]; $cum += $possibilities[$ttl]; printf "% 5u prob=%.8f, prob<=:%.8f, prob>:%.8f\n", $ttl, $possibilities[$ttl], $cum, 1-$cum; } sub prob { my $prob=1; $prob *= $probabilities[$_] for @{$_[0]}; return $prob; } sub ttl { my $total=0; $total += $values[$_] for @{$_[0]}; return $total; } sub combo { my ($num_dice, @list) = @_; my @position = ($#list) x $num_dice; my $done_fl = 0; # Since we pre-decrement, adjust so we return proper item on first + pass $position[0]++; return sub { return undef if $done_fl; my $cur = 0; { if (--$position[$cur] < 0) { # reset current dig & try next one $position[$cur] = $#list; $cur++; if ($cur > $num_dice) { $done_fl=1; return undef; } redo; } } return [ @list[@position] ]; } }

        When run, I get:

        $ perl 895892.pl 5 prob=0.00000031, prob<=:0.00000031, prob>:0.99999969 9 prob=0.00000313, prob<=:0.00000344, prob>:0.99999656 11 prob=0.00000625, prob<=:0.00000969, prob>:0.99999031 13 prob=0.00001250, prob<=:0.00002219, prob>:0.99997781 15 prob=0.00005000, prob<=:0.00007219, prob>:0.99992781 17 prob=0.00008438, prob<=:0.00015656, prob>:0.99984344 19 prob=0.00015000, prob<=:0.00030656, prob>:0.99969344 21 prob=0.00041094, prob<=:0.00071750, prob>:0.99928250 23 prob=0.00055000, prob<=:0.00126750, prob>:0.99873250 25 prob=0.00092250, prob<=:0.00219000, prob>:0.99781000 27 prob=0.00197500, prob<=:0.00416500, prob>:0.99583500 29 prob=0.00237500, prob<=:0.00654000, prob>:0.99346000 31 prob=0.00365000, prob<=:0.01019000, prob>:0.98981000 33 prob=0.00688750, prob<=:0.01707750, prob>:0.98292250 35 prob=0.00737000, prob<=:0.02444750, prob>:0.97555250 37 prob=0.01105313, prob<=:0.03550063, prob>:0.96449937 39 prob=0.01755000, prob<=:0.05305063, prob>:0.94694937 41 prob=0.01764375, prob<=:0.07069438, prob>:0.92930562 43 prob=0.02543750, prob<=:0.09613188, prob>:0.90386812 45 prob=0.03520000, prob<=:0.13133188, prob>:0.86866812 47 prob=0.03255000, prob<=:0.16388187, prob>:0.83611813 49 prob=0.04868125, prob<=:0.21256312, prob>:0.78743688 51 prob=0.05385000, prob<=:0.26641312, prob>:0.73358688 53 prob=0.04907188, prob<=:0.31548500, prob>:0.68451500 55 prob=0.06965000, prob<=:0.38513500, prob>:0.61486500 57 prob=0.06661250, prob<=:0.45174750, prob>:0.54825250 59 prob=0.06077500, prob<=:0.51252250, prob>:0.48747750 61 prob=0.08382500, prob<=:0.59634750, prob>:0.40365250 63 prob=0.05495000, prob<=:0.65129750, prob>:0.34870250 65 prob=0.06551750, prob<=:0.71681500, prob>:0.28318500 67 prob=0.06615000, prob<=:0.78296500, prob>:0.21703500 69 prob=0.04365156, prob<=:0.82661656, prob>:0.17338344 71 prob=0.05145000, prob<=:0.87806656, prob>:0.12193344 73 prob=0.04057812, prob<=:0.91864469, prob>:0.08135531 75 prob=0.01500625, prob<=:0.93365094, prob>:0.06634906 77 prob=0.03858750, prob<=:0.97223844, prob>:0.02776156 81 prob=0.02250937, prob<=:0.99474781, prob>:0.00525219 85 prob=0.00525219, prob<=:1.00000000, prob>:0.00000000

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Loaded die
by ELISHEVA (Prior) on Mar 28, 2011 at 16:23 UTC

    If you are trying to calculate the weighted probability of a gene cluster, then I think you really need to bone up on statistics and also numerical methods (to help you translate the statistical formulas and distributions to their programming equivalents). Most of these problems can be solved with fairly standard formulas and a bit of integral calculus.

    Depending on why you are studying clusters, you may also want to look at things like monte carlo analysis. A quick google shows me that this is also being used to rule out random chance in gene cluster studies, e.g. Evaluation of the relationship between interleukin-1 gene cluster polymorphisms and early implant failure in non-smoking patients. That study uses monte carlo analysis to help analyze whether differences in gene cluster frequencies between two populations is due to more than random chance.

    I realize this may be more math than you bargained for. However, you are never going to solve these problems for anything but trivial numbers of die throws by generating all permutations and calculating the probability one permutation at a time.