good chemistry is complicated,and a little bit messy -LW PerlMonks

 on Mar 28, 2011 at 12:36 UTC Need Help??

Microcebus has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I wrote a script to calculate the propability to exceed a defined pips sum throwing a loaded die with x sides n times. The number of pips for each side can also be defined.
```\$throw_die_ntimes=5;
\$min_summed_pips=60;
\$sides_of_die=5;
# propability to get a certain side
@propabilities=(0.05,0.1,0.2,0.3,0.35);
# number of pips for each side
@values=(1,5,7,13,17);
# generate possible compositions
@start_array=("1","2","3","4","5");
foreach(1..\$throw_die_ntimes-1)
{
%hash=();
foreach\$composition(@start_array)
{
foreach(1..\$sides_of_die)
{
\$new_composition=\$composition.\$_;
@sort_new_composition=split('',\$new_composition);
@sort_new_composition=sort@sort_new_composition;
\$new_composition=join('',@sort_new_composition);
push(@new_array,\$new_composition);
\$hash{\$new_composition}++;
}
}
@start_array=@new_array;
@new_array=();
}
# calculate propability of each composition
\$total_prop_to_exceed_minimum=0;
\$total_prop_to_deceed_minimum=0;
foreach\$composition(keys%hash)
{
\$propability_of_composition=1;
\$value_of_composition=0;
@composition=split('',\$composition);
foreach\$step(@composition)
{
\$propability_of_composition=\$propability_of_composition*\$propa
+bilities[\$step-1];
\$value_of_composition+=\$values[\$step-1];
}
\$propability_of_composition=\$propability_of_composition*\$hash{\$com
+position};
# add propability to one of two total propabilities (exceed or dec
+eed minimum)
if(\$value_of_composition>=\$min_summed_pips)
{\$total_prop_to_exceed_minimum+=\$propability_of_composition;}
else
{\$total_prop_to_deceed_minimum+=\$propability_of_composition;}
}

print"p to exceed minimum: \$total_prop_to_exceed_minimum\np to deceed
+minimum: \$total_prop_to_deceed_minimum\n\n";
system("pause");
exit;
This code works but gets very slow when I throw the die more then 10 times. The problem seems to be the exploding array when calculating all possible compositions. I pored over a smarter way to do this but still I do not have a clue.
Any ideas?

Replies are listed 'Best First'.
by jethro (Monsignor) on Mar 28, 2011 at 13:00 UTC

I can think of two faster ways:

1) If you don't "trust" math, you just run a few thousand virtual test runs with your dice and note the average result. This will be accuracte enough for almost any use case, for example to calculate your average income with these dice ;-)

2) Use math. A good book on stochastics should have the formulas that tell you the exact number if you give the average result of the die (at least if the values are not too different and smaller than the target sum). I think the relevant calculation you need has to do with a gaussian curve.

by ELISHEVA (Prior) on Mar 28, 2011 at 13:09 UTC

Well, for starters in might have something to do with the way you are creating your hash %hash. Each time you cycle through the outer loop @start_array grows in size (by a factor of 5) so each throw of the die results in every more iterations in the inner two loops. Here is what the count looks like:

```throw of die: 1:
iterations in inner two loops: 25
total iterations: 25

throw of die: 2:
iterations in inner two loops: 125
total iterations: 150

throw of die: 3:
iterations in inner two loops: 625
total iterations: 775

throw of die: 4:
iterations in inner two loops: 3125
total iterations: 3900

throw of die: 5:
iterations in inner two loops: 15625
total iterations: 19525

throw of die: 6:
iterations in inner two loops: 78125
total iterations: 97650

throw of die: 7:
iterations in inner two loops: 390625
total iterations: 488275

throw of die: 8:
iterations in inner two loops: 1953125
total iterations: 2441400
...

Did you mean for @start_array to be growing in size this way? By dice throw 10, @start_array will have 5 to the 10th power number of elements and the 10th throw will require 5*11 iterations to complete.

Yes, the growing size of @start_array is the problem. The array contains the same compositions many times (one time for each possible chronology) which I count when I create the hash. I think there must be a smarter way...
by BrowserUk (Patriarch) on Mar 28, 2011 at 14:09 UTC

Rather than brute force iteration, you could always simulate:

```#! perl -slw
use strict;
use Data::Dump qw[ pp ];
use List::Util qw[ sum ];
use Math::Random::MT qw[ rand ];

my \$throws   = 5;
my \$sides    = 5;

my @probs = ( 0.05, 0.1, 0.2, 0.3, 0.35 );
my @values = ( 1, 5, 7, 13, 17 );
my @lookup = map{
( \$values[ \$_ ] ) x ( \$probs[ \$_ ] * 100 );
} 0 .. \$#values;

our \$MIN  //= 60;
our \$RUNS //= 1e5;

my \$moreThanMin = 0;
for ( 1 .. \$RUNS ) {
my \$total = sum map{ \$lookup[ rand( scalar @lookup ) ] } 1 .. \$thr
+ows;
++\$moreThanMin if \$total > \$MIN;
}
printf "Probability based on \$RUNS simulations: %.3f%%",
\$moreThanMin / \$RUNS * 100;

__END__

c:\test>895892 -RUNS=1e6
Probability based on 1e6 simulations: 48.752%

It won't save time on small die, but on the larger ones you can decide how much time you want to spend to get closer and closer to the answer.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
You are right. I thought about this option but in my special case, I have to calculate possibilities like 1/10^100. Hence I would have to do more than 10^100 simulations...
I have to calculate possibilities like 1/10^100.

Hm. Odds that small are so close to zero as to be meaningless.

There are estimated to be 10^80 atoms in the universe. If one of them had gone missing (or an extra one had materialised into existance) at some point after matter crystallised out of the big bang, the affect the absence or presence of that one atom would have had on the form of the universe would be undetectable at any scale.

And the affect that a 1/10^100 change in probability will have (on anything) is 100 million trillion times less significant.

To try and put that into perspective. If a casino dice had 1/10^100 bias, it would undetectable by any mechanism. I would guess that every single dice ever produced has been, is, and always will be biased billions of times more significantly. A single missing electron from the face of the dice will bias it far more significantly than that.

Indeed, the bias of every dice ever made probably varies by bllions of times more than that, second by second, due to moisture absorption, evaporation or radioactive decay.

If every human being that ever existed, threw every dice that ever existed, once every microsecond, for their entire lives and you managed to record every single throw, you would not be able to detect if such a rare event had occurred or not.

So what is the point of knowing such probabilities? Even knowing that they are not zero is useless information.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by JavaFan (Canon) on Mar 28, 2011 at 13:26 UTC
Cache the partial results:
```use 5.010;

use strict;
use warnings;

use List::Util 'sum';

my @props = (0.05, 0.1, 0.2, 0.3);
push @props, 1 - sum @props;     # Make sure we sum to 1.

my @THROWS = (5, 10, 25);
my @values = (1, 5, 7, 13, 17, 23, 31, 43, 59, 91, 119);

my %cache;
sub chance;
sub chance {
my \$throws = shift;
my \$target = shift;

return \$cache{\$throws, \$target} if defined \$cache{\$throws, \$target
+};

return \$cache{\$throws, \$target} = 0 if \$throws < 1 || \$target < \$t
+hrows;
if (\$throws == 1) {
return \$cache{\$throws, \$target} = 0 if \$target > @props;
return \$cache{\$throws, \$target} = \$props[\$target-1];
}
return \$cache{\$throws, \$target} =
sum map {\$props[\$_-1] * chance \$throws-1, \$target-\$_} 1 ..
+ \$#props;
}

my \$start = sum times;

foreach my \$throws (@THROWS) {
foreach my \$value (@values) {
say "Chance of hitting \$value in \$throws throws: ", chance \$th
+rows, \$value;
}
}

my \$end = sum times;
say "Running time: ", 1000 * (\$end - \$start), " milli-seconds";

__END__
Chance of rolling 1 in 5 throws: 0
Chance of rolling 5 in 5 throws: 3.125e-07
Chance of rolling 7 in 5 throws: 1.875e-05
Chance of rolling 13 in 5 throws: 0.00968
Chance of rolling 17 in 5 throws: 0.031295
Chance of rolling 23 in 5 throws: 0
Chance of rolling 31 in 5 throws: 0
Chance of rolling 43 in 5 throws: 0
Chance of rolling 59 in 5 throws: 0
Chance of rolling 91 in 5 throws: 0
Chance of rolling 119 in 5 throws: 0
Chance of rolling 1 in 10 throws: 0
Chance of rolling 5 in 10 throws: 0
Chance of rolling 7 in 10 throws: 0
Chance of rolling 13 in 10 throws: 1.69921875e-10
Chance of rolling 17 in 10 throws: 1.014837890625e-07
Chance of rolling 23 in 10 throws: 5.2156428125e-05
Chance of rolling 31 in 10 throws: 0.00234761235
Chance of rolling 43 in 10 throws: 0
Chance of rolling 59 in 10 throws: 0
Chance of rolling 91 in 10 throws: 0
Chance of rolling 119 in 10 throws: 0
Chance of rolling 1 in 25 throws: 0
Chance of rolling 5 in 25 throws: 0
Chance of rolling 7 in 25 throws: 0
Chance of rolling 13 in 25 throws: 0
Chance of rolling 17 in 25 throws: 0
Chance of rolling 23 in 25 throws: 0
Chance of rolling 31 in 25 throws: 1.08633041381835938e-25
Chance of rolling 43 in 25 throws: 7.92373889330642701e-17
Chance of rolling 59 in 25 throws: 9.08413811215993471e-10
Chance of rolling 91 in 25 throws: 1.16278186292830531e-07
Chance of rolling 119 in 25 throws: 0
Running time: 90 milli-seconds
I don't have an explaination why the chance of rolling 119 in 25 throws is 0 - it calculates any roll above 101 for 25 throws to be 0.
I think this code does not exactly what I need. I want to calculate the propability to roll x OR MORE. And @values shall contain the pips on each side of the die (so number of elements equals number of sides of the die) rather than the summed pips I want to reach.
What stopped you from adapting the program I posted? Anyway:
```#!/usr/bin/perl

use 5.010;

use strict;
use warnings;

use List::Util 'sum';

my @dice   = ([1 => 0.05], [5 => 0.1], [7 => 0.2], [13 => 0.3], [17 =>
+ 0.35]);
my %dice   = map {\$\$_[0], \$\$_[1]} @dice;
my \$THROWS = 5;

my %cache;
sub chance;
sub chance {
my \$throws = shift;
my \$target = shift;

return \$cache{\$throws, \$target} if defined \$cache{\$throws, \$target
+};
return \$cache{\$throws, \$target} = 0 if \$throws < 1 || \$target < \$t
+hrows * \$dice[0][0] ||
\$target > \$t
+hrows * \$dice[-1][0];
return \$cache{\$throws, \$target} = \$dice{\$target} || 0 if \$throws =
+= 1;
return \$cache{\$throws, \$target} =
sum map {\$dice{\$_} * chance \$throws-1, \$target-\$_} keys %d
+ice;
}

my \$start = sum times;

my %total;

my \$MIN = \$THROWS * \$dice[0][0];
my \$MAX = \$THROWS * \$dice[-1][0] + 1;

foreach my \$value (\$MIN .. \$MAX) {
my \$chance = chance \$THROWS, \$value;
foreach my \$value (\$MIN .. \$value) {
\$total{\$value} += \$chance;
}
}

foreach my \$value (\$MIN .. \$MAX) {
say "Chance of rolling \$value or higher: \$total{\$value}";
}

my \$end = sum times;
say "Running time: ", 1000 * (\$end - \$start), " milli-seconds";

__END__
Chance of rolling 5 or higher: 1
Chance of rolling 6 or higher: 0.9999996875
Chance of rolling 7 or higher: 0.9999996875
Chance of rolling 8 or higher: 0.9999996875
Chance of rolling 9 or higher: 0.9999996875
Chance of rolling 10 or higher: 0.9999965625
Chance of rolling 11 or higher: 0.9999965625
Chance of rolling 12 or higher: 0.9999903125
Chance of rolling 13 or higher: 0.9999903125
Chance of rolling 14 or higher: 0.9999778125
Chance of rolling 15 or higher: 0.9999778125
Chance of rolling 16 or higher: 0.9999278125
Chance of rolling 17 or higher: 0.9999278125
Chance of rolling 18 or higher: 0.9998434375
Chance of rolling 19 or higher: 0.9998434375
Chance of rolling 20 or higher: 0.9996934375
Chance of rolling 21 or higher: 0.9996934375
Chance of rolling 22 or higher: 0.9992825
Chance of rolling 23 or higher: 0.9992825
Chance of rolling 24 or higher: 0.9987325
Chance of rolling 25 or higher: 0.9987325
Chance of rolling 26 or higher: 0.99781
Chance of rolling 27 or higher: 0.99781
Chance of rolling 28 or higher: 0.995835
Chance of rolling 29 or higher: 0.995835
Chance of rolling 30 or higher: 0.99346
Chance of rolling 31 or higher: 0.99346
Chance of rolling 32 or higher: 0.98981
Chance of rolling 33 or higher: 0.98981
Chance of rolling 34 or higher: 0.9829225
Chance of rolling 35 or higher: 0.9829225
Chance of rolling 36 or higher: 0.9755525
Chance of rolling 37 or higher: 0.9755525
Chance of rolling 38 or higher: 0.964499375
Chance of rolling 39 or higher: 0.964499375
Chance of rolling 40 or higher: 0.946949375
Chance of rolling 41 or higher: 0.946949375
Chance of rolling 42 or higher: 0.929305625
Chance of rolling 43 or higher: 0.929305625
Chance of rolling 44 or higher: 0.903868125
Chance of rolling 45 or higher: 0.903868125
Chance of rolling 46 or higher: 0.868668125
Chance of rolling 47 or higher: 0.868668125
Chance of rolling 48 or higher: 0.836118125
Chance of rolling 49 or higher: 0.836118125
Chance of rolling 50 or higher: 0.787436875
Chance of rolling 51 or higher: 0.787436875
Chance of rolling 52 or higher: 0.733586875
Chance of rolling 53 or higher: 0.733586875
Chance of rolling 54 or higher: 0.684515
Chance of rolling 55 or higher: 0.684515
Chance of rolling 56 or higher: 0.614865
Chance of rolling 57 or higher: 0.614865
Chance of rolling 58 or higher: 0.5482525
Chance of rolling 59 or higher: 0.5482525
Chance of rolling 60 or higher: 0.4874775
Chance of rolling 61 or higher: 0.4874775
Chance of rolling 62 or higher: 0.4036525
Chance of rolling 63 or higher: 0.4036525
Chance of rolling 64 or higher: 0.3487025
Chance of rolling 65 or higher: 0.3487025
Chance of rolling 66 or higher: 0.283185
Chance of rolling 67 or higher: 0.283185
Chance of rolling 68 or higher: 0.217035
Chance of rolling 69 or higher: 0.217035
Chance of rolling 70 or higher: 0.1733834375
Chance of rolling 71 or higher: 0.1733834375
Chance of rolling 72 or higher: 0.1219334375
Chance of rolling 73 or higher: 0.1219334375
Chance of rolling 74 or higher: 0.0813553125
Chance of rolling 75 or higher: 0.0813553125
Chance of rolling 76 or higher: 0.0663490625
Chance of rolling 77 or higher: 0.0663490625
Chance of rolling 78 or higher: 0.0277615625
Chance of rolling 79 or higher: 0.0277615625
Chance of rolling 80 or higher: 0.0277615625
Chance of rolling 81 or higher: 0.0277615625
Chance of rolling 82 or higher: 0.0052521875
Chance of rolling 83 or higher: 0.0052521875
Chance of rolling 84 or higher: 0.0052521875
Chance of rolling 85 or higher: 0.0052521875
Chance of rolling 86 or higher: 0
Running time: 20 milli-seconds

If you want to approach it analytically:

```#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my \$num_dice = 5;
my @probabilities = ( 0.05, 0.10, 0.20, 0.30, 0.35 );
my @values        = ( 1,    5,    7,    13,   17   );
my \$throw_die_ntimes = 5;
my @possibilities = ();

my \$it = combo(\$num_dice, 0 .. \$num_dice-1);
while (my \$t = &\$it()) {
\$possibilities[ttl(\$t)] += prob(\$t);
}

my \$cum=0;
for my \$ttl (0 .. \$#possibilities) {
next if ! defined \$possibilities[\$ttl];
\$cum += \$possibilities[\$ttl];
printf "% 5u prob=%.8f,  prob<=:%.8f, prob>:%.8f\n",
\$ttl, \$possibilities[\$ttl], \$cum, 1-\$cum;
}

sub prob {
my \$prob=1;
\$prob *= \$probabilities[\$_] for @{\$_[0]};
return \$prob;
}

sub ttl {
my \$total=0;
\$total += \$values[\$_] for @{\$_[0]};
return \$total;
}
sub combo {
my (\$num_dice, @list) = @_;
my @position = (\$#list) x \$num_dice;
my \$done_fl = 0;

# Since we pre-decrement, adjust so we return proper item on first
+ pass
\$position[0]++;

return sub {
return undef if \$done_fl;

my \$cur = 0;
{
if (--\$position[\$cur] < 0) {
# reset current dig & try next one
\$position[\$cur] = \$#list;
\$cur++;
if (\$cur > \$num_dice) {
\$done_fl=1;
return undef;
}
redo;
}
}

return [ @list[@position] ];
}
}

When run, I get:

```\$ perl 895892.pl
5 prob=0.00000031,  prob<=:0.00000031, prob>:0.99999969
9 prob=0.00000313,  prob<=:0.00000344, prob>:0.99999656
11 prob=0.00000625,  prob<=:0.00000969, prob>:0.99999031
13 prob=0.00001250,  prob<=:0.00002219, prob>:0.99997781
15 prob=0.00005000,  prob<=:0.00007219, prob>:0.99992781
17 prob=0.00008438,  prob<=:0.00015656, prob>:0.99984344
19 prob=0.00015000,  prob<=:0.00030656, prob>:0.99969344
21 prob=0.00041094,  prob<=:0.00071750, prob>:0.99928250
23 prob=0.00055000,  prob<=:0.00126750, prob>:0.99873250
25 prob=0.00092250,  prob<=:0.00219000, prob>:0.99781000
27 prob=0.00197500,  prob<=:0.00416500, prob>:0.99583500
29 prob=0.00237500,  prob<=:0.00654000, prob>:0.99346000
31 prob=0.00365000,  prob<=:0.01019000, prob>:0.98981000
33 prob=0.00688750,  prob<=:0.01707750, prob>:0.98292250
35 prob=0.00737000,  prob<=:0.02444750, prob>:0.97555250
37 prob=0.01105313,  prob<=:0.03550063, prob>:0.96449937
39 prob=0.01755000,  prob<=:0.05305063, prob>:0.94694937
41 prob=0.01764375,  prob<=:0.07069438, prob>:0.92930562
43 prob=0.02543750,  prob<=:0.09613188, prob>:0.90386812
45 prob=0.03520000,  prob<=:0.13133188, prob>:0.86866812
47 prob=0.03255000,  prob<=:0.16388187, prob>:0.83611813
49 prob=0.04868125,  prob<=:0.21256312, prob>:0.78743688
51 prob=0.05385000,  prob<=:0.26641312, prob>:0.73358688
53 prob=0.04907188,  prob<=:0.31548500, prob>:0.68451500
55 prob=0.06965000,  prob<=:0.38513500, prob>:0.61486500
57 prob=0.06661250,  prob<=:0.45174750, prob>:0.54825250
59 prob=0.06077500,  prob<=:0.51252250, prob>:0.48747750
61 prob=0.08382500,  prob<=:0.59634750, prob>:0.40365250
63 prob=0.05495000,  prob<=:0.65129750, prob>:0.34870250
65 prob=0.06551750,  prob<=:0.71681500, prob>:0.28318500
67 prob=0.06615000,  prob<=:0.78296500, prob>:0.21703500
69 prob=0.04365156,  prob<=:0.82661656, prob>:0.17338344
71 prob=0.05145000,  prob<=:0.87806656, prob>:0.12193344
73 prob=0.04057812,  prob<=:0.91864469, prob>:0.08135531
75 prob=0.01500625,  prob<=:0.93365094, prob>:0.06634906
77 prob=0.03858750,  prob<=:0.97223844, prob>:0.02776156
81 prob=0.02250937,  prob<=:0.99474781, prob>:0.00525219
85 prob=0.00525219,  prob<=:1.00000000, prob>:0.00000000

...roboticus

When your only tool is a hammer, all problems look like your thumb.

by ELISHEVA (Prior) on Mar 28, 2011 at 16:23 UTC

If you are trying to calculate the weighted probability of a gene cluster, then I think you really need to bone up on statistics and also numerical methods (to help you translate the statistical formulas and distributions to their programming equivalents). Most of these problems can be solved with fairly standard formulas and a bit of integral calculus.

Depending on why you are studying clusters, you may also want to look at things like monte carlo analysis. A quick google shows me that this is also being used to rule out random chance in gene cluster studies, e.g. Evaluation of the relationship between interleukin-1 gene cluster polymorphisms and early implant failure in non-smoking patients. That study uses monte carlo analysis to help analyze whether differences in gene cluster frequencies between two populations is due to more than random chance.

I realize this may be more math than you bargained for. However, you are never going to solve these problems for anything but trivial numbers of die throws by generating all permutations and calculating the probability one permutation at a time.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://895892]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (3)
As of 2023-05-31 03:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?