this exhibits a pretty strong bias toward the lower numbers generated, and against the higher numbers.
Yes. A side-effect of my lazy way of attempting to ensure that at least 20 numbers are produced each time.
With 400 inputs to choose from, the selector value should be 0.05, not 0.075; but the nature of randomness is that whilst 0.05 produces a fair pick:
[undef, 999508, 999959, 1000278, 1002083, 999969, 1001388, 1002007, 999127, 1000314, 1001289, 1000014, 999255, 1000929, 1001682, 1000862, 998954, 1002277, 999569, 1000337, 999569]
It will on occasion produce as many as 44 values, or as few as 3:

[undef, undef, undef, 2, 7, 33, 157, 398, 1041, 2437, 5333, 9541, 16508, 25675, 37569, 50824, 64506, 76623, 85656, 90711, 91374, 86560, 78584, 68077, 56808, 44695, 33849, 24855, 17439, 11775, 7601, 4708, 2839, 1690, 981, 568, 290, 137, 74, 41, 12, 9, 11, 1, 1]
By raising the selector value to 0.075, I made it far more likely that it would produce at least 20 values. The list slice ensures that there are no more than 20; but it also introduces the bias, by always throwing away the higher values when it overproduces.
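For reference, the approach under discussion looks something like this (my reconstruction, not the original code; the truncation at the end is the culprit):

```perl
#! perl -slw
use strict;

## Reconstruction (assumption): each of 1 .. 20 is replicated 20 times,
## giving 400 candidates; each survives with probability 0.075, so about
## 30 survive on average.
my @picks = grep { rand() < 0.075 } map { ($_) x 20 } 1 .. 20;

## Truncating to the first 20 (the effect of the list slice) always
## discards the *later* -- i.e. higher -- values, hence the bias.
splice @picks, 20 if @picks > 20;

print scalar( @picks ), " picks: @picks";
```

Because the candidates come out of the `map` in ascending order, any overproduction is always paid for by the high end of the range.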
The following results are from the above code, with the only change being 0.05 => 0.075. They demonstrate that the distribution across the range is still very fair, but that on average 50% more numbers are produced each time before the slice operation trims them back (and introduces the bias). They also show that the probability of under-producing is greatly lessened:

[undef, 1495533, 1499609, 1498974, 1499522, 1501930, 1501314, 1499981, 1501222, 1499646, 1500600, 1500068, 1500915, 1498017, 1500384, 1501031, 1498257, 1500431, 1501058, 1498359, 1500716]

[undef, undef, undef, undef, undef, undef, undef, undef, undef, 2, 12, 31, 75, 170, 373, 768, 1568, 2718, 4802, 7795, 12096, 17806, 24879, 32813, 41477, 51438, 59539, 66763, 72668, 74986, 75575, 73039, 68515, 61653, 53917, 46112, 37751, 30042, 23138, 17536, 12842, 9196, 6284, 4298, 2803, 1748, 1053, 721, 429, 253, 156, 80, 43, 17, 8, 8, 2, undef, 1, undef, 1]
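To put numbers on "greatly lessened" (a back-of-the-envelope check, not from the original post): the yield of the grep is Binomial(400, p), so the chance of getting fewer than 20 survivors can be computed directly by iterating the pmf:

```perl
#! perl -slw
use strict;

## P( X < 20 ) for X ~ Binomial( n, p ), using the recurrence
##   pmf(k+1) = pmf(k) * (n-k)/(k+1) * p/(1-p)
sub prob_under_20 {
    my( $n, $p ) = @_;
    my $pmf = ( 1 - $p ) ** $n;    ## pmf(0)
    my $cdf = $pmf;
    for my $k ( 0 .. 18 ) {
        $pmf *= ( $n - $k ) / ( $k + 1 ) * $p / ( 1 - $p );
        $cdf += $pmf;              ## accumulate pmf(1) .. pmf(19)
    }
    return $cdf;
}

printf "p=0.05 : P(fewer than 20) = %.3f\n", prob_under_20( 400, 0.05 );
printf "p=0.075: P(fewer than 20) = %.3f\n", prob_under_20( 400, 0.075 );
```

Both figures agree with the histograms above: roughly 467,000 of the 1e6 runs under-produced at 0.05, versus roughly 18,000 at 0.075.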
This could be fixed by repeating the process until exactly 20 numbers come out, which ensures the fairness:

#! perl -slw
use strict;
use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 300;
my( @counts, @ns );
for( 1 .. 1e6 ) {
    ## Replicate each of 1 .. 20 twenty times (400 candidates) and keep
    ## each with probability 0.05; the expected yield is exactly 20.
    my @orderedRands = grep{ rand(1) < 0.05 } map{ ($_) x 20 } 1 .. 20;
    ## Redo the whole pick until it yields exactly 20: no slice, no bias.
    while( @orderedRands != 20 ) {
        @orderedRands = grep{ rand(1) < 0.05 } map{ ($_) x 20 } 1 .. 20;
    }
    ++$ns[ @orderedRands ];            ## histogram of yield sizes
    ++$counts[ $_ ] for @orderedRands; ## histogram of values picked
}
pp \@counts, \@ns;
__END__
C:\test>junk62
(
[undef, 1000652, 999987, 1000022, 999969, 999146, 1000961, 1000568, 1000129, 1000725, 999884, 999509, 999756, 1000538, 999763, 1000708, 1000826, 999799, 998778, 998714, 999566],
[undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, 1000000],
)
Of course that is far more expensive than doing the sort that it avoids.
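For completeness, a cheaper fix is possible (my own sketch, not from the thread): retry only on under-production, and when it overproduces, discard uniformly random survivors instead of always the tail. By symmetry every candidate then has the same inclusion probability, so there is no positional bias towards either end of the range:

```perl
#! perl -slw
use strict;

my @picks;
do {
    ## Same candidate generation as before: 400 candidates, p = 0.05.
    @picks = grep { rand() < 0.05 } map { ($_) x 20 } 1 .. 20;
} until @picks >= 20;   ## retry only when it under-produces

## Discard *random* elements, not the tail, until exactly 20 remain;
## order is preserved and no part of the range is favoured.
splice @picks, int( rand @picks ), 1 while @picks > 20;

print "@picks";
```

Per the p=0.05 histogram above, a run yields exactly 20 only about 9% of the time (91374 of 1e6), so the exact-20 loop needs about 11 attempts on average; retrying only on under-production (about 47% of runs) needs fewer than two.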
But then, my post was nothing more than a semi-humorous response to a question that itself is something of a joke.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.