Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re^2: random #s

by davido (Cardinal)
on Oct 07, 2016 at 17:30 UTC ( [id://1173499]=note: print w/replies, xml ) Need Help??


in reply to Re: random #s
in thread random #s

I think this exhibits a pretty strong bias toward the lower numbers generated, and against the higher numbers. It may be pseudo random but the distribution is not close to flat.

use List::Util qw(sum); my @count = (0) x 20; for (1 .. 200000) { my @orderedRands = (grep{ rand(1) < 0.075 } map{ ($_) x 20 } 1 .. +20 )[0..19]; $count[$_]++ for grep {defined} @orderedRands; } my $sum = sum(@count); print "$_\t=>\t$count[$_]\t(", sprintf('%.2f', ($count[$_]/$sum)*100), + "% TTL)\n" for 1 .. 20;

...which produces...

1 => 300463 (7.53% TTL) 2 => 300337 (7.52% TTL) 3 => 300667 (7.53% TTL) 4 => 300333 (7.52% TTL) 5 => 300725 (7.53% TTL) 6 => 299081 (7.49% TTL) 7 => 299696 (7.51% TTL) 8 => 297134 (7.44% TTL) 9 => 289983 (7.26% TTL) 10 => 276352 (6.92% TTL) 11 => 252348 (6.32% TTL) 12 => 217016 (5.44% TTL) 13 => 175867 (4.41% TTL) 14 => 133708 (3.35% TTL) 15 => 96010 (2.40% TTL) 16 => 65012 (1.63% TTL) 17 => 41130 (1.03% TTL) 18 => 24485 (0.61% TTL) 19 => 14228 (0.36% TTL) 20 => 7685 (0.19% TTL)

...and that is with adding the filtering-out of undefined values that found their way into @orderedRands.

I hope my adaptation of your algorithm didn't miss an important nuance.


Dave

Replies are listed 'Best First'.
Re^3: random #s
by BrowserUk (Patriarch) on Oct 07, 2016 at 21:04 UTC
    this exhibits a pretty strong bias toward the lower numbers generated, and against the higher numbers.

    Yes. A side-effect of my lazy way of attempting to ensure that at least 20 numbers are produced each time.

    With 400 inputs to choose from the selector value should be 0.05 not 0.075; but the nature of random is that whilst 0.05 produces a fair pick:

    [undef, 999508, 999959, 1000278, 1002083, 999969, 1001388, 1002007, 99 +9127, 1000314, 1001289, 1000014, 999255, 1000929, 1001682, 1000862, 9 +98954, 1002277, 999569, 1000337, 999569]

    It will on occasion produce as many as 44 values or as few as 3:

    [undef, undef, undef, 2, 7, 33, 157, 398, 1041, 2437, 5333, 9541, 1650 +8, 25675, 37569, 50824, 64506, 76623, 85656, 90711, 91374, 86560, 78584, 68077, 56808, 44695, 33849, 24855, 17439, 11775, 7601, +4708, 2839, 1690, 981, 568, 290, 137, 74, 41, 12, 9, 11, 1, 1], )

    By raising the selector value to 0.075, I made it far more likely that it would produce at least 20 values. The list slice ensures that it is not more than 20; but also introduces the bias by always throwing away the higher values when it overproduces.

    The following results from the above code with the only change 0.05 => 0.075; demonstrates that the distribution of the range is still very fair; but on average 50% more numbers are produced each time before the slice operation trims the numbers back, (and introduces the bias). It also shows that the probability of under-producing is greatly lessened:

    [undef, 1495533, 1499609, 1498974, 1499522, 1501930, 1501314, 1499981, + 1501222, 1499646, 1500600, 1500068, 1500915, 1498017, 1500384, 15010 +31, 1498257, 1500431, 1501058, 1498359, 1500716] [undef, undef, undef, undef, undef, undef, undef, undef, undef, 2, 12, + 31, 75, 170, 373, 768, 1568, 2718, 4802, 7795, 12096, 17806, 24879, +32813, 41477, 51438, 59539, 66763, 72668, 74986, 75575, 73039, 68515, 61653, 53917, 46112, 37751, 30042, 23138, +17536, 12842, 9196, 6284, 4298, 2803, 1748, 1053, 721, 429, 253, 156, + 80, 43, 17, 8, 8, 2, undef, 1, undef, 1],

    This could be fixed by repeating the process until exactly 20 numbers come out, which ensure the fairness:

    #! perl -slw use strict; use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 300; my( @counts, @ns ); for( 1 .. 1e6 ) { my @orderedRands = grep{ rand(1) < 0.05 } map{ ($_) x 20 } 1 .. 20 +; while( @orderedRands != 20 ) { @orderedRands = grep{ rand(1) < 0.05 } map{ ($_) x 20 } 1 .. 2 +0; } ++$ns[ @orderedRands ]; ++$counts[ $_ ] for @orderedRands; } pp \@counts, \@ns; __END__ C:\test>junk62 ( [undef, 1000652, 999987, 1000022, 999969, 999146, 1000961, 1000568, +1000129, 1000725, 999884, 999509, 999756, 1000538, 999763, 1000708, 1 +000826, 999799, 998778, 998714, 999566], [undef, undef, undef, undef, undef, undef, undef, undef, undef, unde +f, undef, undef, undef, undef, undef, undef, undef, undef, undef, und +ef, 1000000], )

    Of course that is far more expensive than doing the sort that it avoids.

    But then, my post was nothing more than a semi-humorous response to a question that itself is something of a joke.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: random #s
by tybalt89 (Monsignor) on Oct 07, 2016 at 19:45 UTC

    The algorithm needs a minor tweak:

    #!/usr/bin/perl # http://perlmonks.org/?node_id=1173445 use strict; use warnings; use List::Util qw(sum); my @count = (0) x 20; for (1 .. 2e4) { my ($top, $bottom) = (20, 20 * 20); my @orderedRands = grep{ $bottom > 0 && rand(1) < $top / $bottom ? (--$top, --$bottom, 1) : (--$bottom, 0) } map{ ($_) x 20 } 1 .. 20 ; $count[$_]++ for grep {defined} @orderedRands; } my $sum = sum(@count); print "$_\t=>\t$count[$_]\t(", sprintf('%.2f', ($count[$_]/$sum)*100), + "% TTL)\n" for 1 .. 20;

    ...which produces...

    1 => 20031 (5.01% TTL) 2 => 20037 (5.01% TTL) 3 => 19831 (4.96% TTL) 4 => 19953 (4.99% TTL) 5 => 19840 (4.96% TTL) 6 => 20068 (5.02% TTL) 7 => 19868 (4.97% TTL) 8 => 20030 (5.01% TTL) 9 => 19790 (4.95% TTL) 10 => 20076 (5.02% TTL) 11 => 19944 (4.99% TTL) 12 => 19909 (4.98% TTL) 13 => 20049 (5.01% TTL) 14 => 20150 (5.04% TTL) 15 => 19940 (4.98% TTL) 16 => 20112 (5.03% TTL) 17 => 19973 (4.99% TTL) 18 => 20157 (5.04% TTL) 19 => 20088 (5.02% TTL) 20 => 20154 (5.04% TTL)

    Which looks pretty good.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1173499]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (8)
As of 2024-04-18 10:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found