http://www.perlmonks.org?node_id=298618

The NRC Handelsblad this Saturday had an article about pseudo-random number sequences. They mentioned the work of Heiko Bauke and Stephan Mertens: Entropy of Pseudo Random Numbers. According to the article in the NRC, it seems that many random number generators tend to generate longer sequences of zero's than you might want or expect. The article also claims that Bauke and Mertens indicate that by removing the 0's from random number sequences, you improve the quality of the random number sequence. I've tried to make heads or tails from the original article by Bauke and Mertens, but it is waaay out of my league. So I hope the assumptions of the NRC article are correct.

Anyway, I decided to see whether the standard Perl random number generator would suffer from this problem, and created the following program:

use strict; my @freq; my @same; use constant MAX => 10; my $running = 1; $SIG{INT} = sub { $running = 0 }; # keep running until Control-C'd while ($running) { my $value = int rand MAX; $freq[$value]++; substr( $_,0,0 ) = $value; s#^((.)\2*)(.*)#$1#s; next unless length( $3 ) > 1; $same[substr $3,0,1]->{length $3}++; } my $total; foreach (0..MAX-1) { $total += $freq[$_]; my $ref = $same[$_]; print qq{\n$_($freq[$_]): @{[map {"$_($ref->{$_}x)"} sort {$a <=> +$b} keys %{$ref}]}}; } print "\n$total total\n";

The program basically draws random numbers and puts them into a string. A regular expression is used to find out whether the number was the same as the previous time. If not, it will score the number of times the previously drawn random number was drawn consecutively.

The program will run until interrupted with a Control-C. Then it will display the result.

I let this run on my iBook last night (I guess in total for about 9 hours or so) and came up with the following results.

N frequency consecutives(times x) 0(178763622): 2(14480890x) 3(1449710x) 4(144353x) 5(14618x) 6(1459x) 7 +(146x) 8(19x) 9(1x) 1(178752652): 2(14483383x) 3(1447847x) 4(144631x) 5(14362x) 6(1418x) 7 +(154x) 8(14x) 9(1x) 2(178739112): 2(14480973x) 3(1447340x) 4(144844x) 5(14428x) 6(1492x) 7 +(145x) 8(18x) 9(1x) 3(178744059): 2(14484560x) 3(1448751x) 4(145191x) 5(14589x) 6(1476x) 7 +(140x) 8(13x) 9(4x) 4(178763609): 2(14475608x) 3(1447185x) 4(144794x) 5(14502x) 6(1462x) 7 +(149x) 8(14x) 5(178757189): 2(14476120x) 3(1448546x) 4(144803x) 5(14335x) 6(1373x) 7 +(135x) 8(21x) 9(1x) 6(178784085): 2(14488900x) 3(1446212x) 4(144556x) 5(14308x) 6(1454x) 7 +(165x) 8(15x) 9(3x) 7(178749595): 2(14475124x) 3(1448746x) 4(143875x) 5(14586x) 6(1464x) 7 +(158x) 8(7x) 9(3x) 8(178766722): 2(14483441x) 3(1448498x) 4(144450x) 5(14594x) 6(1448x) 7 +(150x) 8(18x) 9(1x) 9(178775354): 2(14481938x) 3(1446327x) 4(145687x) 5(14591x) 6(1480x) 7 +(157x) 8(10x) 9(2x) 1787595999 total
Looking at these numbers, I would think that either:
  1. Perl doesn't suffer from the described problem.
  2. and/or my test program is faulty (not too unlikely, I might add).
  3. and/or the sample is still too small to show the problem.

If 3. is true, then I don't think any "normal" use of Perl would need to worry. I can't remember ever drawing close to 2G random numbers in a single run ;-)

Since this was fun to do, I'd thought I'd share this.

Liz

Replies are listed 'Best First'.
Re: Random numbers Perl ok! (I think)
by Abigail-II (Bishop) on Oct 12, 2003 at 17:03 UTC
    You can't draw from your experiment the conclusion that Perl doesn't have a problem with random numbers. Perl does *not* generate random number itself, it will let the OS do that for it. Looking at Configure, Perl will try to use drand48(), random() or rand(), in that order (unless configured manually otherwise).

    So, the best you can conclude there's no problem with whatever function Perl is using on your system. It might be different on other platforms

    Abigail

Re: Random numbers Perl ok! (I think)
by demerphq (Chancellor) on Oct 12, 2003 at 11:08 UTC

    Hmm. I looked into this (god knows why, im not a math type at all, nor do I use random numbers for anything serious). My code is as follows:

    my $count=1; my %hash; my $last=-1; for (1..1000000) { my $num=int rand 10; if ($num == $last) { $count++; } else { $hash{$last}{$count}++ unless $last<0; $count=1; } $last=$num; } use Data::BFDump; Dump(\%hash)->Out();

    Ive run it a few times for various large number of iterations and it does seem like sequences of 0 are just slightly more common than other numbers. I dont have the stats skills currently and I can't be bothered doing the research to find out how to find out how much the deviation is, or if its signifigant. Update: Nah. I did a few more runs, I now dont think there appears to be much difference.

    Im hoping tilly or one of the other math types comes in and explains all this for us. :-)


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


Re: Random numbers Perl ok! (I think)
by pg (Canon) on Oct 12, 2003 at 18:16 UTC

    Something not conceptually right. The entity suffers from this problem (if that article you mentioned is right) is: the algorithm that used to generate random numbers, not the language.

    The laguage is only loosely related to this problem, as whether you picked a good algorithm. Impletation is also not related here, as you may have a bad implementation that suffers from performance issues, but an implementation (if it truely follows the algorithm) does not usually alter the characteristic of the algorithm. Especially for something like generating random numbers, the implementation most likely already have been refined again and again...

    From Perl's point of view, it makes sense for perl to call c function to do this kind of stuff, but not doing it itself. That's not what Perl is targeted for.

Re: Random numbers Perl ok! (I think)
by idsfa (Vicar) on Oct 12, 2003 at 20:59 UTC

    It looks like Abigail sufficiently covered the fallacy of this being a test of perl, but to build on pg's comment, consider Math::TrulyRandom, an oldie but a goodie for getting decent randomness.


    Remember, when you stare long into the abyss, you could have been home eating ice cream.
Re: Random numbers Perl ok! (I think)
by a-mused (Initiate) on Oct 23, 2003 at 13:31 UTC
    In addition to Abigail's (as usual ;) correct assertion that Perl uses the OS' underpinnings to do (pseudo)random number generation, do note that as good as the PRNG is with drand48 (on most modern linux distributions), 48 bits is not sufficent for crypographic needs. For most computing purposes, it's sufficient - just not crypto. For that, there are several packages which work out a bit better - as have been mentioned earlier.

    The problem doesn't stem from needing to draw 2G random numbers in a single run but how random the sequence is for those 2G numbers over multiple runs of smaller draws over time. Besides, to a human, 2G can seem like a big number but to a computer it's *teeny*.

    As an introduction into the visualization of PRNG data, I'd recommend having a look at this where they've analyzed the pseudorandomness in the TCP sequences of various stack implimentations.

    Kind Regards,
    a-mused