http://www.perlmonks.org?node_id=298618

The NRC Handelsblad this Saturday had an article about pseudo-random number sequences. They mentioned the work of Heiko Bauke and Stephan Mertens: Entropy of Pseudo Random Numbers. According to the article in the NRC, it seems that many random number generators tend to generate longer sequences of zero's than you might want or expect. The article also claims that Bauke and Mertens indicate that by removing the 0's from random number sequences, you improve the quality of the random number sequence. I've tried to make heads or tails from the original article by Bauke and Mertens, but it is waaay out of my league. So I hope the assumptions of the NRC article are correct.

Anyway, I decided to see whether the standard Perl random number generator would suffer from this problem, and created the following program:

use strict; my @freq; my @same; use constant MAX => 10; my $running = 1; $SIG{INT} = sub { $running = 0 }; # keep running until Control-C'd while ($running) { my $value = int rand MAX; $freq[$value]++; substr( $_,0,0 ) = $value; s#^((.)\2*)(.*)#$1#s; next unless length( $3 ) > 1; $same[substr $3,0,1]->{length $3}++; } my $total; foreach (0..MAX-1) { $total += $freq[$_]; my $ref = $same[$_]; print qq{\n$_($freq[$_]): @{[map {"$_($ref->{$_}x)"} sort {$a <=> +$b} keys %{$ref}]}}; } print "\n$total total\n";

The program basically draws random numbers and puts them into a string. A regular expression is used to find out whether the number was the same as the previous time. If not, it will score the number of times the previously drawn random number was drawn consecutively.

The program will run until interrupted with a Control-C. Then it will display the result.

I let this run on my iBook last night (I guess in total for about 9 hours or so) and came up with the following results.

N frequency consecutives(times x) 0(178763622): 2(14480890x) 3(1449710x) 4(144353x) 5(14618x) 6(1459x) 7 +(146x) 8(19x) 9(1x) 1(178752652): 2(14483383x) 3(1447847x) 4(144631x) 5(14362x) 6(1418x) 7 +(154x) 8(14x) 9(1x) 2(178739112): 2(14480973x) 3(1447340x) 4(144844x) 5(14428x) 6(1492x) 7 +(145x) 8(18x) 9(1x) 3(178744059): 2(14484560x) 3(1448751x) 4(145191x) 5(14589x) 6(1476x) 7 +(140x) 8(13x) 9(4x) 4(178763609): 2(14475608x) 3(1447185x) 4(144794x) 5(14502x) 6(1462x) 7 +(149x) 8(14x) 5(178757189): 2(14476120x) 3(1448546x) 4(144803x) 5(14335x) 6(1373x) 7 +(135x) 8(21x) 9(1x) 6(178784085): 2(14488900x) 3(1446212x) 4(144556x) 5(14308x) 6(1454x) 7 +(165x) 8(15x) 9(3x) 7(178749595): 2(14475124x) 3(1448746x) 4(143875x) 5(14586x) 6(1464x) 7 +(158x) 8(7x) 9(3x) 8(178766722): 2(14483441x) 3(1448498x) 4(144450x) 5(14594x) 6(1448x) 7 +(150x) 8(18x) 9(1x) 9(178775354): 2(14481938x) 3(1446327x) 4(145687x) 5(14591x) 6(1480x) 7 +(157x) 8(10x) 9(2x) 1787595999 total
Looking at these numbers, I would think that either:
  1. Perl doesn't suffer from the described problem.
  2. and/or my test program is faulty (not too unlikely, I might add).
  3. and/or the sample is still too small to show the problem.

If 3. is true, then I don't think any "normal" use of Perl would need to worry. I can't remember ever drawing close to 2G random numbers in a single run ;-)

Since this was fun to do, I'd thought I'd share this.

Liz