http://www.perlmonks.org?node_id=1032701


in reply to generate random number without using any built in function in perl?

There are a number of well-documented Pseudo Random Number Generation algorithms available for the searching on the 'net. Each has strengths and weaknesses, and all will produce pseudo-random numbers (even Perl's rand produces pseudo-random numbers).

There are varying qualities of pseudo-random number generators; some are adequate for general purpose use (Perl's rand, for example). Some are suitable for Monte Carlo simulations (Mersenne Twister, for example), and some are suitable for cryptographic purposes (A block cypher in counter mode, or ISAAC, for example).

All of these need to be seeded somehow. Perl's built-in PRNG seeds using a combination of time, process ID, and some bit twiddling, if I'm not mistaken. Porting one of the well-known algorithms to Perl is only half the problem (and in the case of many of the common algorithms, one that's already been solved on CPAN). The other half of the problem is seeding. Perl's method of seeding is again adequate for "general purpose" use. But for more stringent uses, there are more robust techniques; reading from /dev/urandom (non-blocking), or /dev/random (blocking), or through Windows API calls (Windows XP or newer). Seeding in a way that is portable across a bunch of systems is best handled through a module like Dana Jacobsen's Crypt::Random::Seed, which abstracts away the platform specifics, allowing it to be portably useful. On systems where no "standard" entropy pool is available, degrades gracefully to Crypt::Random::TESHA2 (short for Timer Entropy => SHA2).

So to summarize, if I were to write a random number generator that avoided rand, I would first ask what it will be used for. Then I will evaluate the many well-known PRNG algorithms to determine which one meets those needs. After porting its code to Perl (or better, after finding it on CPAN), I would then go looking at how strong of a seed I need, and would probably end up using Dana's module regardless, since it's so convenient and provides options for non-blocking sources.

Finally, I would use tests such as FIPS140-2 to evaluate the quality of the randomness. Of course there are as many types of randomness tests as there are types of PRNGs, so you have to pick ones that test for your use case (ie, statistical use might test differently than crypto use).

The last thing I would do, however, is try to invent my own PRNG algorithm. There's at least 40 years of prior art available to learn from.


Dave

Replies are listed 'Best First'.
Re^2: generate random number without using any built in function in perl?
by BrowserUk (Patriarch) on May 08, 2013 at 23:50 UTC

    Why use pseudo-random when you can have random? :)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      It's an interesting concept. If I understand (and please correct me if I'm wrong), you're using the unpredictability of the relationship between a few asynchronous threads as an entropy source, and then reducing that entropy to, in this case, a range of 0 .. 23, though it could just as easily be full bytes, or something else.

      When I run it with threaded Perl 5.14.2 on Ubuntu 13.04 (the only threaded Perl I have built at the moment), my distributions are not nearly so nice as yours. I keep seeing things like:

      { "0" => 3810, "2" => 5960, "16" => 230 }

      I have no idea why there's a difference. I fiddled with different numbers of threads, but that didn't produce appreciable improvements toward getting a smoother distribution.

      I also wonder if you need to consider modulo bias. Well, actually looking at the code again, I don't wonder, I'm pretty sure modulo bias can be an issue if $M is set to a number that doesn't divide evenly into 2^32.

      Initially I intended to test your code with A Perl adaptation of the FIPS-140-2 test, and A randomness dispersion test, but it doesn't take that sharp of an instrument to spot clusters when there are only three numbers drawn out of a possible 24 given thousands of picks.

      Any thoughts why this works for you and not for me?


      Dave

        First, the smiley in my post was intended to indicate a non-serious post.

        1. The OPs request makes no sense at all -- other than an interesting academic diversion; or a way to explore Perl -- when there are modules like Math::Random::MT available.

          Using the non-determinacy of threading as a source of entropy is an interesting academic exercise.

        2. Running 4 threads flat out to generate random numbers is ... shall we say 'overkill' :)
        When I run it with threaded Perl 5.14.2 on Ubuntu 13.04 (the only threaded Perl I have built at the moment), my distributions are not nearly so nice as yours. I keep seeing things like: { "0" => 3810, "2" => 5960, "16" => 230 }. I have no idea why there's a difference.

        I suspect the *nix threading primitives are the cause. Inserting a 10 second delay after the threads are spawned and before $rand is used might allow the threads to get up a running. But that is a guess. (This is not the only area of difference I've seen between threads on windows and *nix; the root of which lies beyond the scope of Perl.)

        I also wonder if you need to consider modulo bias. Well, actually looking at the code again, I don't wonder, I'm pretty sure modulo bias can be an issue if $M is set to a number that doesn't divide evenly into 2^32.

        Nope.

        Every PRNG in existance uses modulo operations; the statistics of doing so are thoroughly examined and well documented. Modulo operation bias on fixed-width (32/64-bit) random integer generators output is confined to the highest integer, and very minor. (ie. if mod 24, then 23 has a slightly less chance of being picked; but well within the statistical variation expected.)

        I've also tested this fairly extensively (on my system):

        And with FIPS-140-2:

        ... my $bytes = join '', map chr( unpack( 'V', $rand ) % 256 ), 1 .. $N; my $set = unpack '%32b*', $bytes; printf "In $N bytes, there are %d ones and %d zeros\n", $set, length( +$bytes ) *8 - $set; __END__ C:\test>rand -M=97 -N=1e5 In 1e5 bytes, there are 401004 ones and 398996 zeros C:\test>rand -M=97 -N=1e5 In 1e5 bytes, there are 400686 ones and 399314 zeros C:\test>rand -N=1e6 In 1e6 bytes, there are 4002738 ones and 3997262 zeros C:\test>rand -N=1e6 In 1e6 bytes, there are 3997975 ones and 4002025 zeros

        In this test, I extend that notion to runs of 1s of various lengths (the 5-sigma figure is equivalent to 21 heads in a row):

        C:\test>rand -N=1e6 ... my $bytes = join '', map chr( unpack( 'V', $rand ) % 256 ), 1 .. $N; my $set = unpack '%32b*', $bytes; printf "In $N bytes, there are %d ones and %d zeros\n", $set, length( +$bytes ) *8 - $set; use Statistics::Basic qw[ stddev ]; my @bytes; ++$bytes[ ord( substr $bytes, $_, 1 ) ] for 0 .. length( $bytes ) -1; #pp \@bytes; print "Stddev: ", stddev( @bytes ); my %runs; my $bits = unpack 'b*', $bytes; for my $n ( 2 .. 40 ) { ++$runs{ length() } for $bits =~ m[(?<=0)(1{$n})(?=0)]g; } pp \%runs; __END__ C:\test>rand -N=1e6 In 1e6 bytes, there are 4000292 ones and 3999708 zeros Stddev: 191.23 { 2 => 500474, 3 => 249154, 4 => 124655, 5 => 61747, 6 => 31528, 7 => +25885, 8 => 4672, 9 => 656, 10 => 175, 11 => 42, 12 => 17, 13 => 16, +14 => 18, 15 => 13, 16 => 91, 17 => 38, 18 => 21, 19 => 13, 20 => 10, + 21 => 6, 22 => 5, 23 => 9, 24 => 50, 25 => 23, 26 => 17, 27 => 5, 28 + => 6, 29 => 2, 30 => 2, 31 => 3, 32 => 37, 33 => 21, 34 => 9, 35 => +2, 36 => 2, 38 => 4, 39 => 5, 40 => 29 } C:\test>rand -N=1e6 In 1e6 bytes, there are 3991425 ones and 4008575 zeros Stddev: 197.76 { 2 => 500058, 3 => 248757, 4 => 124893, 5 => 62380, 6 => 30268, 7 => +24800, 8 => 4693, 9 => 582, 10 => 153, 11 => 59, 12 => 26, 13 => 13, +14 => 10, 15 => 11, 16 => 108, 17 => 45, 18 => 14, 19 => 11, 20 => 5, + 21 => 8, 22 => 7, 23 => 2, 24 => 46, 25 => 13, 26 => 10, 27 => 6, 28 + => 5, 29 => 3, 30 => 3, 31 => 1, 32 => 50, 33 => 10, 34 => 6, 35 => +3, 36 => 5, 37 => 3, 38 => 1, 39 => 1, 40 => 40 }

        If you want to plot those out, you'll find that all those numbers correspond uncannily with statistical expectations.

        Indeed, if you can find a test -- preferably one that doesn't require a Masters in *nixisms for me to get to run on my system -- I'll run it.

        It is, to the best of my ability to test it. truly random. Just horribly expensive for a PRNG! :)


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        a shot in the dark guess, it might be the size of IV, I mostly get one key, and once or twice every 20 runs two :)

        $ perl -V:ivsize ivsize='4'; $ perl fudge { 22 => 10000 } $ perl fudge { 4 => 8268, 11 => 1732 }