Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^11: PDL and srand puzzle - not likely fewer random bits than rand

by marioroy (Prior)
on Jun 08, 2024 at 00:22 UTC ( [id://11159826]=note: print w/replies, xml ) Need Help??


in reply to Re^10: PDL and srand puzzle
in thread PDL and srand puzzle

This indicates that random() generates fewer random bits than rand(). Right ??

Possibly not the case, validated here running non-threads, nearly 1 billion unique.

use v5.030; use Config; use PDL; use Math::MPFR qw(:mpfr); use Math::Random; use Math::Random::MT::Auto; say "# randbits ", $Config{randbits}; say; { say "# CORE::rand()"; CORE::srand(42); for (1..10) { my $r = CORE::rand(); Rmpfr_dump( Math::MPFR->new($r) ); } say; } { say "# PDL->random()"; PDL::srandom(42); # PDL::srand(42) using v2.062 ~ v2.089 for (1..10) { my $r = PDL->random(); Rmpfr_dump( Math::MPFR->new($r->at(0)) ); } say; } { say "# Math::Random::random_uniform()"; Math::Random::random_set_seed(42, 42); for (1..10) { my $r = Math::Random::random_uniform(); Rmpfr_dump( Math::MPFR->new($r) ); } say; } { say "# Math::Random::MT::Auto::rand()"; Math::Random::MT::Auto::set_seed(42); for (1..10) { my $r = Math::Random::MT::Auto::rand(); Rmpfr_dump( Math::MPFR->new($r) ); } say; }

Output

# randbits 48 # CORE::rand() 0.10111110100110010011000010111110010100010000000100000E0 0.10101111011101101001000101110110110001101111000000000E-1 0.11100011100000001010111000111001010100010001100000000E-3 0.11011000001111001100111111011000110001011110010000000E-1 0.10100110000111011001110100011100011010001010100000000E-3 0.11011011001111111011001010111111111011111111110000000E0 0.11111111011000101010001101001011001011001010111000000E-1 0.11110101001001110010010110001110010110100010110000000E-1 0.10110000110110010001010110010111111101100110100100000E0 0.11010101101001111110111111100010010000001100000000000E0 # PDL->random() 0.10101111101000001010000100101001111100011011001010100E-3 0.10011110111011100011111000010001111010000100001000111E-1 0.10000000001001001001010111101100010000010100010000110E-3 0.10011100111010000111111110011100100100101010011011100E-1 0.10000000100101010010000101001110111101111011110010101E-163 0.10011011001010110110000001100001001101101100010010111E-1 0.10110011011011001010101001110100111010100100100011001E-3 0.11101000011110111000100001001011101011010100100111010E-2 0.11011100111000000010100111101010011100110011111110001E0 0.10000101111100111111100010010000100100101010000110101E0 # Math::Random::random_uniform() 0.11111111111111110010000110000101111111110110110001010E0 0.11101110000010001101001101011000000100011101110001101E0 0.11000110111111000010100111101100001000110111001111100E-2 0.11111101010101011110101010101000001110110000110111100E0 0.10001101001110100011110110011001110010001010110011110E0 0.11001010101110011011001100000000100111110101000011101E0 0.11101010010000110110000001111111100100001011111000100E0 0.10001110011001110000110000110100100100000110111000100E0 0.10010010110111111100000011011111100010001001011000100E0 0.11011100010001101010110011010010010001101110111011010E-5 # Math::Random::MT::Auto::rand() 0.10000011111100001110111010101101001001100110101101010E0 0.10010000110000000011001110110001110111110111001111010E0 0.11101001010100101010111100101111111000000000110000100E0 0.11100101011001100001101111011111100011000101011101100E-1 0.11111011000000000011000100101100011111100101000000100E-1 0.10011111011111000111001111011010011100010111101000000E-2 0.10100100100000001110100110100110001110110010001100110E0 0.11101001110001010011111001011100111101111101000110010E0 0.11100100100101101001011110101000010101001011010111000E-1 0.10101100001101001001101101000011010011110011101000000E-2

Replies are listed 'Best First'.
Re^12: PDL and srand puzzle
by syphilis (Archbishop) on Jun 08, 2024 at 05:14 UTC
    Rmpfr_dump( Math::MPFR->new($r.at(0)) );

    In executing that, you are assigning 15-significant-decimal-digit strings (eg 0.816545445367819) to a 53-bit precision float (double) ... and then observing that the assignment populates all 53 bits:
    D:\>perl -MMath::MPFR=":mpfr" -MPDL -le "for(1..10) {$r = PDL->random( +); Rmpfr_dump(Math::MPFR->new($r.at(0)))}" 0.11101001100100000110110101110010100111100110010011000E-1 0.10110010100010010011101011001111010011110001011001001E-2 0.11101000000110110001010100000001011101100000111011111E-3 0.10110010010001101100111011100101000000001001100100011E0 0.11010000000101010100010000110000010010011110111101111E0 0.11101010010011100000100010101111011100001010000100111E-3 0.11001010100111101101000101101010111011100110010001111E-1 0.11111111111011111000010100111010010010001010101010110E-1 0.10111001110100001011011100010100001101011001101011010E-2 0.11011111011110110001110011111111011111010111110001100E-1
    But you'll see the same thing if you use 15-significant-decimal-digit representations of values provided by rand():
    D:\>perl -MMath::MPFR=":mpfr" -MPDL -le "for(1..10) {Rmpfr_dump(Math:: +MPFR->new(sprintf '%.15g', rand()))}" 0.11011001011111100110011011010010100010000000100011011E-3 0.10001011110110011101000101001110110001101111000000110E-1 0.11001011001111001101010100001100101010001000110000110E-2 0.11110111001110000011011110111000011000101111000111101E0 0.11110111100000100111010101111111100011010001010100010E0 0.10001110111001111100101011010111110111111111011111000E-1 0.10101100000000101101100111000011001011001010110111010E-1 0.11010110011101110001001101010011001011010001010111111E0 0.11111010001001001100100110011111101100110100011110010E-3 0.11010111110100100001010011100100000011000000000000100E-4
    Yet, we already know (and you have just shown it) that rand() does not populate the 5 lowest bits.
    This behaviour is inevitable when double-precision values are rounded to 15 decimal digit precision, and then assigned back in.
    If 17 decimal digits of precision were provided, this capacity to mislead would go away:
    D:\>perl -MMath::MPFR=":mpfr" -le "for(1..10) { Rmpfr_dump(Math::MPFR- +>new(sprintf '%.17g', rand()))}" 0.10110100110110100010000010011000101000100000001000000E-1 0.10001101011001100001100001100010110001101111000000000E-1 0.11111010010100100101001100000101001010100010001100000E0 0.11000011010000000101111010100100110001011110010000000E-1 0.11111000011001100111110000100011000110100010101000000E-1 0.11101100001000111011110010101111011111111110000000000E-3 0.10001001000110010110010011000011100101100101011100000E0 0.10100000101110001010101010001101001011010001011000000E0 0.11100000100010000001100011100101111101100110100100000E0 0.10000000011000110110000011000010000001100000000000000E-3
    Unfortunately, there's not much to be gleaned from looking at double-precision values rounded to 15-significant-decimal-digit strings.
    We need to be looking at how those values were derived.

    Update: Mind you, I could be barking up the wrong tree, anyway - which is even more likely if this issue you've identified is limited to threading.

    Cheers,
    Rob
      From etj: Could you take a look at the srand code and see if anything is obviously wrong? Also, are you able to see if the duplicates are in groups i.e. sequences?
      From Rob: We need to be looking at how those values were derived.

      I'm hoping for the PDL development team to chime in. The testing was out of curiosity. Running non-threads here, PDL generates approximately one billion unique. For threads (uncomment use threads at the top of the script), greater than 92% unique. Better than Math::Random::random_normal/random_uniform.

      Calling PDL::srandom has no effect for predictability, before spawning workers. I commented out the line in my last post. It requires calling CORE::srand instead.

      Care is needed whether to call a PDL function or class method.

      PDL::srandom(42); # not PDL->srandom(42) $r = PDL->random(); # not PDL::random();

      I completed validation on the MCE side. Spawning child processes (non-threads), nearly one billion unique > 99.99995% for PDL->random, Math::Prime::Util::drand(), and Math::Random::MT::Auto::rand(). 80% ~ 82% unique for Math::Random::random_uniform() and Math::Random::random_normal().

      Running threads, Math::Random::random_normal and random_uniform take beyond 2 minutes to output one billion lines and score less than 20% unique.

      The relevant code in MCE 1.891 can be found here lines 644-662 and here lines 2036-2071. MCE::Child and MCE::Hobo have similar code.

      See also, Random Numbers Overview by danaj.

      Update:

      It turns out that I can seed the generator inside threads for CORE, Math::Prime::Util, and Math::Random::MT::Auto and have predictable results, matching non-threads. I released MCE v1.892/v1.893, removing the check whether spinning threads. v1.893 preserves calling CORE::srand(N) for older Perl, non-threads.

      # Sets the seed of the base generator uniquely between workers. # The new seed is computed using the current seed and ID value. # One may set the seed at the application level for predictable # results (non-thread workers only). Ditto for Math::Prime::Util, # Math::Random, Math::Random::MT::Auto, and PDL. # # MCE 1.892, 2024-06-08 # Removed check if spawning threads i.e. use_threads. # Predictable output matches non-threads for CORE, # Math::Prime::Util, and Math::Random::MT::Auto. # https://perlmonks.org/?node_id=11159834 { my $_wid = $_args[1]; my $_seed = abs($self->{_seed} - ($_wid * 100000)) % 2147483560; CORE::srand($_seed) if (!$self->{use_threads} || $] ge '5.020000'); + # drand48 Math::Prime::Util::srand($_seed) if $INC{'Math/Prime/Util.pm'}; if (!$self->{use_threads}) { PDL::srand($_seed) if $INC{'PDL.pm'} && PDL->can('srand'); # PDL + 2.062 ~ 2.089 PDL::srandom($_seed) if $INC{'PDL.pm'} && PDL->can('srandom'); # + PDL 2.089_01+ } } if (!$self->{use_threads} && $INC{'Math/Random.pm'}) { my ($_wid, $_cur_seed) = ($_args[1], Math::Random::random_get_seed( +)); my $_new_seed = ($_cur_seed < 1073741781) ? $_cur_seed + (($_wid * 100000) % 1073741780) : $_cur_seed - (($_wid * 100000) % 1073741780); Math::Random::random_set_seed($_new_seed, $_new_seed); } if ($INC{'Math/Random/MT/Auto.pm'}) { my ($_wid, $_cur_seed) = ( $_args[1], Math::Random::MT::Auto::get_seed()->[0] ); my $_new_seed = ($_cur_seed < 1073741781) ? $_cur_seed + (($_wid * 100000) % 1073741780) : $_cur_seed - (($_wid * 100000) % 1073741780); Math::Random::MT::Auto::set_seed($_new_seed); }
        I'm hoping for the PDL development team to chime in.
        I honestly wonder who you think that is.
Re^12: PDL and srand puzzle - one billion demonstration (updated)
by marioroy (Prior) on Jun 08, 2024 at 01:11 UTC

    The following is a billion demonstration. Workers output directly to STDOUT and orderly via MCE::Relay.

    Update: Reduce RAM usage by processing a sequence of numbers instead (~ 45MB per worker). For predictable output, this requires seeding the generator per each sequence. Previously, the code spawned 100 workers and 1e7 loop iterations (~ 24GB RAM).

    use v5.030; use PDL; use MCE 1.895; use Math::Prime::Util; use Math::Random; use Math::Random::MT::Auto; CORE::srand(42); # This also, for MCE predictable results # MCE sets internal seed = CORE::random() PDL::srandom(42); # PDL::srand(N) v2.062 ~ v2.089 Math::Prime::Util::srand(42); Math::Random::random_set_seed(42, 42); Math::Random::MT::Auto::set_seed(42); MCE->new( max_workers => MCE::Util::get_ncpu(), chunk_size => 1, init_relay => 0, posix_exit => 1, sequence => [ 1, 1000 ], user_func => sub { # my ($mce, $seq, $chunk_id) = @_; my $seq = $_; my $output = ""; # Worker seeds generator using MCE's seed and wid value. # For sequence of numbers, compute similarly using $seq value. my $seed = abs(MCE->seed - ($seq * 100000)) % 1073741780; CORE::srand($seed); # PDL::srand($seed); # PDL v1.062 ~ 1.089 PDL::srandom($seed); # PDL v1.089_01 Math::Prime::Util::srand($seed); Math::Random::random_set_seed($seed, $seed); Math::Random::MT::Auto::set_seed($seed); my $prngMT = Math::Random::MT::Auto->new(); $prngMT->srand($seed); for ( 1 .. 1e6 ) { # my $r = CORE::rand(); # my $r = PDL->random(); # my $r = Math::Prime::Util::drand(); # my $r = Math::Prime::Util::irand64(); # my $r = Math::Random::random_normal(); # my $r = Math::Random::random_uniform(); # my $r = Math::Random::MT::Auto::rand(); my $r = $prngMT->rand(); $output .= "$r\n"; } MCE::relay { print $output; }; } )->run;
    Time to run:
    $ time perl test.pl > out # 17GB ~ 19GB file size CORE::rand . . . . . . . . . . . 12.275s PDL->random . . . . . . . . . . . 2m04.810s Math::Prime::Util::drand . . . . 13.358s Math::Prime::Util::irand64 . . . 12.462s Math::Random::random_normal . . . 16.341s Math::Random::random_uniform . . 16.328s Math::Random::MT::Auto::rand . . 15.040s $prngMT->rand . . . . . . . . . . 13.339s $ wc -l out 1000000000

    GNU sort lacks parallel capability processing STDIN. See mcesort (GitHub Gist). Another option is GNU parallel parsort.

    # Sorting consumes 17GB ~ 19GB temporary space. # Specify a path with sufficient storage. $ perl test.pl | LC_ALL=C sort -T/dev/shm -u | wc -l $ perl test.pl | LC_ALL=C mcesort -T/dev/shm -j75% -u | wc -l $ perl test.pl | LC_ALL=C parsort -T/dev/shm --parallel=12 -u | wc -l CORE::rand . . . . . . . . . . . 1000000000 PDL->random . . . . . . . . . . . 999999510 Math::Prime::Util::drand . . . . 999999553 Math::Prime::Util::irand64 . . . 1000000000 Math::Random::random_normal . . . 816337225 Math::Random::random_uniform . . 799455502 Math::Random::MT::Auto::rand . . 999999527 $prngMT->rand . . . . . . . . . . 999999527

    The mcesort variant runs on Linux including macOS, FreeBSD, and derivatives.

      I tried another variation for PDL.

      Update: Reduce RAM usage by processing a sequence of numbers instead (~ 45MB per worker). For predictable output, this requires seeding the generator per each sequence. Previously, the code spawned 100 workers and 1e7 loop iterations (~ 24GB RAM).

      use v5.030; use PDL; use MCE 1.895; CORE::srand(42); # This also, for MCE predictable results # MCE sets internal seed = CORE::random() PDL::srandom(42); # PDL::srand(N) v2.062 ~ v2.089 MCE->new( max_workers => MCE::Util::get_ncpu(), chunk_size => 1, init_relay => 0, posix_exit => 1, sequence => [ 1, 1000 ], user_func => sub { # my ($mce, $seq, $chunk_id) = @_; my $seq = $_; my $output = ""; # Worker seeds generator using MCE's seed and wid value. # For sequence of numbers, compute similarly using $seq value. my $seed = abs(MCE->seed - ($seq * 100000)) % 1073741780; # PDL::srand($seed); # PDL v1.062 ~ 1.089 PDL::srandom($seed); # PDL v1.089_01 my $pdl = PDL->random(1e6); foreach (0 .. $pdl->nelem - 1) { my $r = $pdl->at($_); $output .= "$r\n"; } MCE::relay { print $output; }; } )->run;

      Time to run:

      $ time perl test_pdl.pl > out # 17GB file size PDL->random . . . . . . . . . . . 27.211s $ wc -l out 1000000000

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11159826]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (4)
As of 2025-02-14 15:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found