Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re^9: PDL and srand puzzle - uniqueness PDL->random vs CORE::rand

by marioroy (Prior)
on Jun 06, 2024 at 15:51 UTC ( [id://11159819]=note: print w/replies, xml ) Need Help??


in reply to Re^8: PDL and srand puzzle
in thread PDL and srand puzzle

I understood your recommendation to not call srandom inside threads. Basically, I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand(), regardless if calling srandom before spawning threads.

Edit: etj identified a race condition, hence less uniqueness.

use v5.030; use threads; use threads::shared; use PDL; BEGIN { $PDL::no_clone_skip_warning = 1; } my $lock : shared = 0; srandom(3); # PDL 2.089_01 for my $tid (1..16) { threads->create(sub { my $output = ""; for (1..500000) { # my $r = CORE::rand(); my $r = PDL->random; $output .= "$r\n"; } lock $lock; print $output; }); } $_->join for threads->list;

CORE::rand()

$ perl test7.pl | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 8000000

PDL->random

$ perl test7.pl | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 7507936 $ perl test7.pl | LC_ALL=C sort -u | wc -l 7446785 $ perl test7.pl | LC_ALL=C sort -u | wc -l 7446785

Replies are listed 'Best First'.
Re^10: PDL and srand puzzle
by syphilis (Archbishop) on Jun 07, 2024 at 09:46 UTC
    I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand()

    This indicates that random() generates fewer random bits than rand(). Right ??

    With perl, $Config{randbits} should report the number of random bits being generated by rand() - which is 48 on my SP-5.38.2.
    Is the comparable value for PDL's random() function readily available ?

    Cheers,
    Rob
      Can this even return anything else since 5.20? Consistent RNG.
      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        Can this even return anything else since 5.20? Consistent RNG

        Looks like not - and thanks for pointing that out.
        But the main thing is that $Config{randbits} accurately tells us just how many random bits there are in the values returned by rand().
        And, on Strawberry Perl 5.38.2, that seems to be the case:
        D:\>perl -MMath::MPFR=":mpfr" -le "for(1..10) { Rmpfr_dump(Math::MPFR- +>new(rand())) }" 0.10111101010110010000010001100010101000100000001000000E-1 0.10101011001001100100100010110010011000110111100000000E0 0.11101100101010101111101111110010001010100010001100000E0 0.11110011010000100001010001110110110001011110010000000E-1 0.10101100000100111001001010001101000110100010101000000E-1 0.11101001110010110101101100100110111011111111110000000E0 0.10111011011010001000111110000000100101100101011100000E0 0.11110010000111110101011000001100010110100010110000000E-1 0.10000101100011000000001001101010111101100110100100000E0 0.11100110010000001010111101110010100000011000000000000E-1
        Note that the last 5 bits are always 0.

        The remaining question is: "How many random bits is random() providing ?"

        Cheers,
        Rob
      No. It may indicate PDL's random() is being seeded incorrectly.

      PDL's RNG is at https://github.com/PDLPorters/pdl/blob/master/Basic/Primitive/xoshiro256plus.c, and generates 64(!) bits of randomness, of which only 53 are used in an IEEE 754 double-precision number. See the comments in that file for limitations, including that the lowest 3 bits can fail linearity testing, which is not likely to be a problem. Note it generates n (a parameter) seeds, not necessarily just one, so independent POSIX threads will have different seeds if PDL's built-in POSIX threading is used to generate large amounts of random numbers.

      PDL's srandom function, if not given an input, uses PDL::Core::seed - see https://github.com/PDLPorters/pdl/pull/352.

      There is no interaction with Perl's srand() or rand() functionality.

      EDIT - I have just realised that Perl's "threads" functionality might clash with this, if it uses POSIX threads: PDL's RNG has a global vector of n seeds, in a global C variable. PDL's code will, if it is being used in what it thinks is single-threaded mode, use the 0th offset in that. If multiple POSIX threads are accessing that single seed (which gets updated on each RNG generation), there will be a race condition, hence less uniqueness. The workaround for that would be to generate the random numbers in the main thread first, possibly using PDL's multithreading if there are >1e6 elements in the ndarray, then use those suitably divided in the Perl threads.

        PDL's RNG is at https://github.com/PDLPorters/pdl/blob/master/Basic/Primitive/xoshiro256plus.c, and generates 64(!) bits of randomness, of which only 53 are used in an IEEE 754 double-precision number.

        How does one retrieve (from the PDL object that random() returns) that actual double-precision random value as an NV ?
        BTW, I initially read "64(!) bits of randomness" as "64! bits of randomness" ... and was very impressed ;-)

        Cheers,
        Rob
      This indicates that random() generates fewer random bits than rand(). Right ??

      Possibly not the case, validated here running non-threads, nearly 1 billion unique.

      use v5.030; use Config; use PDL; use Math::MPFR qw(:mpfr); use Math::Random; use Math::Random::MT::Auto; say "# randbits ", $Config{randbits}; say; { say "# CORE::rand()"; CORE::srand(42); for (1..10) { my $r = CORE::rand(); Rmpfr_dump( Math::MPFR->new($r) ); } say; } { say "# PDL->random()"; PDL::srandom(42); # PDL::srand(42) using v2.062 ~ v2.089 for (1..10) { my $r = PDL->random(); Rmpfr_dump( Math::MPFR->new($r->at(0)) ); } say; } { say "# Math::Random::random_uniform()"; Math::Random::random_set_seed(42, 42); for (1..10) { my $r = Math::Random::random_uniform(); Rmpfr_dump( Math::MPFR->new($r) ); } say; } { say "# Math::Random::MT::Auto::rand()"; Math::Random::MT::Auto::set_seed(42); for (1..10) { my $r = Math::Random::MT::Auto::rand(); Rmpfr_dump( Math::MPFR->new($r) ); } say; }

      Output

      # randbits 48 # CORE::rand() 0.10111110100110010011000010111110010100010000000100000E0 0.10101111011101101001000101110110110001101111000000000E-1 0.11100011100000001010111000111001010100010001100000000E-3 0.11011000001111001100111111011000110001011110010000000E-1 0.10100110000111011001110100011100011010001010100000000E-3 0.11011011001111111011001010111111111011111111110000000E0 0.11111111011000101010001101001011001011001010111000000E-1 0.11110101001001110010010110001110010110100010110000000E-1 0.10110000110110010001010110010111111101100110100100000E0 0.11010101101001111110111111100010010000001100000000000E0 # PDL->random() 0.10101111101000001010000100101001111100011011001010100E-3 0.10011110111011100011111000010001111010000100001000111E-1 0.10000000001001001001010111101100010000010100010000110E-3 0.10011100111010000111111110011100100100101010011011100E-1 0.10000000100101010010000101001110111101111011110010101E-163 0.10011011001010110110000001100001001101101100010010111E-1 0.10110011011011001010101001110100111010100100100011001E-3 0.11101000011110111000100001001011101011010100100111010E-2 0.11011100111000000010100111101010011100110011111110001E0 0.10000101111100111111100010010000100100101010000110101E0 # Math::Random::random_uniform() 0.11111111111111110010000110000101111111110110110001010E0 0.11101110000010001101001101011000000100011101110001101E0 0.11000110111111000010100111101100001000110111001111100E-2 0.11111101010101011110101010101000001110110000110111100E0 0.10001101001110100011110110011001110010001010110011110E0 0.11001010101110011011001100000000100111110101000011101E0 0.11101010010000110110000001111111100100001011111000100E0 0.10001110011001110000110000110100100100000110111000100E0 0.10010010110111111100000011011111100010001001011000100E0 0.11011100010001101010110011010010010001101110111011010E-5 # Math::Random::MT::Auto::rand() 0.10000011111100001110111010101101001001100110101101010E0 0.10010000110000000011001110110001110111110111001111010E0 0.11101001010100101010111100101111111000000000110000100E0 0.11100101011001100001101111011111100011000101011101100E-1 0.11111011000000000011000100101100011111100101000000100E-1 0.10011111011111000111001111011010011100010111101000000E-2 0.10100100100000001110100110100110001110110010001100110E0 0.11101001110001010011111001011100111101111101000110010E0 0.11100100100101101001011110101000010101001011010111000E-1 0.10101100001101001001101101000011010011110011101000000E-2
        Rmpfr_dump( Math::MPFR->new($r.at(0)) );

        In executing that, you are assigning 15-significant-decimal-digit strings (eg 0.816545445367819) to a 53-bit precision float (double) ... and then observing that the assignment populates all 53 bits:
        D:\>perl -MMath::MPFR=":mpfr" -MPDL -le "for(1..10) {$r = PDL->random( +); Rmpfr_dump(Math::MPFR->new($r.at(0)))}" 0.11101001100100000110110101110010100111100110010011000E-1 0.10110010100010010011101011001111010011110001011001001E-2 0.11101000000110110001010100000001011101100000111011111E-3 0.10110010010001101100111011100101000000001001100100011E0 0.11010000000101010100010000110000010010011110111101111E0 0.11101010010011100000100010101111011100001010000100111E-3 0.11001010100111101101000101101010111011100110010001111E-1 0.11111111111011111000010100111010010010001010101010110E-1 0.10111001110100001011011100010100001101011001101011010E-2 0.11011111011110110001110011111111011111010111110001100E-1
        But you'll see the same thing if you use 15-significant-decimal-digit representations of values provided by rand():
        D:\>perl -MMath::MPFR=":mpfr" -MPDL -le "for(1..10) {Rmpfr_dump(Math:: +MPFR->new(sprintf '%.15g', rand()))}" 0.11011001011111100110011011010010100010000000100011011E-3 0.10001011110110011101000101001110110001101111000000110E-1 0.11001011001111001101010100001100101010001000110000110E-2 0.11110111001110000011011110111000011000101111000111101E0 0.11110111100000100111010101111111100011010001010100010E0 0.10001110111001111100101011010111110111111111011111000E-1 0.10101100000000101101100111000011001011001010110111010E-1 0.11010110011101110001001101010011001011010001010111111E0 0.11111010001001001100100110011111101100110100011110010E-3 0.11010111110100100001010011100100000011000000000000100E-4
        Yet, we already know (and you have just shown it) that rand() does not populate the 5 lowest bits.
        This behaviour is inevitable when double-precision values are rounded to 15 decimal digit precision, and then assigned back in.
        If 17 decimal digits of precision were provided, this capacity to mislead would go away:
        D:\>perl -MMath::MPFR=":mpfr" -le "for(1..10) { Rmpfr_dump(Math::MPFR- +>new(sprintf '%.17g', rand()))}" 0.10110100110110100010000010011000101000100000001000000E-1 0.10001101011001100001100001100010110001101111000000000E-1 0.11111010010100100101001100000101001010100010001100000E0 0.11000011010000000101111010100100110001011110010000000E-1 0.11111000011001100111110000100011000110100010101000000E-1 0.11101100001000111011110010101111011111111110000000000E-3 0.10001001000110010110010011000011100101100101011100000E0 0.10100000101110001010101010001101001011010001011000000E0 0.11100000100010000001100011100101111101100110100100000E0 0.10000000011000110110000011000010000001100000000000000E-3
        Unfortunately, there's not much to be gleaned from looking at double-precision values rounded to 15-significant-decimal-digit strings.
        We need to be looking at how those values were derived.

        Update: Mind you, I could be barking up the wrong tree, anyway - which is even more likely if this issue you've identified is limited to threading.

        Cheers,
        Rob

        The following is a billion demonstration. Workers output directly to STDOUT and orderly via MCE::Relay.

        Update: Reduce RAM usage by processing a sequence of numbers instead (~ 45MB per worker). For predictable output, this requires seeding the generator per each sequence. Previously, the code spawned 100 workers and 1e7 loop iterations (~ 24GB RAM).

        use v5.030; use PDL; use MCE 1.895; use Math::Prime::Util; use Math::Random; use Math::Random::MT::Auto; CORE::srand(42); # This also, for MCE predictable results # MCE sets internal seed = CORE::random() PDL::srandom(42); # PDL::srand(N) v2.062 ~ v2.089 Math::Prime::Util::srand(42); Math::Random::random_set_seed(42, 42); Math::Random::MT::Auto::set_seed(42); MCE->new( max_workers => MCE::Util::get_ncpu(), chunk_size => 1, init_relay => 0, posix_exit => 1, sequence => [ 1, 1000 ], user_func => sub { # my ($mce, $seq, $chunk_id) = @_; my $seq = $_; my $output = ""; # Worker seeds generator using MCE's seed and wid value. # For sequence of numbers, compute similarly using $seq value. my $seed = abs(MCE->seed - ($seq * 100000)) % 1073741780; CORE::srand($seed); # PDL::srand($seed); # PDL v1.062 ~ 1.089 PDL::srandom($seed); # PDL v1.089_01 Math::Prime::Util::srand($seed); Math::Random::random_set_seed($seed, $seed); Math::Random::MT::Auto::set_seed($seed); my $prngMT = Math::Random::MT::Auto->new(); $prngMT->srand($seed); for ( 1 .. 1e6 ) { # my $r = CORE::rand(); # my $r = PDL->random(); # my $r = Math::Prime::Util::drand(); # my $r = Math::Prime::Util::irand64(); # my $r = Math::Random::random_normal(); # my $r = Math::Random::random_uniform(); # my $r = Math::Random::MT::Auto::rand(); my $r = $prngMT->rand(); $output .= "$r\n"; } MCE::relay { print $output; }; } )->run;
        Time to run:
        $ time perl test.pl > out # 17GB ~ 19GB file size CORE::rand . . . . . . . . . . . 12.275s PDL->random . . . . . . . . . . . 2m04.810s Math::Prime::Util::drand . . . . 13.358s Math::Prime::Util::irand64 . . . 12.462s Math::Random::random_normal . . . 16.341s Math::Random::random_uniform . . 16.328s Math::Random::MT::Auto::rand . . 15.040s $prngMT->rand . . . . . . . . . . 13.339s $ wc -l out 1000000000

        GNU sort lacks parallel capability processing STDIN. See mcesort (GitHub Gist). Another option is GNU parallel parsort.

        # Sorting consumes 17GB ~ 19GB temporary space. # Specify a path with sufficient storage. $ perl test.pl | LC_ALL=C sort -T/dev/shm -u | wc -l $ perl test.pl | LC_ALL=C mcesort -T/dev/shm -j75% -u | wc -l $ perl test.pl | LC_ALL=C parsort -T/dev/shm --parallel=12 -u | wc -l CORE::rand . . . . . . . . . . . 1000000000 PDL->random . . . . . . . . . . . 999999510 Math::Prime::Util::drand . . . . 999999553 Math::Prime::Util::irand64 . . . 1000000000 Math::Random::random_normal . . . 816337225 Math::Random::random_uniform . . 799455502 Math::Random::MT::Auto::rand . . 999999527 $prngMT->rand . . . . . . . . . . 999999527

        The mcesort variant runs on Linux including macOS, FreeBSD, and derivatives.

      This indicates that random() generates fewer random bits than rand(). Right ??

      I don't think so. Looking at the produced values was a surprise.

      Took the threads loop and ran it with 16 pids and 1024 random() calls. Used "%.64f" "%.72f" as format for the generated numbers. In most cases, different runs produce identical results.

      <Update>
      For completeness, here is the script.

      #!/usr/bin/perl use v5.030; use threads; use threads::shared; use PDL; BEGIN { $PDL::no_clone_skip_warning = 1; } my $lock : shared = 0; PDL::srand(3); # PDL 2.087 for my $tid (1..16) { threads->create(sub { my $output = ""; for (1..1024) { my $r = random(); $output .= sprintf "%.72f\n", $r; } lock $lock; print $output; }); } $_->join for threads->list;
      </Update>

      When a run produces a different result, I see two scenarios:

      • One or two numbers are missing and are replaced by an already produced value. See examples 2 and 3 below.
      • The generated numbers are significantly different. See example 4.

      $ ./pdl-rand.pl | sort >sort.16384.1 $ ./pdl-rand.pl | sort >sort.16384.2 $ ./pdl-rand.pl | sort >sort.16384.3 $ ./pdl-rand.pl | sort >sort.16384.4 $ sort -u sort.16384.1 |wc -l 16384 $ sort -u sort.16384.2 |wc -l 16383 $ sort -u sort.16384.3 |wc -l 16382 $ sort -u sort.16384.4 |wc -l 16383 $ diff -U 1 sort.16384.1 sort.16384.2 --- sort.16384.1 2024-06-08 09:45:09.466739327 +0200 +++ sort.16384.2 2024-06-08 09:50:24.203356598 +0200 @@ -12562,2 +12562,3 @@ 0.7698795891537290048134423159353900700807571411132812500000000000000 +00000 +0.7698795891537290048134423159353900700807571411132812500000000000000 +00000 0.7699350058887169945265327442029956728219985961914062500000000000000 +00000 @@ -14417,3 +14418,2 @@ 0.8818249357757870221519169717794284224510192871093750000000000000000 +00000 -0.8818467386417550013533173114410601556301116943359375000000000000000 +00000 0.8818802323789649566521120505058206617832183837890625000000000000000 +00000 $ diff -U 1 sort.16384.1 sort.16384.3 --- sort.16384.1 2024-06-08 09:45:09.466739327 +0200 +++ sort.16384.3 2024-06-08 09:52:32.837243809 +0200 @@ -13113,2 +13113,3 @@ 0.8005079965106629558135864499490708112716674804687500000000000000000 +00000 +0.8005079965106629558135864499490708112716674804687500000000000000000 +00000 0.8007896602698270083209308722871355712413787841796875000000000000000 +00000 @@ -13686,3 +13687,2 @@ 0.8361851909041919661547126452205702662467956542968750000000000000000 +00000 -0.8361968170640460273901339860458392649888992309570312500000000000000 +00000 0.8364519442006479454931877626222558319568634033203125000000000000000 +00000 @@ -14417,3 +14417,2 @@ 0.8818249357757870221519169717794284224510192871093750000000000000000 +00000 -0.8818467386417550013533173114410601556301116943359375000000000000000 +00000 0.8818802323789649566521120505058206617832183837890625000000000000000 +00000 @@ -16095,2 +16094,3 @@ 0.9834468387403769717991508514387533068656921386718750000000000000000 +00000 +0.9834468387403769717991508514387533068656921386718750000000000000000 +00000 0.9837232694387190168328061190550215542316436767578125000000000000000 +00000 $ diff -U 1 sort.16384.1 sort.16384.4|head -12 --- sort.16384.1 2024-06-08 09:45:09.466739327 +0200 +++ sort.16384.4 2024-06-08 09:58:27.630459409 +0200 @@ -1,154 +1,161 @@ -0.0000798345195128779969639606917120033813262125477194786071777343750 +00000 +0.0000273717076871849010000561919220274376129964366555213928222656250 +00000 +0.0000550196742872266023331902229376311197484028525650501251220703125 +00000 0.0001113510803039119965475445273028753945254720747470855712890625000 +00000 -0.0001282989599132870110840404231922207145544234663248062133789062500 +00000 0.0002144808569095600127036443938166598854877520352602005004882812500 +00000 -0.0003265905868672370191731213484587215134524740278720855712890625000 +00000 0.0003745585362475710206227319520877472314168699085712432861328125000 +00000 -0.0004606498956905009760365299342765865731053054332733154296875000000 +00000 $ diff -U 0 sort.16384.1 sort.16384.4|grep -v '@@'|wc -l 19698

      I have no idea what's going on here but maybe someone else has.

      Update: Fixed manually modified file names in diff output.

      Greetings,
      🐻

      $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

        Predictability Summary:

        non-threads

        # Processes - CORE CORE::srand(N); CORE::rand(); - PDL CORE::srand(N); # This also, for MCE predictable results PDL::srand(N); # PDL v1.062 ~ v1.089 PDL::srandom(N); # PDL v1.089_01 PDL->random; # MCE v1.895 - Math::Prime::Util CORE::srand(N); # This also, for MCE predictable results Math::Prime::Util::srand(N); Math::Prime::Util::drand(); Math::Prime::Util::irand64(); - Math::Random CORE::srand(N); # This also, for MCE predictable results Math::Random::random_set_seed(N, N); Math::Random::random_normal(); Math::Random::random_uniform(); - Math::Random::MT::Auto CORE::srand(N); # This also, for MCE predictable results Math::Random::MT::Auto::set_seed(N); Math::Random::MT::Auto::rand();

        threads

        # Threads (matching non-threads output), requires MCE v1.894 - CORE CORE::srand(N); CORE::rand(); - Math::Prime::Util CORE::srand(N); # This also, for MCE predictable results Math::Prime::Util::srand(N); Math::Prime::Util::drand(); Math::Prime::Util::irand64(); - Math::Random::MT::Auto CORE::srand(N); # This also, for MCE predictable results Math::Random::MT::Auto::set_seed(N); Math::Random::MT::Auto::rand(); - Not possible PDL or Math::Random

        Running threads, I'm unable to generate predictable results using PDL or Math::Random (random_normal/random_uniform), regardless seeding the generator per each thread.

        See the complete demonstration here. Run Perl with -Mthreads to spin threads.

Re^10: PDL and srand puzzle
by etj (Priest) on Jun 07, 2024 at 09:24 UTC
    I'd have expected calling srandom inside new threads to behave better. Could you take a look at the srand code and see if anything is obviously wrong? Also, are you able to see if the duplicates are in groups i.e. sequences?

      See my reply to Rob. In short, calling PDL::srandom before spawning workers has no effect. The workaround is to call CORE::srand instead.

      See also, non-thread testing.

      Edit: MCE checks for PDL::Primitive->can('srand'), but missed checking PDL::Primitive->can('srandom'). Resolved in MCE v1.894 and MCE::Shared v1.889.

        Calling srand cannot be a workaround for this, because there is no interaction between Perl's RNG and PDL's. They are separate systems.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11159819]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (3)
As of 2025-02-14 15:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found