in reply to Re^8: PDL and srand puzzle in thread PDL and srand puzzle
I understood your recommendation to not call srandom inside threads. Basically, I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand(), regardless if calling srandom before spawning threads.
Edit: etj identified a race condition, hence less uniqueness.
use v5.030;
use threads;
use threads::shared;
use PDL;
BEGIN { $PDL::no_clone_skip_warning = 1; }
my $lock : shared = 0;
srandom(3); # PDL 2.089_01
for my $tid (1..16) {
threads->create(sub {
my $output = "";
for (1..500000) {
# my $r = CORE::rand();
my $r = PDL->random;
$output .= "$r\n";
}
lock $lock;
print $output;
});
}
$_->join for threads->list;
CORE::rand()
$ perl test7.pl | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
8000000
PDL->random
$ perl test7.pl | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
7507936
$ perl test7.pl | LC_ALL=C sort -u | wc -l
7446785
$ perl test7.pl | LC_ALL=C sort -u | wc -l
7446785
Re^10: PDL and srand puzzle
by syphilis (Archbishop) on Jun 07, 2024 at 09:46 UTC
|
I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand()
This indicates that random() generates fewer random bits than rand(). Right ??
With perl, $Config{randbits} should report the number of random bits being generated by rand() - which is 48 on my SP-5.38.2.
Is the comparable value for PDL's random() function readily available ?
Cheers, Rob | [reply] |
|
Can this even return anything else since 5.20? Consistent RNG.
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] |
|
Can this even return anything else since 5.20? Consistent RNG
Looks like not - and thanks for pointing that out.
But the main thing is that $Config{randbits} accurately tells us just how many random bits there are in the values returned by rand().
And, on Strawberry Perl 5.38.2, that seems to be the case:
D:\>perl -MMath::MPFR=":mpfr" -le "for(1..10) { Rmpfr_dump(Math::MPFR-
+>new(rand())) }"
0.10111101010110010000010001100010101000100000001000000E-1
0.10101011001001100100100010110010011000110111100000000E0
0.11101100101010101111101111110010001010100010001100000E0
0.11110011010000100001010001110110110001011110010000000E-1
0.10101100000100111001001010001101000110100010101000000E-1
0.11101001110010110101101100100110111011111111110000000E0
0.10111011011010001000111110000000100101100101011100000E0
0.11110010000111110101011000001100010110100010110000000E-1
0.10000101100011000000001001101010111101100110100100000E0
0.11100110010000001010111101110010100000011000000000000E-1
Note that the last 5 bits are always 0.
The remaining question is: "How many random bits is random() providing ?"
Cheers, Rob
| [reply] [d/l] |
|
No. It may indicate PDL's random() is being seeded incorrectly.
PDL's RNG is at https://github.com/PDLPorters/pdl/blob/master/Basic/Primitive/xoshiro256plus.c, and generates 64(!) bits of randomness, of which only 53 are used in an IEEE 754 double-precision number. See the comments in that file for limitations, including that the lowest 3 bits can fail linearity testing, which is not likely to be a problem. Note it generates n (a parameter) seeds, not necessarily just one, so independent POSIX threads will have different seeds if PDL's built-in POSIX threading is used to generate large amounts of random numbers.
PDL's srandom function, if not given an input, uses PDL::Core::seed - see https://github.com/PDLPorters/pdl/pull/352.
There is no interaction with Perl's srand() or rand() functionality.
EDIT - I have just realised that Perl's "threads" functionality might clash with this, if it uses POSIX threads: PDL's RNG has a global vector of n seeds, in a global C variable. PDL's code will, if it is being used in what it thinks is single-threaded mode, use the 0th offset in that. If multiple POSIX threads are accessing that single seed (which gets updated on each RNG generation), there will be a race condition, hence less uniqueness. The workaround for that would be to generate the random numbers in the main thread first, possibly using PDL's multithreading if there are >1e6 elements in the ndarray, then use those suitably divided in the Perl threads.
| [reply] [d/l] [select] |
|
PDL's RNG is at https://github.com/PDLPorters/pdl/blob/master/Basic/Primitive/xoshiro256plus.c, and generates 64(!) bits of randomness, of which only 53 are used in an IEEE 754 double-precision number.
How does one retrieve (from the PDL object that random() returns) that actual double-precision random value as an NV ?
BTW, I initially read "64(!) bits of randomness" as "64! bits of randomness" ... and was very impressed ;-)
Cheers, Rob
| [reply] [d/l] [select] |
|
|
|
|
use v5.030;
use Config;
use PDL;
use Math::MPFR qw(:mpfr);
use Math::Random;
use Math::Random::MT::Auto;
say "# randbits ", $Config{randbits};
say;
{
say "# CORE::rand()";
CORE::srand(42);
for (1..10) {
my $r = CORE::rand();
Rmpfr_dump( Math::MPFR->new($r) );
}
say;
}
{
say "# PDL->random()";
PDL::srandom(42); # PDL::srand(42) using v2.062 ~ v2.089
for (1..10) {
my $r = PDL->random();
Rmpfr_dump( Math::MPFR->new($r->at(0)) );
}
say;
}
{
say "# Math::Random::random_uniform()";
Math::Random::random_set_seed(42, 42);
for (1..10) {
my $r = Math::Random::random_uniform();
Rmpfr_dump( Math::MPFR->new($r) );
}
say;
}
{
say "# Math::Random::MT::Auto::rand()";
Math::Random::MT::Auto::set_seed(42);
for (1..10) {
my $r = Math::Random::MT::Auto::rand();
Rmpfr_dump( Math::MPFR->new($r) );
}
say;
}
Output
# randbits 48
# CORE::rand()
0.10111110100110010011000010111110010100010000000100000E0
0.10101111011101101001000101110110110001101111000000000E-1
0.11100011100000001010111000111001010100010001100000000E-3
0.11011000001111001100111111011000110001011110010000000E-1
0.10100110000111011001110100011100011010001010100000000E-3
0.11011011001111111011001010111111111011111111110000000E0
0.11111111011000101010001101001011001011001010111000000E-1
0.11110101001001110010010110001110010110100010110000000E-1
0.10110000110110010001010110010111111101100110100100000E0
0.11010101101001111110111111100010010000001100000000000E0
# PDL->random()
0.10101111101000001010000100101001111100011011001010100E-3
0.10011110111011100011111000010001111010000100001000111E-1
0.10000000001001001001010111101100010000010100010000110E-3
0.10011100111010000111111110011100100100101010011011100E-1
0.10000000100101010010000101001110111101111011110010101E-163
0.10011011001010110110000001100001001101101100010010111E-1
0.10110011011011001010101001110100111010100100100011001E-3
0.11101000011110111000100001001011101011010100100111010E-2
0.11011100111000000010100111101010011100110011111110001E0
0.10000101111100111111100010010000100100101010000110101E0
# Math::Random::random_uniform()
0.11111111111111110010000110000101111111110110110001010E0
0.11101110000010001101001101011000000100011101110001101E0
0.11000110111111000010100111101100001000110111001111100E-2
0.11111101010101011110101010101000001110110000110111100E0
0.10001101001110100011110110011001110010001010110011110E0
0.11001010101110011011001100000000100111110101000011101E0
0.11101010010000110110000001111111100100001011111000100E0
0.10001110011001110000110000110100100100000110111000100E0
0.10010010110111111100000011011111100010001001011000100E0
0.11011100010001101010110011010010010001101110111011010E-5
# Math::Random::MT::Auto::rand()
0.10000011111100001110111010101101001001100110101101010E0
0.10010000110000000011001110110001110111110111001111010E0
0.11101001010100101010111100101111111000000000110000100E0
0.11100101011001100001101111011111100011000101011101100E-1
0.11111011000000000011000100101100011111100101000000100E-1
0.10011111011111000111001111011010011100010111101000000E-2
0.10100100100000001110100110100110001110110010001100110E0
0.11101001110001010011111001011100111101111101000110010E0
0.11100100100101101001011110101000010101001011010111000E-1
0.10101100001101001001101101000011010011110011101000000E-2
| [reply] [d/l] [select] |
|
Rmpfr_dump( Math::MPFR->new($r.at(0)) );
In executing that, you are assigning 15-significant-decimal-digit strings (eg 0.816545445367819) to a 53-bit precision float (double) ... and then observing that the assignment populates all 53 bits:
D:\>perl -MMath::MPFR=":mpfr" -MPDL -le "for(1..10) {$r = PDL->random(
+); Rmpfr_dump(Math::MPFR->new($r.at(0)))}"
0.11101001100100000110110101110010100111100110010011000E-1
0.10110010100010010011101011001111010011110001011001001E-2
0.11101000000110110001010100000001011101100000111011111E-3
0.10110010010001101100111011100101000000001001100100011E0
0.11010000000101010100010000110000010010011110111101111E0
0.11101010010011100000100010101111011100001010000100111E-3
0.11001010100111101101000101101010111011100110010001111E-1
0.11111111111011111000010100111010010010001010101010110E-1
0.10111001110100001011011100010100001101011001101011010E-2
0.11011111011110110001110011111111011111010111110001100E-1
But you'll see the same thing if you use 15-significant-decimal-digit representations of values provided by rand():
D:\>perl -MMath::MPFR=":mpfr" -MPDL -le "for(1..10) {Rmpfr_dump(Math::
+MPFR->new(sprintf '%.15g', rand()))}"
0.11011001011111100110011011010010100010000000100011011E-3
0.10001011110110011101000101001110110001101111000000110E-1
0.11001011001111001101010100001100101010001000110000110E-2
0.11110111001110000011011110111000011000101111000111101E0
0.11110111100000100111010101111111100011010001010100010E0
0.10001110111001111100101011010111110111111111011111000E-1
0.10101100000000101101100111000011001011001010110111010E-1
0.11010110011101110001001101010011001011010001010111111E0
0.11111010001001001100100110011111101100110100011110010E-3
0.11010111110100100001010011100100000011000000000000100E-4
Yet, we already know (and you have just shown it) that rand() does not populate the 5 lowest bits.
This behaviour is inevitable when double-precision values are rounded to 15 decimal digit precision, and then assigned back in.
If 17 decimal digits of precision were provided, this capacity to mislead would go away:
D:\>perl -MMath::MPFR=":mpfr" -le "for(1..10) { Rmpfr_dump(Math::MPFR-
+>new(sprintf '%.17g', rand()))}"
0.10110100110110100010000010011000101000100000001000000E-1
0.10001101011001100001100001100010110001101111000000000E-1
0.11111010010100100101001100000101001010100010001100000E0
0.11000011010000000101111010100100110001011110010000000E-1
0.11111000011001100111110000100011000110100010101000000E-1
0.11101100001000111011110010101111011111111110000000000E-3
0.10001001000110010110010011000011100101100101011100000E0
0.10100000101110001010101010001101001011010001011000000E0
0.11100000100010000001100011100101111101100110100100000E0
0.10000000011000110110000011000010000001100000000000000E-3
Unfortunately, there's not much to be gleaned from looking at double-precision values rounded to 15-significant-decimal-digit strings.
We need to be looking at how those values were derived.
Update: Mind you, I could be barking up the wrong tree, anyway - which is even more likely if this issue you've identified is limited to threading.
Cheers, Rob | [reply] [d/l] [select] |
|
|
|
The following is a billion demonstration. Workers output directly to STDOUT and orderly via MCE::Relay.
Update: Reduce RAM usage by processing a sequence of numbers instead (~ 45MB per worker). For predictable output, this requires seeding the generator per each sequence. Previously, the code spawned 100 workers and 1e7 loop iterations (~ 24GB RAM).
use v5.030;
use PDL;
use MCE 1.895;
use Math::Prime::Util;
use Math::Random;
use Math::Random::MT::Auto;
CORE::srand(42); # This also, for MCE predictable results
# MCE sets internal seed = CORE::random()
PDL::srandom(42); # PDL::srand(N) v2.062 ~ v2.089
Math::Prime::Util::srand(42);
Math::Random::random_set_seed(42, 42);
Math::Random::MT::Auto::set_seed(42);
MCE->new(
max_workers => MCE::Util::get_ncpu(),
chunk_size => 1,
init_relay => 0,
posix_exit => 1,
sequence => [ 1, 1000 ],
user_func => sub {
# my ($mce, $seq, $chunk_id) = @_;
my $seq = $_;
my $output = "";
# Worker seeds generator using MCE's seed and wid value.
# For sequence of numbers, compute similarly using $seq value.
my $seed = abs(MCE->seed - ($seq * 100000)) % 1073741780;
CORE::srand($seed);
# PDL::srand($seed); # PDL v1.062 ~ 1.089
PDL::srandom($seed); # PDL v1.089_01
Math::Prime::Util::srand($seed);
Math::Random::random_set_seed($seed, $seed);
Math::Random::MT::Auto::set_seed($seed);
my $prngMT = Math::Random::MT::Auto->new();
$prngMT->srand($seed);
for ( 1 .. 1e6 ) {
# my $r = CORE::rand();
# my $r = PDL->random();
# my $r = Math::Prime::Util::drand();
# my $r = Math::Prime::Util::irand64();
# my $r = Math::Random::random_normal();
# my $r = Math::Random::random_uniform();
# my $r = Math::Random::MT::Auto::rand();
my $r = $prngMT->rand();
$output .= "$r\n";
}
MCE::relay { print $output; };
}
)->run;
Time to run:
$ time perl test.pl > out # 17GB ~ 19GB file size
CORE::rand . . . . . . . . . . . 12.275s
PDL->random . . . . . . . . . . . 2m04.810s
Math::Prime::Util::drand . . . . 13.358s
Math::Prime::Util::irand64 . . . 12.462s
Math::Random::random_normal . . . 16.341s
Math::Random::random_uniform . . 16.328s
Math::Random::MT::Auto::rand . . 15.040s
$prngMT->rand . . . . . . . . . . 13.339s
$ wc -l out
1000000000
GNU sort lacks parallel capability processing STDIN. See mcesort (GitHub Gist). Another option is GNU parallel parsort.
# Sorting consumes 17GB ~ 19GB temporary space.
# Specify a path with sufficient storage.
$ perl test.pl | LC_ALL=C sort -T/dev/shm -u | wc -l
$ perl test.pl | LC_ALL=C mcesort -T/dev/shm -j75% -u | wc -l
$ perl test.pl | LC_ALL=C parsort -T/dev/shm --parallel=12 -u | wc -l
CORE::rand . . . . . . . . . . . 1000000000
PDL->random . . . . . . . . . . . 999999510
Math::Prime::Util::drand . . . . 999999553
Math::Prime::Util::irand64 . . . 1000000000
Math::Random::random_normal . . . 816337225
Math::Random::random_uniform . . 799455502
Math::Random::MT::Auto::rand . . 999999527
$prngMT->rand . . . . . . . . . . 999999527
The mcesort variant runs on Linux including macOS, FreeBSD, and derivatives. | [reply] [d/l] [select] |
|
|
This indicates that random() generates fewer random bits than rand(). Right ??
I don't think so. Looking at the produced values was a surprise.
Took the threads loop and ran it with 16 pids and 1024 random() calls.
Used "%.64f" "%.72f" as format for the generated numbers.
In most cases, different runs produce identical results.
<Update>
For completeness, here is the script.
#!/usr/bin/perl
use v5.030;
use threads;
use threads::shared;
use PDL;
BEGIN { $PDL::no_clone_skip_warning = 1; }
my $lock : shared = 0;
PDL::srand(3); # PDL 2.087
for my $tid (1..16) {
threads->create(sub {
my $output = "";
for (1..1024) {
my $r = random();
$output .= sprintf "%.72f\n", $r;
}
lock $lock;
print $output;
});
}
$_->join for threads->list;
</Update>
When a run produces a different result, I see two scenarios:
- One or two numbers are missing and are replaced by an already produced value. See examples 2 and 3 below.
- The generated numbers are significantly different. See example 4.
$ ./pdl-rand.pl | sort >sort.16384.1
$ ./pdl-rand.pl | sort >sort.16384.2
$ ./pdl-rand.pl | sort >sort.16384.3
$ ./pdl-rand.pl | sort >sort.16384.4
$ sort -u sort.16384.1 |wc -l
16384
$ sort -u sort.16384.2 |wc -l
16383
$ sort -u sort.16384.3 |wc -l
16382
$ sort -u sort.16384.4 |wc -l
16383
$ diff -U 1 sort.16384.1 sort.16384.2
--- sort.16384.1 2024-06-08 09:45:09.466739327 +0200
+++ sort.16384.2 2024-06-08 09:50:24.203356598 +0200
@@ -12562,2 +12562,3 @@
0.7698795891537290048134423159353900700807571411132812500000000000000
+00000
+0.7698795891537290048134423159353900700807571411132812500000000000000
+00000
0.7699350058887169945265327442029956728219985961914062500000000000000
+00000
@@ -14417,3 +14418,2 @@
0.8818249357757870221519169717794284224510192871093750000000000000000
+00000
-0.8818467386417550013533173114410601556301116943359375000000000000000
+00000
0.8818802323789649566521120505058206617832183837890625000000000000000
+00000
$ diff -U 1 sort.16384.1 sort.16384.3
--- sort.16384.1 2024-06-08 09:45:09.466739327 +0200
+++ sort.16384.3 2024-06-08 09:52:32.837243809 +0200
@@ -13113,2 +13113,3 @@
0.8005079965106629558135864499490708112716674804687500000000000000000
+00000
+0.8005079965106629558135864499490708112716674804687500000000000000000
+00000
0.8007896602698270083209308722871355712413787841796875000000000000000
+00000
@@ -13686,3 +13687,2 @@
0.8361851909041919661547126452205702662467956542968750000000000000000
+00000
-0.8361968170640460273901339860458392649888992309570312500000000000000
+00000
0.8364519442006479454931877626222558319568634033203125000000000000000
+00000
@@ -14417,3 +14417,2 @@
0.8818249357757870221519169717794284224510192871093750000000000000000
+00000
-0.8818467386417550013533173114410601556301116943359375000000000000000
+00000
0.8818802323789649566521120505058206617832183837890625000000000000000
+00000
@@ -16095,2 +16094,3 @@
0.9834468387403769717991508514387533068656921386718750000000000000000
+00000
+0.9834468387403769717991508514387533068656921386718750000000000000000
+00000
0.9837232694387190168328061190550215542316436767578125000000000000000
+00000
$ diff -U 1 sort.16384.1 sort.16384.4|head -12
--- sort.16384.1 2024-06-08 09:45:09.466739327 +0200
+++ sort.16384.4 2024-06-08 09:58:27.630459409 +0200
@@ -1,154 +1,161 @@
-0.0000798345195128779969639606917120033813262125477194786071777343750
+00000
+0.0000273717076871849010000561919220274376129964366555213928222656250
+00000
+0.0000550196742872266023331902229376311197484028525650501251220703125
+00000
0.0001113510803039119965475445273028753945254720747470855712890625000
+00000
-0.0001282989599132870110840404231922207145544234663248062133789062500
+00000
0.0002144808569095600127036443938166598854877520352602005004882812500
+00000
-0.0003265905868672370191731213484587215134524740278720855712890625000
+00000
0.0003745585362475710206227319520877472314168699085712432861328125000
+00000
-0.0004606498956905009760365299342765865731053054332733154296875000000
+00000
$ diff -U 0 sort.16384.1 sort.16384.4|grep -v '@@'|wc -l
19698
I have no idea what's going on here but maybe someone else has.
Update: Fixed manually modified file names in diff output.
Greetings, 🐻
$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
| [reply] [d/l] [select] |
|
Predictability Summary:
non-threads
# Processes
- CORE
CORE::srand(N);
CORE::rand();
- PDL
CORE::srand(N); # This also, for MCE predictable results
PDL::srand(N); # PDL v1.062 ~ v1.089
PDL::srandom(N); # PDL v1.089_01
PDL->random; # MCE v1.895
- Math::Prime::Util
CORE::srand(N); # This also, for MCE predictable results
Math::Prime::Util::srand(N);
Math::Prime::Util::drand();
Math::Prime::Util::irand64();
- Math::Random
CORE::srand(N); # This also, for MCE predictable results
Math::Random::random_set_seed(N, N);
Math::Random::random_normal();
Math::Random::random_uniform();
- Math::Random::MT::Auto
CORE::srand(N); # This also, for MCE predictable results
Math::Random::MT::Auto::set_seed(N);
Math::Random::MT::Auto::rand();
threads
# Threads (matching non-threads output), requires MCE v1.894
- CORE
CORE::srand(N);
CORE::rand();
- Math::Prime::Util
CORE::srand(N); # This also, for MCE predictable results
Math::Prime::Util::srand(N);
Math::Prime::Util::drand();
Math::Prime::Util::irand64();
- Math::Random::MT::Auto
CORE::srand(N); # This also, for MCE predictable results
Math::Random::MT::Auto::set_seed(N);
Math::Random::MT::Auto::rand();
- Not possible
PDL or Math::Random
Running threads, I'm unable to generate predictable results using PDL or Math::Random (random_normal/random_uniform), regardless seeding the generator per each thread.
See the complete demonstration here. Run Perl with -Mthreads to spin threads. | [reply] [d/l] [select] |
|
|
|
Re^10: PDL and srand puzzle
by etj (Priest) on Jun 07, 2024 at 09:24 UTC
|
I'd have expected calling srandom inside new threads to behave better. Could you take a look at the srand code and see if anything is obviously wrong? Also, are you able to see if the duplicates are in groups i.e. sequences? | [reply] |
|
See my reply to Rob. In short, calling PDL::srandom before spawning workers has no effect. The workaround is to call CORE::srand instead.
See also, non-thread testing.
Edit: MCE checks for PDL::Primitive->can('srand'), but missed checking PDL::Primitive->can('srandom'). Resolved in MCE v1.894 and MCE::Shared v1.889.
| [reply] [d/l] [select] |
|
Calling srand cannot be a workaround for this, because there is no interaction between Perl's RNG and PDL's. They are separate systems.
| [reply] |
|
|