
Re^7: PDL and srand puzzle - prior reply not using MCE

by marioroy (Prior)
on Jun 06, 2024 at 05:24 UTC


in reply to Re^6: PDL and srand puzzle
in thread PDL and srand puzzle

It's possible this is because srandom's dealing with the number of CPUs available is interacting with the way MCE (or at least this use of it) does multi-threading.

The thread example given in my prior reply is not using MCE.


Replies are listed 'Best First'.
Re^8: PDL and srand puzzle - testing using MCE
by marioroy (Prior) on Jun 06, 2024 at 06:06 UTC

    I'm taking CORE::rand() and PDL::random() for a spin without threads, using child processes instead. There are 8 workers, each writing 50,000 lines, so a count below 400,000 indicates duplicates in the output.

    use v5.030;
    use PDL;
    use MCE 1.894;

    MCE->new(
        max_workers => 8,
        user_func => sub {
            for (1..50000) {
                # my $r = CORE::rand();
                my $r = PDL->random;
                MCE->say("$r");
            }
        }
    )->run;

    CORE::rand()

    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000

    PDL->random

    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
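For small runs, the same uniqueness check can be done in Perl itself rather than with sort -u and wc -l. The following is a minimal sketch, not part of the original test scripts: count_unique is a hypothetical helper name, and the in-memory sample stands in for the real output.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from a filehandle and return
# (total line count, number of distinct lines).
sub count_unique {
    my ($fh) = @_;
    my %seen;
    my $total = 0;
    while (my $line = <$fh>) {
        $total++;
        $seen{$line}++;
    }
    return ($total, scalar keys %seen);
}

# Demonstration on an in-memory sample that contains one duplicate line.
my $sample = "0.1\n0.2\n0.1\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my ($total, $unique) = count_unique($fh);
print "$total lines, $unique unique\n";
```

Passing \*STDIN instead of the in-memory handle would let it sit at the end of a pipe. The hash keeps every distinct line in memory, so for the multi-million line runs sort -u or mcesort is likely the better choice.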

    Next, I tried 12 million lines (24 workers, 500,000 lines each) in a tight loop, appending to a string and printing once per worker, so workers no longer wait on serialized output as they did previously. Again, no duplicates.

    use v5.030;
    use PDL;
    use MCE 1.894;

    MCE->new(
        max_workers => 24,
        user_func => sub {
            my $output = "";
            for (1..500000) {
                # my $r = CORE::rand();
                my $r = PDL->random;
                $output .= "$r\n";
            }
            MCE->print($output);
        }
    )->run;

    CORE::rand() and PDL->random

    $ perl test5.pl | LC_ALL=C sort -u | wc -l
    12000000
    $ perl test5.pl | LC_ALL=C sort -u | wc -l
    12000000
    $ perl test5.pl | LC_ALL=C sort -u | wc -l
    12000000

    Sorting takes a while. There is the mcesort program with integrated mini-MCE, which can help. Copy the script to /usr/local/bin (or a bin path of your choice) and make it executable with sudo chmod +x /usr/local/bin/mcesort.

    perl test5.pl | LC_ALL=C mcesort -j6 -u | wc -l

      The following uses threads for comparison. Locking is required so that output is not garbled; MCE handles this automatically in MCE->say, MCE->print, and MCE->printf, whereas here it is done explicitly with a shared lock. A count below 32 million indicates duplicate lines in the output.

      Edit: etj identified a race condition, hence less uniqueness.

      use v5.030;
      use threads;
      use threads::shared;
      use PDL;

      BEGIN { $PDL::no_clone_skip_warning = 1; }

      my $lock : shared = 0;

      for my $tid (1..64) {
          threads->create(sub {
              my $output = "";
              for (1..500000) {
                  # my $r = CORE::rand();
                  my $r = PDL->random;
                  $output .= "$r\n";
              }
              lock $lock;
              print $output;
          });
      }

      $_->join for threads->list;

      CORE::rand()

      $ perl test6.pl | LC_ALL=C sort -u | wc -l
      32000000
      $ perl test6.pl | LC_ALL=C sort -u | wc -l
      32000000
      $ perl test6.pl | LC_ALL=C sort -u | wc -l
      32000000

      PDL->random

      $ perl test6.pl | LC_ALL=C sort -u | wc -l
      25105304
      $ perl test6.pl | LC_ALL=C sort -u | wc -l
      25304231
      $ perl test6.pl | LC_ALL=C sort -u | wc -l
      25290392

      Improving sort

      As mentioned in my prior post, the parallel mcesort program with integrated mini-MCE resides in a GitHub Gist. Copy the script to /usr/local/bin (or a bin path of your choice) and make it executable with sudo chmod +x /usr/local/bin/mcesort.

      perl test6.pl | LC_ALL=C mcesort -j6 -u | wc -l
Re^8: PDL and srand puzzle
by etj (Priest) on Jun 06, 2024 at 14:21 UTC
    So I think what's happening is I recommended that srandom be called before creating any new threads at all, and you've ignored that and continue to have the same symptoms as before?

      I understood your recommendation to not call srandom inside threads. Basically, I'm reporting that PDL::random() results in less uniqueness than CORE::rand(), regardless of whether srandom is called before spawning threads.

      Edit: etj identified a race condition, hence less uniqueness.

      use v5.030;
      use threads;
      use threads::shared;
      use PDL;

      BEGIN { $PDL::no_clone_skip_warning = 1; }

      my $lock : shared = 0;

      srandom(3); # PDL 2.089_01

      for my $tid (1..16) {
          threads->create(sub {
              my $output = "";
              for (1..500000) {
                  # my $r = CORE::rand();
                  my $r = PDL->random;
                  $output .= "$r\n";
              }
              lock $lock;
              print $output;
          });
      }

      $_->join for threads->list;

      CORE::rand()

      $ perl test7.pl | wc -l
      8000000
      $ perl test7.pl | LC_ALL=C sort -u | wc -l
      8000000
      $ perl test7.pl | LC_ALL=C sort -u | wc -l
      8000000
      $ perl test7.pl | LC_ALL=C sort -u | wc -l
      8000000

      PDL->random

      $ perl test7.pl | wc -l
      8000000
      $ perl test7.pl | LC_ALL=C sort -u | wc -l
      7507936
      $ perl test7.pl | LC_ALL=C sort -u | wc -l
      7446785
      $ perl test7.pl | LC_ALL=C sort -u | wc -l
      7446785
        I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand()

        This indicates that random() generates fewer random bits than rand(). Right ??

        With perl, $Config{randbits} should report the number of random bits being generated by rand() - which is 48 on my SP-5.38.2.
        Is the comparable value for PDL's random() function readily available ?

        Cheers,
        Rob
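As a rough way to reason about the random-bits question, the sketch below prints $Config{randbits} and applies the birthday approximation n(n-1)/(2 * 2^bits) for the expected number of colliding pairs among n draws. It is only an estimate under the assumption that draws are uniform and independent and that n is much smaller than 2^bits; expected_collisions is a hypothetical helper name, and the sketch says nothing about what PDL's random() actually provides.

```perl
use strict;
use warnings;
use Config;

# Birthday approximation: expected number of colliding pairs among $n draws
# from 2**$bits equally likely values. Only meaningful when $n << 2**$bits.
sub expected_collisions {
    my ($n, $bits) = @_;
    return $n * ($n - 1) / (2 * 2**$bits);
}

# Number of random bits CORE::rand() produces on this perl build.
my $bits = $Config{randbits};
print "randbits: $bits\n";

# Draw counts used by the threaded tests in this thread (16 and 64 threads).
for my $n (8_000_000, 32_000_000) {
    printf "%d draws at %d bits: about %.3g colliding pairs expected\n",
        $n, $bits, expected_collisions($n, $bits);
}
```

If the observed loss is far larger than such an estimate for any plausible bit width, a cause other than limited random bits becomes more likely, which fits the race condition noted in the edits above.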
        I'd have expected calling srandom inside new threads to behave better. Could you take a look at the srand code and see if anything is obviously wrong? Also, are you able to see if the duplicates are in groups i.e. sequences?
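One possible way to check whether duplicates come in sequences is sketched below. It assumes the lines are examined in their original output order (not sorted), which is reasonable here because each thread prints its whole buffer under the lock. duplicate_runs is a hypothetical helper name: it reports the length of each run in which consecutive lines repeat consecutive earlier lines.

```perl
use strict;
use warnings;

# Hypothetical helper: given lines in output order, return the lengths of
# runs where consecutive lines repeat consecutive previously seen lines.
sub duplicate_runs {
    my @lines = @_;
    my (%first_pos, @runs);
    my ($run, $prev_src) = (0, -2);
    for my $i (0 .. $#lines) {
        my $src = $first_pos{ $lines[$i] };
        if (defined $src) {
            if ($run && $src == $prev_src + 1) {
                $run++;                     # continues the current sequence
            } else {
                push @runs, $run if $run;   # close the previous run, if any
                $run = 1;
            }
            $prev_src = $src;
        } else {
            $first_pos{ $lines[$i] } = $i;  # first time this value appears
            push @runs, $run if $run;
            $run = 0;
        }
    }
    push @runs, $run if $run;
    return @runs;
}

# Demonstration: "a b c" is repeated as a sequence, then "b" repeats alone.
my @runs = duplicate_runs(qw(a b c x a b c y b));
print "runs: @runs\n";
```

Many long runs would suggest that threads are replaying the same stretch of the generator's stream, while runs of length 1 would point to isolated collisions. Like any hash-based check, this holds every distinct line in memory, so it suits a reduced thread or loop count better than the full 32 million lines.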
