PerlMonks  

Re^8: PDL and srand puzzle - testing using MCE

by marioroy (Prior)
on Jun 06, 2024 at 06:06 UTC (id://11159813)


in reply to Re^7: PDL and srand puzzle - prior reply not using MCE
in thread PDL and srand puzzle

I'm taking CORE::rand() and PDL->random for a spin using child processes rather than threads. There are 8 workers, each writing 50,000 lines, so a count below 400,000 unique lines indicates duplicates in the output.

    use v5.030;
    use PDL;
    use MCE 1.894;

    MCE->new(
        max_workers => 8,
        user_func => sub {
            for (1..50000) {
                # my $r = CORE::rand();
                my $r = PDL->random;
                MCE->say("$r");
            }
        }
    )->run;

CORE::rand()

    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000

PDL->random

    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
    $ perl test4.pl | LC_ALL=C sort -u | wc -l
    400000
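
For small runs, the check that sort -u | wc -l performs can also be sketched in Perl. This is a minimal sketch and not part of the original test scripts: count_unique_lines is a hypothetical helper name, and the in-memory sample stands in for the piped script output.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original post):
# count the total and the distinct lines read from a filehandle.
sub count_unique_lines {
    my ($fh) = @_;
    my %seen;
    my $total = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $total++;
        $seen{$line}++;
    }
    return ($total, scalar keys %seen);
}

# Small in-memory sample standing in for the output of a test script.
my $sample = "0.25\n0.5\n0.25\n0.75\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my ($total, $unique) = count_unique_lines($fh);
close $fh;

# Any difference between the two counts means duplicate lines.
printf "lines=%d unique=%d duplicates=%d\n", $total, $unique, $total - $unique;
```

The hash keeps every distinct line in memory, so for the multi-million-line runs sort -u is likely the better tool.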

Next, I tried 12 million lines (24 workers, 500,000 lines each) in a tight loop, appending to a string so that workers no longer wait on serialized output as they did previously. Again, no duplicates.

    use v5.030;
    use PDL;
    use MCE 1.894;

    MCE->new(
        max_workers => 24,
        user_func => sub {
            my $output = "";
            for (1..500000) {
                # my $r = CORE::rand();
                my $r = PDL->random;
                $output .= "$r\n";
            }
            MCE->print($output);
        }
    )->run;

CORE::rand() and PDL->random

    $ perl test5.pl | LC_ALL=C sort -u | wc -l
    12000000
    $ perl test5.pl | LC_ALL=C sort -u | wc -l
    12000000
    $ perl test5.pl | LC_ALL=C sort -u | wc -l
    12000000

Sorting takes a while. The mcesort program, which has a mini-MCE integrated, sorts in parallel. Copy the script to /usr/local/bin (or a bin path of your choice) and make it executable with sudo chmod +x /usr/local/bin/mcesort.

    perl test5.pl | LC_ALL=C mcesort -j6 -u | wc -l

Replies are listed 'Best First'.
Re^9: PDL and srand puzzle - testing using threads
by marioroy (Prior) on Jun 06, 2024 at 07:34 UTC

    The following uses threads for comparison. Locking is required so that output is not garbled; with MCE this is handled automatically by MCE->say, MCE->print, and MCE->printf. Here, a count below 32 million (64 threads, 500,000 lines each) indicates duplicate lines in the output.

    Edit: etj identified a race condition, hence the lower unique counts for PDL->random.

        use v5.030;
        use threads;
        use threads::shared;
        use PDL;

        BEGIN { $PDL::no_clone_skip_warning = 1; }

        my $lock : shared = 0;

        for my $tid (1..64) {
            threads->create(sub {
                my $output = "";
                for (1..500000) {
                    # my $r = CORE::rand();
                    my $r = PDL->random;
                    $output .= "$r\n";
                }
                lock $lock;
                print $output;
            });
        }

        $_->join for threads->list;

    CORE::rand()

        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        32000000
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        32000000
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        32000000

    PDL->random

        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        25105304
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        25304231
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        25290392
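
    To put the PDL->random counts in perspective, the shortfall from 32 million can be worked out directly. This is a minimal sketch that uses only the counts reported above; duplicate_share is a hypothetical helper name, not something from the test scripts.

    ```perl
    use strict;
    use warnings;

    # 64 threads x 500,000 lines each = 32,000,000 expected lines.
    my $expected = 64 * 500_000;

    # Unique-line counts from the three PDL->random runs above.
    my @unique_counts = (25_105_304, 25_304_231, 25_290_392);

    # Hypothetical helper: fraction of expected lines lost to duplication.
    sub duplicate_share {
        my ($expected, $unique) = @_;
        return ($expected - $unique) / $expected;
    }

    for my $unique (@unique_counts) {
        printf "%d unique, %d duplicates (%.1f%%)\n",
            $unique,
            $expected - $unique,
            100 * duplicate_share($expected, $unique);
    }
    ```

    For these three runs, roughly a fifth of the lines (about 21%) are duplicates.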

    Improving sort

    As mentioned in my prior post, the parallel mcesort program with integrated mini-MCE resides in a GitHub Gist. Copy the script to /usr/local/bin (or a bin path of your choice) and make it executable with sudo chmod +x /usr/local/bin/mcesort.

        perl test6.pl | LC_ALL=C mcesort -j6 -u | wc -l
