PerlMonks  

Re^3: RFC: 100 PDL Exercises (ported from numpy)

by marioroy (Vicar)
on Sep 03, 2019 at 05:15 UTC ( #11105497=note )


in reply to Re^2: RFC: 100 PDL Exercises (ported from numpy)
in thread RFC: 100 PDL Exercises (ported from numpy)

Hi, vr

Tonight I came across your post and modified your demonstration to run with 4 threads.

# https://www.perlmonks.org/?node_id=1214227

use strict;
use warnings;
use feature 'say';

use PDL;
use PDL::Parallel::threads qw(retrieve_pdls);

use threads;
use MCE::Shared;
use Time::HiRes 'time';

srand( 123 );

my $time = time;

my $n = 30000;      # input sample size
my $m = 10000;      # number of bootstrap repeats
my $r = $n;         # re-sample size

my $x   = random( $n );    $x->share_as('x');
my $avg = zeroes( $m );    $avg->share_as('avg');
my $seq = MCE::Shared->sequence( 0, $m - 1 );

sub parallel_task {
    srand;
    my ( $x, $avg ) = retrieve_pdls('x', 'avg');

    while ( defined ( my $seq_n = $seq->next() ) ) {
        my $idx = random $r;
        $idx *= $n;
        $avg->set( $seq_n, $x->index( $idx )->avg );
    }
}

threads->create( \&parallel_task ) for 1 .. 4;

# ... do other stuff ...

$_->join() for threads->list();

say $avg->pctover( pdl 0.05, 0.95 );
say time - $time, ' seconds';

__END__

# Output

[0.49395242 0.49936752]
1.28744792938232 seconds
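
For readers without PDL at hand, the bootstrap that each worker performs can be sketched in plain Perl. This is only an illustration of the computation, not the code from the post: the sample sizes are reduced, the `percentile` helper is a hypothetical nearest-rank stand-in for `pctover` (which, as far as I know, interpolates between data points), and everything runs in a single process.

```perl
use strict;
use warnings;

srand(123);

my $n = 1000;    # input sample size (the post uses 30000)
my $m = 500;     # number of bootstrap repeats (the post uses 10000)

# Input sample: uniform random numbers in [0, 1), similar to random($n) in PDL.
my @x = map { rand() } 1 .. $n;

# Each repeat draws $n indices with replacement and records the re-sample mean.
my @avg;
for my $rep ( 1 .. $m ) {
    my $total = 0;
    $total += $x[ int( rand($n) ) ] for 1 .. $n;
    push @avg, $total / $n;
}

# Hypothetical helper: nearest-rank percentile of a list of numbers.
# Assumption: this only approximates what pctover reports.
sub percentile {
    my ( $p, @values ) = @_;
    my @sorted = sort { $a <=> $b } @values;
    return $sorted[ int( $p * $#sorted ) ];
}

printf "[%.6f %.6f]\n", percentile( 0.05, @avg ), percentile( 0.95, @avg );
```

The inner loop over `$n` indices is the part that PDL vectorizes with `$x->index( $idx )->avg`, and the outer loop over repeats is the part that the threads share out through the sequence.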

Afterwards, I re-validated PDL with MCE and released 1.847. The effort is mainly for folks running a Perl that lacks threads support. Here is the same demonstration using PDL with MCE::Hobo and MCE::Shared.

# https://www.perlmonks.org/?node_id=1214227

use strict;
use warnings;
use feature 'say';

use PDL;  # must load PDL before MCE::Shared

use MCE::Hobo;
use MCE::Shared 1.847;
use Time::HiRes 'time';

srand( 123 );

my $time = time;

my $n = 30000;      # input sample size
my $m = 10000;      # number of bootstrap repeats
my $r = $n;         # re-sample size

# On Windows, the non-shared piddle ($x) is unblessed in threads.
# Therefore, constructing the piddle inside the worker. UNIX
# platforms benefit from copy-on-write. Thus, one copy.

my $x   = ( $^O eq 'MSWin32' ) ? undef : random( $n );
my $avg = MCE::Shared->pdl_zeroes( $m );
my $seq = MCE::Shared->sequence( 0, $m - 1 );

sub parallel_task {
    $x = random( $n ) unless ( defined $x );

    while ( defined ( my $seq_n = $seq->next() ) ) {
        my $idx = random $r;
        $idx *= $n;
        # $avg is a shared piddle which resides inside the shared-
        # manager process or thread. The piddle is accessible via the
        # OO interface only.
        $avg->set( $seq_n, $x->index( $idx )->avg );
    }
}

MCE::Hobo->create( \&parallel_task ) for 1 .. 4;

# ... do other stuff ...

MCE::Hobo->wait_all();

# MCE sets the seed of the base generator uniquely between workers.
# Unfortunately, it requires running with one worker for predictable
# results (i.e. no guarantee in the order which worker computes the
# next input chunk).

say $avg->pctover( pdl 0.05, 0.95 );
say time - $time, ' seconds';

__END__

# Output

[0.49387191 0.49937053]
1.29038286209106 seconds

Regards, Mario

Re^4: RFC: 100 PDL Exercises (ported from numpy)
by marioroy (Vicar) on Sep 03, 2019 at 05:25 UTC

    Here is the same thing using MCE. Workers obtain the next sequence number without involving the manager process, which is why it runs faster. I had to think about it when I saw the run time.

    # https://www.perlmonks.org/?node_id=1214227

    use strict;
    use warnings;
    use feature 'say';

    use PDL;  # must load PDL before MCE::Shared

    use MCE 1.847;
    use MCE::Shared 1.847;
    use Time::HiRes 'time';

    srand( 123 );

    my $time = time;

    my $n = 30000;      # input sample size
    my $m = 10000;      # number of bootstrap repeats
    my $r = $n;         # re-sample size

    # On Windows, the non-shared piddle ($x) is unblessed in threads.
    # Therefore, constructing the piddle inside the worker. UNIX
    # platforms benefit from copy-on-write. Thus, one copy.

    my $x   = ( $^O eq 'MSWin32' ) ? undef : random( $n );
    my $avg = MCE::Shared->pdl_zeroes( $m );

    MCE->new(
        max_workers => 4,
        sequence    => [ 0, $m - 1 ],
        chunk_size  => 1,
        user_begin  => sub {
            $x = random( $n ) unless ( defined $x );
        },
        user_func   => sub {
            my $idx = random $r;
            $idx *= $n;
            # $avg is a shared piddle which resides inside the shared-
            # manager process or thread. The piddle is accessible via the
            # OO interface only.
            $avg->set( $_, $x->index( $idx )->avg );
        }
    )->run;

    # MCE sets the seed of the base generator uniquely between workers.
    # Unfortunately, it requires running with one worker for predictable
    # results (i.e. no guarantee in the order which worker computes the
    # next input chunk).

    say $avg->pctover( pdl 0.05, 0.95 );
    say time - $time, ' seconds';

    __END__

    # Output

    [0.49387106 0.4993768]
    1.09556317329407 seconds

    Thank you, vr. I had no idea that PDL's random does not produce unique sequences across threads. MCE already set the seed of the base generator for its workers, but did not do so for workers spawned as threads. This is resolved in MCE 1.847.
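
The general hazard can be reproduced with core Perl alone. The sketch below is an analogy, not the PDL case itself: it uses Perl's built-in `rand` with forked child processes rather than PDL's `random` with threads, and `child_numbers` is a hypothetical helper written for this illustration. Children that inherit an already seeded generator produce the same numbers, while children that call `srand` themselves diverge.

```perl
use strict;
use warnings;

# Hypothetical helper: fork a child that reports three random numbers
# through a pipe, optionally re-seeding the generator first.
sub child_numbers {
    my ($reseed) = @_;

    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: inherits the parent's generator state unless it re-seeds.
        close $reader;
        srand() if $reseed;
        print {$writer} join( ' ', map { sprintf '%.6f', rand() } 1 .. 3 ), "\n";
        close $writer;
        exit 0;
    }

    # Parent: collect the child's line and reap the child.
    close $writer;
    my $line = <$reader>;
    close $reader;
    waitpid( $pid, 0 );
    chomp $line;
    return $line;
}

srand(123);    # seed once in the parent, as the scripts in this thread do

my @inherited = map { child_numbers(0) } 1 .. 2;    # no re-seed in the child
my @reseeded  = map { child_numbers(1) } 1 .. 2;    # child calls srand()

print "inherited: $_\n" for @inherited;
print "re-seeded: $_\n" for @reseeded;
```

The two "inherited" lines should be identical, which is the same symptom as every worker drawing the same bootstrap indices. The bare `srand;` at the top of `parallel_task` in the threads version plays the role of the re-seed here.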

    Regards, Mario
