http://www.perlmonks.org?node_id=1216555


in reply to Re^4: Getting started with MCE (the Many-Core Engine)
in thread Getting started with MCE (the Many-Core Engine)

Greetings,

Some helpful tips for processing a large array.

Spawn workers early, before creating or obtaining a large array to be used as input data. Dividing the work equally by the number of workers is not recommended for large data sets; a chunk_size value of 4000 or 8000 is fine for large arrays. It doesn't take a large chunk_size to keep IPC from becoming the bottleneck. Finally, workers persist after processing (i.e. after $mce->process returns), so shut them down when finished. If you omit this, it is done for you when the script terminates.

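#!/usr/bin/perl
use strict;
use warnings;

use MCE;
use MCE::Candy;

my $volume      = 26 * 26;    # number of items in 'aa' .. 'zz'
my $max_workers = 4;
my $chunk_size  = int $volume / $max_workers / 16;

my @results;

# Spawn workers early, before building the (potentially large) input array.
my $mce = MCE->new(
    max_workers => $max_workers,
    chunk_size  => $chunk_size,
    # Collect gathered output into @results, ordered by chunk_id.
    gather      => MCE::Candy::out_iter_array(\@results),
    user_func   => sub {
        my ($mce, $chunk_ref, $chunk_id) = @_;
        my @output;
        foreach my $item (@{ $chunk_ref }) {
            # Post-increment returns the original value, so @output
            # receives each input item unchanged.
            push @output, $item++;
        }
        $mce->gather($chunk_id, @output);
    }
)->spawn;

$mce->process([ 'aa' .. 'zz' ]);
$mce->shutdown;    # workers persist after process(); stop them explicitly

print "$_, " for @results;
print scalar @results, "\n";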
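To illustrate why a fixed chunk_size is preferred over an equal split, here is a pure-Perl sketch that does not use MCE at all. It only models how an input array is cut into chunks, using a hypothetical make_chunks helper (not part of MCE). With 4 workers, an equal split yields 4 very large chunks, while chunk_size 8000 yields many small ones. As general background rather than something stated above: in a chunked model like this, a worker that finishes early simply takes the next chunk, which keeps all workers busy and bounds how much data each worker holds at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an array ref into consecutive chunks of at most
# $chunk_size elements. A simplified model of chunked input, not MCE's code.
sub make_chunks {
    my ($data_ref, $chunk_size) = @_;
    my @chunks;
    for (my $i = 0; $i < @{$data_ref}; $i += $chunk_size) {
        my $end = $i + $chunk_size - 1;
        $end = $#{$data_ref} if $end > $#{$data_ref};
        push @chunks, [ @{$data_ref}[$i .. $end] ];
    }
    return \@chunks;
}

my @input = ('aaaa' .. 'zzzz');    # 26**4 = 456976 items

# Equal split: one chunk per worker (4 workers), rounding the size up.
my $workers = 4;
my $equal   = make_chunks(\@input, int((@input + $workers - 1) / $workers));

# Fixed chunk_size, as suggested above.
my $fixed = make_chunks(\@input, 8000);

printf "equal split: %d chunks of up to %d items\n",
    scalar @$equal, scalar @{ $equal->[0] };
printf "chunk_size 8000: %d chunks, last one has %d items\n",
    scalar @$fixed, scalar @{ $fixed->[-1] };
```

The equal split hands each worker over 100k items in one go, whereas chunk_size 8000 produces 58 small units of work for the same input.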

Regards, Mario

Re^6: Getting started with MCE (the Many-Core Engine)
by Anonymous Monk on Jun 13, 2018 at 15:30 UTC
    Some helpful tips for processing a large array.

    Greetings Mario. What a marvelous coincidence. I was just writing a question on both topics you address. First, thank you for the free software! MCE is amazing.

    I've been tweaking it and noticed that smaller chunks are faster; for some reason 11k is the fastest here. I also noticed a delay at exit because I wasn't calling shutdown, but didn't know what was wrong. Thanks for the clue. Another issue I've come across is the value of MCE::Util::get_ncpu. On my machine (an i7) it returns 8, and so 8 workers are spawned, but for some reason it's faster when I set max_workers to 4:

    > time mce
    volume: 11881376 chunk_size: 16000 max_workers: 8

    real    0m19.776s
    user    0m43.350s
    sys     0m10.321s

    > time mce
    volume: 11881376 chunk_size: 16000 max_workers: 4

    real    0m17.615s
    user    0m27.495s
    sys     0m4.836s

      Greetings,

      Depending on the processor, not all cores are "real" cores: with simultaneous multithreading (e.g. Intel Hyper-Threading), each physical core appears as two logical CPUs. Regarding chunk_size, I typically do not go over 8k for a large array and let chunking do its thing. Spawning workers early, before allocating a large amount of memory for input data, is beneficial.

      max_workers => MCE::Util::get_ncpu() / 2,
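      A minimal sketch of deriving max_workers this way, assuming two logical CPUs per physical core. It calls MCE::Util::get_ncpu when MCE is installed and falls back to 1 otherwise; workers_for is a hypothetical helper (not part of MCE) that halves the count and never returns less than 1, so a single-CPU machine still gets one worker. Halving is only a heuristic; timing both settings, as in the numbers above, is the reliable check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Logical CPU count from MCE::Util when MCE is installed; otherwise assume 1.
my $ncpu = eval { require MCE::Util; MCE::Util::get_ncpu() } || 1;

# Hypothetical helper: assume two logical CPUs per physical core
# (e.g. Hyper-Threading), halve the count, and never return less than 1.
sub workers_for {
    my ($logical) = @_;
    my $n = int($logical / 2);
    return $n > 0 ? $n : 1;
}

my $max_workers = workers_for($ncpu);
print "logical CPUs: $ncpu, max_workers: $max_workers\n";

# The result would then be passed as: MCE->new(max_workers => $max_workers, ...)
```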

      Regards, Mario