PerlMonks  

Wanting some clarification / opinions on MCE vs Threads

by jmmitc06 (Beadle)
on Feb 05, 2015 at 07:27 UTC ( [id://1115612]=perlquestion )

jmmitc06 has asked for the wisdom of the Perl Monks concerning the following question:

I frequently use multi-processing in Perl for the scripts I write at work. I have always used the threads and Thread::Queue modules for these scripts. I have been finding references to the Many-Core Engine (MCE) online and in the monastery, but I'm still not sure when it is best to use MCE, threads, or some other form of multiprocessing.

From what I have gathered, MCE uses the threads module if it is available and can also fork processes. Is MCE just a more automated way of implementing multi-processed jobs than explicitly writing code using threads, or does it do unique things that threads cannot? Since MCE can use threads for the multiprocessing voodoo it does so well, I assume there is no significant performance penalty for using MCE? It does seem that MCE code is more readable than the equivalent threads code which is already a big plus.

In short, I would love some input on MCE as a whole and how it compares to threads. Would it be good practice to start using MCE instead of threads?

Replies are listed 'Best First'.
Re: Wanting some clarification / opinions on MCE vs Threads
by BrowserUk (Patriarch) on Feb 05, 2015 at 07:53 UTC
    It does seem that MCE code is more readable than the equivalent threads code which is already a big plus.

    Do you have links to one or more pairs of comparative examples to back that up?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      I haven't seen any direct comparisons between the two but I think MCE::Grep and MCE::Map are two examples of MCE code that is more readable than the threads equivalent (I should say my threads equivalent)

      use MCE::Grep;
      my @a = mce_grep { $_ % 5 == 0 } 1..10000;
      versus
      use strict;
      use warnings;
      use threads;
      use Thread::Queue;

      my $q       = Thread::Queue->new();
      my $results = Thread::Queue->new();

      $q->enqueue($_) for 1 .. 10000;
      $q->end();    # no more work: dequeue returns undef once the queue drains

      # Worker: pass multiples of 5 to the results queue.
      # Testing defined() keeps a false item (such as 0) from ending the loop early.
      sub grep_worker {
          while ( defined( my $work = $q->dequeue() ) ) {
              $results->enqueue($work) if $work % 5 == 0;
          }
      }

      my @thread_pool = map { threads->create( \&grep_worker ) } 1 .. 2;
      $_->join() for @thread_pool;
      $results->end();

      my @results;
      while ( defined( my $result = $results->dequeue() ) ) {
          print $result, "\n";
          push @results, $result;
      }

      I'm sure that the threads version could be done much more easily than what I hacked together in 5 minutes. It could just be that the way I write threads code is poor. Regardless, I don't think there is a threads implementation as simple as the MCE version. Admittedly, this is a special case where MCE has a built-in function that provides this functionality, but there are similar constructs for most of the simple cases. For what I do there isn't much that can't be implemented using some mixture of MCE::Grep, MCE::Map and MCE::Loop, so I'm biased.

      I should also note that I haven't written very much (no "production" code as it were) using MCE so I may not have encountered some of its limitations compared to threads.

        (I should say my threads equivalent)

        Hm. No offence but, Ew! :)

        Remember that MCE is a wrapper (actually a suite of wrappers, or is that sweet wrapper :) over the top of threads (and other things), providing syntactic sugar for simple operations.

        You can easily do the same yourself. Say write TGrep.pm:

        Then write:

        #! perl -slw
        use strict;
        use TGrep;
        use Time::HiRes qw[ time ];

        our $N //= 1e3;

        my $start = time;
        my @a = tgrep{ $_ % 5 == 0 } [ 1..$N ];
        printf "Took %.9f seconds\n", time() -$start;
        print scalar @a;

        __END__
        C:\test>t-tgrep -N=1e5
        Took 3.830074787 seconds
        20000

        Of course, you'd probably be better sticking to grep for such simple things:

        $t=time; my @a = grep{ $_%5 == 0 } 1 .. 1e6; print time-$t;;
        0.18144702911377


        Sure, threads could be extended with some convenience functions

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use threads ;
        use Thread::Queue;

        Main( @ARGV );
        exit( 0 );

        sub threads_grep(&@) {
            my $cb = shift;
            my $max = int @_;
            my $args = ref $_[0] ? shift : { slaves => 4, maxdq => $max * 1000 };
            my $slaves = $args->{slaves} || 4;
            my $maxdq = $args->{maxdq} || 1e9;
            my $qin = Thread::Queue->new( @_, ( undef ) x $slaves );
            my $qout = Thread::Queue->new();
            my @kids = map {
                threads->create(
                    sub { ## threads_grep_cb
                        my( $cb, $qin, $qout ) = @_;
                        local $_;
                        while( $_ = $qin->dequeue ) {
                            if( $cb->() ) {
                                $qout->enqueue( $_ );
                            }
                        }
                        warn 'tids ahoy ', threads->tid;
                        return;
                    },
                    $cb, $qin, $qout
                );
            } 1 .. $slaves;
            $_->join for @kids;
            $qin->end;
            $qout->end;
            return $qout->dequeue( $maxdq );
        } ## end sub threads_grep(&@)

        sub Main {
            #~ my @res = threads_grep { $_ % 5 == 0 } 1..1000;
            my @res = threads_grep { $_ % 5 == 0 } { slaves => 2 }, 1..1000;
            print "@res\n";
        } ## end sub Main
        __END__
        tids ahoy 1 at - line 26.
        tids ahoy 2 at - line 26.
        5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 200 205 210 215 220 225 230 235 240 245 250 255 260 265 270 275 280 285 290 295 300 305 310 315 320 325 330 335 340 345 350 355 360 365 370 375 380 385 390 395 400 405 410 415 420 425 430 435 440 445 450 455 460 465 470 475 480 485 490 495 500 505 510 515 520 525 530 535 540 545 550 555 560 565 570 575 580 585 590 595 600 605 610 615 620 625 630 635 640 645 650 655 660 665 670 675 680 685 690 695 700 705 710 715 720 725 730 735 740 745 750 755 760 765 770 775 780 785 790 795 800 805 810 815 820 825 830 835 840 845 850 855 860 865 870 875 880 885 890 895 900 905 910 915 920 925 930 935 940 945 950 955 960 965 970 975 980 985 990 995 1000
Re: Wanting some clarification / opinions on MCE vs Threads
by marioroy (Prior) on Feb 12, 2015 at 07:12 UTC

    MCE began life as a chunking engine with support for serialized output or action; e.g. serializing log data to a single file without worrying about many workers writing simultaneously: MCE->print($LOG_FH, "$msg\n");
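To make the serialized-output point concrete, here is a minimal sketch, assuming MCE is installed from CPAN. The log file name `workers.log` and the message text are made up for illustration; only `MCE->print` with a file handle comes from the post above.

```perl
use strict;
use warnings;
use MCE;

# Hypothetical log file; opened in the manager before any worker is spawned.
open my $LOG_FH, '>', 'workers.log' or die "Cannot open log: $!";

my $mce = MCE->new(
    max_workers => 4,
    user_func   => sub {
        my $wid = MCE->wid;    # this worker's id
        # The text is sent to the manager process, which performs the write,
        # so lines from different workers do not interleave.
        MCE->print( $LOG_FH, "worker $wid reporting\n" );
    },
);

$mce->run;    # with no input data, each worker runs user_func once
close $LOG_FH;
```

Each of the four workers contributes one whole line to the log; the order of the lines depends on scheduling.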

    The native grep function will typically run faster for small code. Below, mce_grep has low overhead due to chunking input. Output order is also preserved (not shown).

    # $N = 1e6;
    TGrep......: Took 30.264018774 seconds (4 workers)
    mce_grep...: Took  0.299300909 seconds (4 workers)
    native grep: Took  0.106141806 seconds
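The chunking idea is not specific to MCE. As a rough sketch, assuming a Perl built with threads support, the same trick can be applied to a Thread::Queue pipeline: queue array references of many items instead of single items, so the per-item queue overhead is paid once per chunk. The helper name `chunked_grep` and its argument order are hypothetical; this is neither TGrep.pm nor how MCE is implemented internally.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: parallel grep that ships work in chunks and
# reassembles the results in input order.
sub chunked_grep {
    my ( $code, $workers, $chunk_size, @items ) = @_;

    my $q_in  = Thread::Queue->new();
    my $q_out = Thread::Queue->new();

    my @pool = map {
        threads->create( sub {
            while ( defined( my $job = $q_in->dequeue() ) ) {
                my ( $index, $chunk ) = @$job;
                # The native grep does the per-item work inside the worker.
                my @kept = grep { $code->() } @$chunk;
                $q_out->enqueue( [ $index, \@kept ] );
            }
        } );
    } 1 .. $workers;

    # One queue operation per chunk rather than per item.
    my $chunks = 0;
    while ( my @chunk = splice @items, 0, $chunk_size ) {
        $q_in->enqueue( [ $chunks++, \@chunk ] );
    }
    $q_in->end();

    $_->join() for @pool;
    $q_out->end();

    # Chunks can finish in any order; the index restores input order.
    my @by_index;
    while ( defined( my $done = $q_out->dequeue() ) ) {
        $by_index[ $done->[0] ] = $done->[1];
    }
    return map { @$_ } @by_index;
}

my @a = chunked_grep( sub { $_ % 5 == 0 }, 4, 1000, 1 .. 10_000 );
print scalar(@a), "\n";
```

The chunk size of 1000 is an arbitrary choice for the sketch; no timings are claimed for it.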

    One reason for using MCE is wanting the freezing and thawing of data done automatically between the manager process and workers, or vice versa. Another likely reason is running MCE with AnyEvent or Mojo and benefiting from chunking; e.g. each worker receives 300 hosts or URLs at a time and processes the chunk with the desired event loop.
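A small sketch of both points, assuming MCE is installed: MCE::Loop hands each worker a chunk of up to 300 items, and `MCE->gather` sends results back to the manager with serialization handled by MCE. The host names are invented, and the event-loop step is replaced by a trivial computation so the example stays self-contained.

```perl
use strict;
use warnings;
use MCE::Loop;

# Hypothetical host list; a real job would read these from a file or inventory.
my @hosts = map { "host$_.example.com" } 1 .. 1000;

MCE::Loop::init { max_workers => 4, chunk_size => 300 };

my %length_of = mce_loop {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;

    # A real worker would hand @$chunk_ref to AnyEvent or Mojo here.
    my %partial = map { $_ => length $_ } @$chunk_ref;

    # Key/value pairs travel back to the manager; no manual freeze/thaw.
    MCE->gather(%partial);
} @hosts;

MCE::Loop::finish;

print scalar( keys %length_of ), " hosts processed\n";
```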

    MCE::Queue is not required when threads are desired; one can still use Thread::Queue, unless priority queues (available in MCE::Queue) are wanted. Perl may also be built without threads support, which is common on some platforms. Both MCE::Queue and MCE::Mutex support threads and processes.
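For the priority-queue case, a minimal sketch, assuming MCE is installed. It runs entirely in the manager process with no workers, and the job names and priority values are made up.

```perl
use strict;
use warnings;
use MCE::Queue;

# porder HIGH: items with the highest priority value are dequeued first.
my $q = MCE::Queue->new( porder => $MCE::Queue::HIGH );

$q->enqueuep( 1, 'routine' );
$q->enqueuep( 9, 'urgent' );
$q->enqueuep( 5, 'normal' );

# dequeue_nb is the non-blocking form; it returns undef once the queue is empty.
my @order;
while ( defined( my $job = $q->dequeue_nb ) ) {
    push @order, $job;
}
print "@order\n";
```

Thread::Queue has no counterpart to `enqueuep`, which is the gap the post refers to.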

    The next update will include Tutorial.pod demonstrating parallelism for various CPAN modules.

Re: Wanting some clarification / opinions on MCE vs Threads
by marioroy (Prior) on Feb 12, 2015 at 08:28 UTC

    For the curious, here is a version of TGrep.pm by BrowserUk modified to use MCE. Simply remove or comment out "use threads" if Perl lacks support for threads. Threads is not necessary for MCE::Queue.

    package TGrep;

    use strict;
    use threads;

    use MCE;
    use MCE::Queue;

    our $WORKERS = 4;

    sub tgrep(&@) {
        my $workers = $WORKERS;
        my $code = shift;
        my @results;

        my $Qin  = MCE::Queue->new( fast => 1 );
        my $Qout = MCE::Queue->new( queue => \@results );

        my $mce = MCE->new(
            max_workers => $workers,
            user_func => sub {
                $Qout->enqueue( $code->() ? $_ : () ) while local $_ = $Qin->dequeue;
            }
        )->spawn;

        $Qin->enqueue( map{ ref $_[0] ? @{ $_ } : $_ } @_ );
        $Qin->enqueue( (undef) x $workers );

        $mce->run;

        return wantarray ? @results : \@results;
    }

    sub import {
        no strict 'refs';
        my $pkg = caller;
        *{ $pkg . '::' . $_ } = *{ $_ } for qw[ tgrep ];
    }

    1;

    Thus, new results emerge.

    # $N = 1e6 (4 workers)
    TGrep....: Took 30.264018774 seconds
    TGrep MCE: Took 15.292614937 seconds (fast => 0)
    TGrep MCE: Took  7.824651003 seconds (fast => 1)

    I was curious about how MCE::Queue compares to Thread::Queue for the demonstration. The fast option is beneficial for an already populated queue.

      I was curious about how MCE::Queue compares to Thread::Queue for the demonstration.

      Thread::Queue, and indeed anything involving threads::shared, is very slow. This is a side effect of the latter's peculiar implementation.


