PerlMonks  

Re: Using MCE to write to multiple files.

by Anonymous Monk
on Nov 22, 2014 at 03:04 UTC ( [id://1108076]=note )


in reply to Using MCE to write to multiple files.

...number of output files are not consistent with the size of my input ie....what's wrong?

What is supposed to happen?

Re^2: Using MCE to write to multiple files.
by etheleon (Novice) on Nov 22, 2014 at 03:57 UTC
    since the array @data_input held 10 elements, i should get 10 files output but i don't.

      since the array @data_input held 10 elements, i should get 10 files output but i don't.

      Um, there is @input_data, and it has 100 elements, :)

      But ok, are you using fork or threads? Try getting MCE to choose fork, then repeat your test with threads ... if there is a difference between the two, then there is a bug in one of the backends, or it hasn't finished running

      If both fork/threads backends behave the same, then there might be a problem with your logic

      I've not scrutinized your code, but  $tmp{$chunk_id} stuff looks like it will never make any difference :)

      But if you want something to compare, try

      $ perl threads-jobadder-jobworker-fibonacci.pl
      $ dir /b fibojob* |wc --lines
      100

      #!/usr/bin/perl --
      ## threads-jobadder-jobworker-fibonacci.pl
      ##
      ##
      ## perltidy -olq -csc -csci=3 -cscl="sub : BEGIN END " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
      ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
      ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if while " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
      #!/usr/bin/perl --
      use strict;
      use warnings;
      use threads stack_size => 4096;
      use Thread::Queue;

      Main( @ARGV );
      exit( 0 );

      sub Main {
          use autodie qw/ chdir /;
          chdir "/path/to/my/files/";
          my $maxJobs = 8;
          my $q       = Thread::Queue->new;
          for( 1 .. $maxJobs ) {
              ## threads->create( \&JobWorker, $queue, \&TheJob );
              threads->create( \&JobWorker, $q, \&GetFibby );
          }
          threads->create( \&JobAdder, $q, $maxJobs, [ 0 .. 99 ] );
          $_->join for threads->list;    ## wait for threads to finish
      } ## end sub Main

      sub JobWorker {
          my( $q, $callBack ) = @_;
          while( defined( my $argRef = $q->dequeue ) ) {
              ## GetFibby( @$argRef );
              $callBack->( @$argRef );
          }
          return;
      } ## end sub JobWorker

      sub GetFibby {
          my( $chunkId ) = @_;
          use autodie qw/ open close /;
          open my( $outfh ), '>', "fibojob-$chunkId.txt";
          print $outfh "\t", fibonacci( $_ ) for 1 .. 10;
          print $outfh "\n";
          close $outfh;
      } ## end sub GetFibby

      sub JobAdder {
          my( $q, $maxJobs, $inputs ) = @_;
          while( @$inputs ) {
              AddJob( $q, shift @$inputs );
          }
          SignalNoMoreJobs( $q, $maxJobs );
      } ## end sub JobAdder

      sub SignalNoMoreJobs {
          my( $q, $maxJobs ) = @_;
          $q->enqueue( undef ) for 1 .. $maxJobs;
          return;
      } ## end sub SignalNoMoreJobs

      sub AddJob {
          my $q = shift;
          $q->enqueue( [@_] );
          return;
      } ## end sub AddJob

      sub fibonacci {
          my $n = shift;
          return undef if $n < 0;
          my $f;
          if( $n == 0 ) {
              $f = 0;
          } elsif( $n == 1 ) {
              $f = 1;
          } else {
              $f = fibonacci( $n - 1 ) + fibonacci( $n - 2 );
          }
          return $f;
      } ## end sub fibonacci

        Damn! That's a complicated way of doing something very simple. It took me forever to work out where $chunkIds originated.

        They start life as an integer list wrapped in an anonymous array:

        threads->create( \&JobAdder, $q, $maxJobs, [ 0 .. 99 ] );

        And get passed individually to AddJob() here:

        AddJob( $q, shift @$inputs );

        Where they get pushed individually, but wrapped in individual anonymous arrays, onto the queue here:

        $q->enqueue( [@_] );

        Then those anonymous array refs get dequeued here:

        while( defined( my $argRef = $q->dequeue ) ) {

        And the individual integers are then unwrapped from them here:

        $callBack->( @$argRef );

        Before being passed to where they actually get used to number some files here:

        sub GetFibby {
            my( $chunkId ) = @_;
            use autodie qw/ open close /;
            open my( $outfh ), '>', "fibojob-$chunkId.txt";

        All of that to achieve the equivalent of:

        $q->enqueue( 0 .. 99 );
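To see that the wrapping adds nothing, the round trip can be tried without threads at all. This is only a minimal sketch: a plain array (@queue, a stand-in I made up) plays the role of the queue, and the two routes are compared as strings.

```perl
use strict;
use warnings;

my @queue;    # hypothetical stand-in for the Thread::Queue object

# Wrapped route: each integer goes in as its own one-element array ref ...
push @queue, [$_] for 0 .. 99;

# ... and is unwrapped again on the way out.
my @via_wrapping = map { @$_ } @queue;

# Direct route: the integers go in as they are.
my @direct = ( 0 .. 99 );

print "same list either way\n" if "@via_wrapping" eq "@direct";
```

Both routes hand the worker the same 100 integers, which is why queueing the bare list is enough.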

        Here's a simpler version that avoids the obfuscation:

        #! perl -slw
        use strict;
        use threads;
        use threads::Q;

        sub fibonacci {
            my $n = shift;
            return undef if $n < 0;
            my $f;
            if( $n == 0 ) {
                $f = 0;
            } elsif( $n == 1 ) {
                $f = 1;
            } else {
                $f = fibonacci( $n - 1 ) + fibonacci( $n - 2 );
            }
            return $f;
        }

        sub worker {
            my $tid = threads->tid;
            my( $Q, $path ) = @_;
            chdir $path or die "$path : $!";
            while( defined( my $chunkId = $Q->dq ) ) {
                open my $out, '>', "fibojob-$chunkId.txt" or die "$chunkId : $!";
                print $out join "\t", map fibonacci( $_ ), 1 .. 10;
                close $out;
            }
        }

        our $T //= 8;
        our $P //= '.';
        our $M //= 99;

        ## The pattern is simple
        my $Q = threads::Q->new( $T*2 );                                 ## Create a queue.
        my @threads = map threads->create( \&worker, $Q, $P ), 1 .. $T;  ## Create some workers to read from that queue.
        $Q->nq( 0 .. $M );                                               ## Queue some work.
        $Q->nq( ( undef ) x $T );                                        ## Tell the workers they are done.
        $_->join for @threads;                                           ## And wait for them to finish.
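For anyone without threads::Q to hand, the same create/queue/terminate/join pattern can be sketched with the core Thread::Queue module. Assumptions on my part: a perl built with thread support, a worker count of 4, ten jobs, and a scratch directory from File::Temp instead of the real output path. The file contents are placeholders, not the fibonacci output.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use File::Temp qw(tempdir);

my $workers = 4;                        # assumed worker count
my $dir     = tempdir( CLEANUP => 1 );  # scratch directory for the demo
my $q       = Thread::Queue->new;

sub worker {
    my( $q, $dir ) = @_;
    # dequeue blocks until an item arrives; undef is the "no more work" marker
    while( defined( my $id = $q->dequeue ) ) {
        open my $out, '>', "$dir/job-$id.txt" or die "job-$id.txt : $!";
        print {$out} "job $id\n";
        close $out or die "close : $!";
    }
    return;
}

my @threads = map { threads->create( \&worker, $q, $dir ) } 1 .. $workers;
$q->enqueue( 0 .. 9 );                  # plain integers, no wrapping
$q->enqueue( ( undef ) x $workers );    # one terminator per worker
$_->join for @threads;

my @files = glob( "$dir/job-*.txt" );
print scalar( @files ), " files written\n";
```

One undef per worker matters: each worker consumes exactly one terminator and exits, so fewer terminators than workers would leave some threads blocked in dequeue and the join would never return.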

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Hi etheleon,

        It seems that output order is not necessary for the use-case. The issue mentioned may be coming from say $output: for me it does not append \n to the file and instead prints a GLOB message to STDOUT.

        Btw, I ran your code and it ran fine. I only changed say $output to print $output "\n"; All files had 10 results -- all identical -- all same size.
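The GLOB message can be reproduced in isolation. A minimal sketch, assuming a temporary file from File::Temp in place of the real output file; the in-memory handle is only there to capture what would otherwise land on STDOUT. print is used here, and say treats a lone scalar the same way.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my( $output, $path ) = tempfile( UNLINK => 1 );

# Capture whatever goes to the currently selected handle (normally STDOUT).
my $captured = '';
open my $mem, '>', \$captured or die "in-memory handle: $!";
my $previous = select $mem;

# A lone scalar after print (or say) is taken as the LIST to print,
# not as the filehandle, so the handle itself gets stringified.
print $output;

select $previous;
close $mem or die "close: $!";

# With a LIST after the handle (and no comma between them), the data reaches the file.
print {$output} "\n";
close $output or die "close: $!";

my $size = -s $path;
print "selected handle received: $captured\n";
print "file size in bytes: $size\n";
```

The captured text looks like GLOB(0x...) and the file only receives the explicit newline.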

        Back to MCE, the following processes @input_data in parallel. Am using the MCE::Loop Model versus using the Core API.

        use MCE::Loop chunk_size => 1;

        my @input_data = (0 .. 100 - 1);

        mce_loop {
            my ($mce, $chunk_ref, $chunk_id) = @_;
            open my $output, '>', "/path/to/my/files/$chunk_id.txt";
            foreach (1..10) { print $output "\t",fibonacci($_)};
            print $output "\n";
            close $output;
        } @input_data;

        sub fibonacci {
            my $n = shift;
            return undef if $n < 0;
            my $f;
            if ($n == 0) {
                $f = 0;
            } elsif ($n == 1) {
                $f = 1;
            } else {
                $f = fibonacci($n-1) + fibonacci($n-2);
            }
            return $f;
        }

        Notice how mce_loop wraps around the serial code to enable parallelism.
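For comparison, the serial code being wrapped might look roughly like this. It is a sketch under my own assumptions: the output goes to a scratch directory from File::Temp rather than /path/to/my/files/, and the files are numbered from 1 on the assumption that this mirrors how MCE numbers its chunk IDs.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub fibonacci {
    my $n = shift;
    return undef if $n < 0;
    return $n if $n < 2;
    return fibonacci($n - 1) + fibonacci($n - 2);
}

my $dir        = tempdir(CLEANUP => 1);   # hypothetical output directory
my @input_data = (0 .. 100 - 1);

# One file per input element; $chunk_id starts at 1 (assumed to match MCE).
for my $chunk_id (1 .. scalar @input_data) {
    open my $output, '>', "$dir/$chunk_id.txt" or die "$chunk_id.txt: $!";
    print {$output} "\t", fibonacci($_) for 1 .. 10;
    print {$output} "\n";
    close $output or die "close: $!";
}

my @files = glob("$dir/*.txt");
print scalar(@files), " files written\n";
```

The loop body is the same work the mce_loop block does per chunk; only the iteration is sequential.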

        Also see https://metacpan.org/pod/MCE::Loop#GATHERING-DATA if wanting to gather data back to the Manager process.

        It seems what I was looking for was the Parallel::Prefork / Parallel::ForkManager library. I wanted to process multiple input files in parallel and write the outputs in parallel as well. Hmm, when using Prefork or ForkManager, does perl duplicate all my global variables for each fork?
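On the question of globals: both modules are built on fork, and a forked child works on its own copy of the parent's variables, so a change made in the child is not seen by the parent. A minimal sketch with plain fork, assuming a Unix-like system; $counter and the pipe used to report back are my own illustration, not part of either module.

```perl
use strict;
use warnings;

our $counter = 10;    # a global set before forking

pipe( my $reader, my $writer ) or die "pipe: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if( $pid == 0 ) {     # child process
    close $reader;
    $counter += 5;    # changes only the child's copy
    print {$writer} "$counter\n";
    close $writer;
    exit 0;
}

# parent process
close $writer;
chomp( my $child_value = <$reader> );
close $reader;
waitpid( $pid, 0 );

print "child saw $child_value, parent still has $counter\n";
```

This is also why results have to be sent back explicitly (a pipe here, files in the examples above) rather than stored in a shared variable.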
