PerlMonks  

Re^3: Parallel::Iterator to get multiple pages

by afoken (Parson)
on Aug 25, 2013 at 10:29 UTC (#1050854)


in reply to Re^2: Parallel::Iterator to get multiple pages
in thread Parallel::Iterator to get multiple pages

I've been all over the Parallel::Iterator documentation

So someone must have silently removed this part from your copy of the documentation:

How It Works

The current process is forked once for each worker. Each forked child is connected to the parent by a pair of pipes. The child's STDIN, STDOUT and STDERR are unaffected.

Input values are serialised (using Storable) and passed to the workers. Completed work items are serialised and returned.
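To make the mechanism above concrete, here is a minimal sketch using only core Perl (fork, pipe, Storable). It is not Parallel::Iterator's actual implementation: the helper name run_in_child is hypothetical, there is only one worker, and the input reaches the child through the fork itself rather than over a second pipe. It only illustrates that the result travels back as serialised data while everything else stays in the child.

```perl
use strict;
use warnings;
use Storable qw( freeze thaw );

# Hypothetical helper (not part of Parallel::Iterator): run $worker on
# $input in a forked child and send the result back through a pipe.
sub run_in_child {
    my ( $worker, $input ) = @_;

    pipe( my $reader, my $writer ) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: works on its own copy of the parent's memory.
        close $reader;
        binmode $writer;
        print {$writer} freeze( [ $worker->($input) ] );
        close $writer;
        exit 0;
    }

    # Parent: read the serialised result until the child closes the pipe.
    close $writer;
    binmode $reader;
    my $frozen = do { local $/; <$reader> };
    close $reader;
    waitpid( $pid, 0 );

    return @{ thaw($frozen) };
}

my %seen;
my ($result) = run_in_child(
    sub {
        my ($name) = @_;
        $seen{$name}++;    # changes only the child's copy of %seen
        return scalar reverse $name;
    },
    'perl'
);

print "result: $result\n";
print "keys in parent: ", scalar( keys %seen ), "\n";
```

The returned value arrives in the parent because it was written to the pipe; the change to %seen does not, because nothing carries it back.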

Caveats

Parallel::Iterator is designed to be simple to use - but the underlying forking of the main process can cause mystifying problems unless you have an understanding of what is going on behind the scenes.

Worker execution environment

All code apart from the worker subroutine executes in the parent process as normal. The worker executes in a forked instance of the parent process. That means that things like this won't work as expected:

    my %tally = ();
    my @r = iterate_as_array(
        sub {
            my ($id, $name) = @_;
            $tally{$name}++;       # might not do what you think it does
            return reverse $name;
        },
        \@names
    );

    # Now print out the tally...
    while ( my ( $name, $count ) = each %tally ) {
        printf("%5d : %s\n", $count, $name);
    }

Because the worker is a closure it can see the %tally hash from its enclosing scope; but because it's running in a forked clone of the parent process it modifies its own copy of %tally rather than the copy for the parent process.

That means that after the job terminates the %tally in the parent process will be empty.

In general you should avoid side effects in your worker subroutines.
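One way to follow that advice for the tally example is to let the worker return whatever the parent needs and do the counting in the parent. This is only a sketch under assumptions: the @names data and the uc transformation are made up for illustration, and it relies on iterate_as_array taking the worker followed by an array reference, as in the module's synopsis.

```perl
use strict;
use warnings;
use Parallel::Iterator qw( iterate_as_array );

my @names = qw( alice bob alice carol bob alice );    # illustrative data

# The worker has no side effects: it only returns a value, which the
# module serialises and sends back to the parent.
my @returned = iterate_as_array(
    sub {
        my ( $id, $name ) = @_;
        return uc $name;
    },
    \@names
);

# Tally in the parent, where the changes are actually visible.
my %tally;
$tally{$_}++ for @returned;

for my $name ( sort keys %tally ) {
    printf( "%5d : %s\n", $tally{$name}, $name );
}
```

Anything the parent should know about has to come back as a return value; variables changed inside the worker stay in the child.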

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)


Re^4: Parallel::Iterator to get multiple pages
by Elwood1 (Initiate) on Aug 25, 2013 at 10:32 UTC
    Clear as mud. That does not explain to me why the expected variables are not passed to the worker.
