
Please Explain the Parallel::ForkManager Idiom my $pid = $pm->start and next;

by Jim (Curate)
on Feb 04, 2014 at 07:37 UTC ( #1073329=perlquestion )
Jim has asked for the wisdom of the Perl Monks concerning the following question:

I simply cannot make sense of the Parallel::ForkManager idiom my $pid = $pm->start and next; as the first statement within a foreach loop that iterates a list of items. I've studied and run examples from the module's perldoc page over and over again, but I just can't wrap my head around how the idiom works. I've even read this explanation and I still don't get it.

How does the first item in whatever the foreach loop is iterating not get ignored? It's skipped—and yet it's somehow not skipped. I can't figure out how it's not skipped. It's maddening. Please explain it to me if you can.


Replies are listed 'Best First'.
Re: Please Explain the Parallel::ForkManager Idiom my $pid = $pm->start and next;
by hdb (Prior) on Feb 04, 2014 at 07:47 UTC

    When $pm->start returns, two processes exist: the parent and the child. In the parent process, $pm->start returns the child's process ID, which is a non-zero value (i.e. true), so the and operator goes on to evaluate its right-hand side and executes the next command. In the child process, $pm->start returns 0 (i.e. false), so the and operator short-circuits (when its first argument is already false, it returns false without evaluating the second argument). This way, next is not executed in the child, and the remaining body of the loop runs instead.

    So, you are correct, next is skipped (in the child) and not skipped (in the parent).
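
The short-circuit can be watched without forking at all. The sketch below is only an illustration: fake_start() is a hypothetical stand-in that returns whatever value it is handed, so the same loop can be run once "as the parent" (non-zero return) and once "as the child" (zero return). It is not part of Parallel::ForkManager, and 4242 is a made-up pid.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $pm->start: returns the value we hand it,
# so we can play "parent" (non-zero pid) or "child" (0) without forking.
sub fake_start { my ($ret) = @_; return $ret }

# Returns the items for which execution got past the "and next" line.
sub items_reaching_body {
    my ($start_value, @items) = @_;
    my @reached;
    for my $item (@items) {
        # "=" binds tighter than "and": assign first, then test the value.
        my $pid = fake_start($start_value) and next;  # true => next, false => fall through
        push @reached, $item;                         # only runs when the value was 0
    }
    return @reached;
}

my @as_parent = items_reaching_body(4242, qw(a b c));  # made-up non-zero pid
my @as_child  = items_reaching_body(0,    qw(a b c));
print "with a non-zero return, body ran for: (@as_parent)\n";
print "with a zero return, body ran for: (@as_child)\n";
```

With a non-zero return the body runs for no item; with a zero return it runs for every item. In a real run each child sees 0 only in the iteration where it was created and then exits through $pm->finish, which is why each item ends up handled exactly once, by its own child.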

      Thanks. I still don't get it. I'm having a more profound problem than just not understanding the idiom. I don't understand the ubiquitous expressions I read over and over again in the documentation and in PerlMonks tutorials and posts:  "…in the parent process…", "…in the child process…", etc. These make no sense to me. And running the example scripts and observing their behavior isn't helping, but instead making it worse because the behavior is utterly counterintuitive to me. The particular example code I'm running from the module's perldoc page has sleep()s in it that I know the code is reaching, but are never actually happening. There's never a pause in the execution of the program. It blows right past the sleep()s.

        Mh, imho the example with the callbacks is very nice and instructive. Did you run it and watch top during execution?

        Regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

Re: Please Explain the Parallel::ForkManager Idiom my $pid = $pm->start and next;
by Anonymous Monk on Feb 04, 2014 at 08:07 UTC

      Gosh, I remember reading bart's excellent tutorial Mr. Peabody Explains fork() years ago when fork() was relatively new to Perl on Windows. Obviously, I wasn't able to wrap my head around it then, and I continue to struggle with it today. But I'm more determined now than I was then to understand how it works.

      At least one light bulb has gone on above my head after reading this and seeing its accompanying graphic in blue and green text:

      What fork() does is extraordinary! It takes the existing process and clones it. If you're familiar with Star Trek, this is like a bad transporter accident. An exact copy is made of each process, and each process is almost unaware of the other.

      The code is executed instruction-by-instruction until the fork() call is reached. At that point, the process is cloned and now there are two identical processes running the instruction right after the fork().

      In fact, they're so identical that only one bit of information distinguishes between them: the return value of fork(). In the child process fork() appears to have returned 0. In the parent process, fork() appears to have returned a non-zero integer. This is how parent and child tell themselves apart.

      The important difference here is the explanation that "the code is executed instruction-by-instruction until the fork() is reached." This is the first time I've seen it plainly stated that fork() alters the ordinary sequential execution of statements in a Perl program. This helps me begin to understand what's going on in the mystifying example program I'm studying.

      I likened fork() in Perl to job control in the Unix shell, but I'm now realizing they're not the same thing at all. A shell loop that launches commands in the background with & is not the same as a fork() in Perl.


        Here is a forking program expressed as a tree (this is parent)

        • code before fork
        • fork call
        • code after fork

        The child process execution is

        • fork call
        • code after fork

        When the fork call is reached, the parent process creates a child process (clone), and returns a pid for the new child process

        The child process doesn't clone/create a new child process (only parent does that), and fork returns zero

        After that the two processes are identical, and each of them continues execution from the point of fork ... they both execute code after fork

        The child doesn't start from the beginning and execute the code before fork (it's not started from scratch, it's a clone of the parent)
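
The description above can be tried with Perl's built-in fork(), without any module. This is a minimal sketch, assuming a platform where fork() and pipe() are available; fork_once() is a name made up for this illustration. The child reports what it saw through a pipe so that the parent can show both views side by side.

```perl
use strict;
use warnings;

$| = 1;  # unbuffer STDOUT so pending output is not copied into the child

# Forks once. The child writes what fork() returned to it down a pipe
# and exits; the parent collects that report and returns both views.
sub fork_once {
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: a clone that resumes right here, with fork() having returned 0.
        close $reader;
        print {$writer} "child saw fork() return $pid\n";
        close $writer;
        exit 0;               # the child ends here and never returns to the caller
    }

    # Parent: same code, but fork() returned the child's pid.
    close $writer;
    my $child_report = <$reader>;
    close $reader;
    waitpid($pid, 0);         # reap the child

    return { child_pid => $pid, child_report => $child_report };
}

my $result = fork_once();
print "parent saw fork() return $result->{child_pid}\n";
print $result->{child_report};
```

Nothing before the fork() call is executed again by the child; the only thing that differs between the two processes at that point is the value in $pid.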


        When you run the program here is the execution order as a tree

        • before fork
        • fork() ....
            • parent clones child, and gets not-zero (pid of child)
            • code after fork is executed
            • child gets zero (only parent forks/clones itself)
            • code after fork is executed

        Or the same in table form :) The parent process is started and it runs:

        parent                                | child
        code before fork                      |
        fork call (parent clones child here)  | child gets 0 and runs in parallel
        code after fork                       | code after fork

        plainly stated that fork() alters the ordinary sequential execution of statements

        No. It doesn't state that. It emphasizes that both before and after the fork() there is no difference except for fork's return value.

        The only alteration is: after the fork(), there are two identical copies of those instructions. If there were no if around the fork() (and you did not save the return value to a variable and evaluate it afterwards), you would not be able to distinguish between them. The "ordinary sequential execution" is not altered.

Re: Please Explain the Parallel::ForkManager Idiom my $pid = $pm->start and next;
by sundialsvc4 (Abbot) on Feb 04, 2014 at 15:10 UTC

    It can be confusing, sure.   Think of it this way:   when the parent process start()s a child process, there are, from that moment forward, two processes executing more-or-less parallel with one another.   (I say “more-or-less” because the relative timing of the two can’t be exactly predicted ...)   Nevertheless, both are executing the same Perl code, at the same point, but with exactly one very important difference:

    • In the parent, the process-ID of the new child process is returned as the value of the call to start().
    • In the child, the returned value is zero.
    • And that’s how the two of them each know which one they are.

    The “fork in the road” (heh...) therefore happens with and next.   This is shorthand, exploiting Perl’s use of “short-circuit” expression evaluation.   The parent immediately continues with the foreach loop (or ends it); the child proceeds.   (Here’s the short-circuit:   since 0 and anything is known to be false, the right side of the and clause (that is to say, the next statement) is never evaluated in the child.)

    Ordinarily, the console-output of both processes will now be intermingled on the screen, in no particular (exactly predictable) order, and yes, you will see that the child does sleep() properly, as any process would do.   If you still don’t observe that, please post a snippet of your code (remember to use <code> tags ...) so that we can point out the error of your ways.   It is a bit tricky ...
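
One possible reason a program seems to "blow right past" its sleep() calls is that only the children sleep, while the parent moves straight on to launching the next one. The sketch below illustrates that with core fork() and waitpid() standing in for $pm->start and $pm->wait_all_children; it is not how Parallel::ForkManager is implemented, and launch_sleepers() is a hypothetical helper.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

$| = 1;  # unbuffer STDOUT so pending output is not copied into the children

# Forks $count children that each sleep $seconds, then waits for them all.
# Returns how long the parent's launch loop took and how long everything took.
sub launch_sleepers {
    my ($count, $seconds) = @_;
    my $t0 = time;
    my @pids;
    for my $n (1 .. $count) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            sleep $seconds;     # only the child pauses here
            exit 0;
        }
        push @pids, $pid;       # parent: remember the child and move on at once
    }
    my $launched = time - $t0;  # time the parent spent inside the loop
    waitpid($_, 0) for @pids;   # now the parent blocks until every child has exited
    my $total = time - $t0;
    return { children => scalar @pids, launched => $launched, total => $total };
}

my $r = launch_sleepers(3, 1);
printf "loop finished after %.2fs; all %d children done after %.2fs\n",
    $r->{launched}, $r->{children}, $r->{total};
```

The parent's loop normally completes almost immediately, and the pause only becomes visible where the parent waits for its children.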

      It's not terribly important to this discussion that I mention this, but the and next bit is not part of what's confusing me. I understand this part of the idiom.

      However, instead of doing this…

      CHILD:
      foreach my $child ( 0 .. $#names ) {
          my $pid = $pm->start($names[$child]) and next;

          # Child process...
      }

      …I would typically prefer to do this…

      CHILD:
      for my $child ( 0 .. $#names ) {
          my $pid = $pm->start($names[$child]);
          next CHILD if $pid != 0;    # Parent process

          # Child process...
      }

      I think this is more in line with PBP, but I could be mistaken.


        Why? For the most part, I don't care and don't want to know what the pids are. P::FM does the bookkeeping for me. I can do something like:
        for my $name (qw(foo bar baz)) { $pm->start($name) and next; ... }
        And then I can think of my child processes as having names 'foo', 'bar', and 'baz' and can forget about pids. The names can be used in the run_on_finish() callback if I want to.
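
A fuller version of that loop might look like the sketch below. It assumes Parallel::ForkManager is installed from CPAN; the limit of 3 simultaneous children and the %exit_code_for hash are arbitrary choices for this illustration, and the children do no real work.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);   # at most 3 children at a time (arbitrary)

my %exit_code_for;
# Runs in the parent each time a child is reaped; the third argument is
# the identifier that was passed to start().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    $exit_code_for{$ident} = $exit_code;
});

for my $name (qw(foo bar baz)) {
    $pm->start($name) and next;           # parent: go on to launch the next child
    # Only the child gets here. Real work would go in this spot.
    $pm->finish(0);                       # child exits with status 0
}

$pm->wait_all_children;                   # parent blocks until all children are reaped

print "$_ exited with $exit_code_for{$_}\n" for sort keys %exit_code_for;
```

The parent never looks at a pid: the names passed to start() come back as the identifier in the run_on_finish callback.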
