PerlMonks  

Introduction to Parallel::ForkManager

Introduction

The goal of this tutorial is to demonstrate how to use Parallel::ForkManager, a simple and powerful Perl module available from CPAN that can be used to perform a series of operations in parallel within a single Perl script. It is especially well suited to performing a number of repetitive operations on a relatively powerful machine, particularly a multiprocessor one. This module uses object-oriented syntax; if that frightens you, then you should read some of the Object Oriented Perl tutorials.

Usage

One caveat to using Parallel::ForkManager is that you must instantiate the Parallel::ForkManager object with a number representing the maximum number of processes that may run at the same time. Here is an example of the syntax:

my $manager = Parallel::ForkManager->new( 20 );

In many cases, this maximum will also be the actual number of processes your program has running at any one time. It is therefore very important to choose this number carefully, as forking a large enough number of processes is enough to bring even the mightiest of machines to its knees. You can also change this number later in your program as needed with the following method:

$manager->set_max_procs( $newMaximumProcs );

After instantiating a Parallel::ForkManager object, you can start forking processes using the start method. It is also important to mark the point at which each child process ends, which is done with the finish method. Both calls usually sit inside a for or while loop, so the syntax will look like this:

    foreach my $command (@commands) {
        $manager->start and next;
        system( $command );
        $manager->finish;
    }

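The first line within the for loop is a common idiom used with Parallel::ForkManager. In the parent, start forks a child and returns the child's process id, which is true, so the parent advances to the next command in the @commands array. In the child, start returns 0, so the child goes on to run the command and then exits when it reaches finish. If the maximum number of processes is already running, start waits until one of them has finished before forking. The start method takes an optional parameter named $process_identifier, which can be used in callbacks (see Callbacks section).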

Another useful method in the Parallel::ForkManager class is the wait_all_children method. It blocks the parent program until all forked processes have finished.
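To tie start, finish, and wait_all_children together, here is a minimal sketch of a complete script. It is an illustration, not the module's canonical example: instead of calling system on a list of commands, each child writes a small file into a temporary directory so that the parent can check afterwards that every job ran. The job list, the file names, and the limit of 2 processes are all made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Parallel::ForkManager;

# Hypothetical work list and scratch directory, chosen only for illustration.
my $dir     = tempdir( CLEANUP => 1 );
my @jobs    = ( 1 .. 5 );
my $manager = Parallel::ForkManager->new(2);    # at most 2 children at once

foreach my $job (@jobs) {
    # Parent: start returns the child's pid (true), so move to the next job.
    # Child: start returns 0, so execution falls through to the work below.
    $manager->start and next;

    # Only the child gets here.
    open my $fh, '>', "$dir/job-$job.txt" or die "Cannot write: $!";
    print {$fh} "job $job done by pid $$\n";
    close $fh;

    $manager->finish;    # the child exits here
}

# Block until every child has exited.
$manager->wait_all_children;

# All children are gone, so every output file should now exist.
my @outputs = grep { -f } map { "$dir/job-$_.txt" } @jobs;
printf "%d of %d jobs produced output\n", scalar @outputs, scalar @jobs;
```

Without the call to wait_all_children, the parent could reach the final check while some children were still running.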

Callbacks

It is possible to define callbacks, which are blocks of code that the parent runs at various points in the life of your child processes. There are three kinds of callback:

  • run_on_start - run when each process is started
  • run_on_finish - run when each process is finished
  • run_on_wait - run when a process needs to wait for startup
Callbacks are defined using the run_on_start, run_on_finish, and run_on_wait methods, which take a code reference (an anonymous subroutine or a reference to a named subroutine) as an argument. The arguments passed to the subroutine differ depending on which kind of callback you are defining.

Here's an example of the run_on_start method:

    $manager->run_on_start( sub {
        my ( $pid, $ident ) = @_;
        print "Starting process $ident under process id $pid\n";
    } );

The arguments passed to the run_on_start sub are the process id of the forked process (provided by the operating system) and an identifier for the process, which can be set in the call to the start method. Keep in mind that if you don't provide an identifier in the call to start, $ident will be undefined, and the Perl interpreter will complain when you print it (if you are using strict and warnings).
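Here is a small sketch of how an identifier passed to start shows up in the run_on_start callback. The identifiers (alpha, beta, gamma), the @started array, and the fallback label are invented for this example; the children do no real work.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $manager = Parallel::ForkManager->new(2);

# Filled in by the callback, which runs in the parent process.
my @started;

$manager->run_on_start( sub {
    my ( $pid, $ident ) = @_;

    # Guard against a missing identifier to avoid an "uninitialized" warning.
    $ident = '(no identifier)' unless defined $ident;

    push @started, $ident;
    print "Starting process $ident under process id $pid\n";
} );

foreach my $name (qw(alpha beta gamma)) {
    # The argument to start becomes $ident in the callbacks.
    $manager->start($name) and next;

    # A real child would do its work here.
    $manager->finish;
}

$manager->wait_all_children;
```

Because the callback runs in the parent, changes it makes to @started are visible to the rest of the script.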

Here's an example of the run_on_finish method:

    $manager->run_on_finish( sub {
        my ( $pid, $exit_code, $ident, $signal, $core ) = @_;
        if ( $core ) {
            print "Process $ident (pid: $pid) core dumped.\n";
        } else {
            print "Process $ident (pid: $pid) exited "
                . "with code $exit_code and signal $signal.\n";
        }
    } );

This callback prints useful messages upon completion of each process. One caveat is that an identifier must be passed to the start method for each process for this to work as written; otherwise $ident is undefined and the code needs to be modified.
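The exit code seen by run_on_finish is whatever the child passes to finish, which accepts an optional exit code. The sketch below uses that to report a status from each child back to the parent. The job names and the %exit_code_for hash are made up for this example.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $manager = Parallel::ForkManager->new(2);

# Exit code of each child, keyed by the identifier given to start.
my %exit_code_for;

$manager->run_on_finish( sub {
    my ( $pid, $exit_code, $ident ) = @_;
    $exit_code_for{$ident} = $exit_code;
} );

foreach my $n ( 1 .. 3 ) {
    $manager->start("job$n") and next;

    # Child: pretend the work produced status $n and exit with it.
    $manager->finish($n);
}

# The callback fires in the parent as each child is reaped.
$manager->wait_all_children;

for my $ident ( sort keys %exit_code_for ) {
    print "$ident exited with code $exit_code_for{$ident}\n";
}
```

An exit code is a small integer, so this approach suits status flags rather than real data.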

The run_on_wait subroutine is a bit different. It is called when the Parallel::ForkManager object has to wait: either in start, when the maximum number of processes is already running, or in wait_all_children, while waiting for processes to exit. It takes both a subroutine reference and an optional argument $period, which defines the number of seconds to wait before calling the subroutine again. Here's an example of its usage:

$manager->run_on_wait( sub { print "Waiting ... \n"; }, 3 );

This example prints its message about every 3 seconds. The documentation for Parallel::ForkManager notes that the exact period is not guaranteed and can vary slightly according to system load. If the second argument is not provided, the subroutine is called once for each child that the manager has to wait for in the start and wait_all_children methods.
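To see the run_on_wait callback fire, the parent has to be kept waiting. The sketch below forces that by allowing only one child at a time and letting each child sleep for 2 seconds. The $waits counter, the sleep time, and the 1-second period are arbitrary choices for this example, and the exact number of calls will vary from run to run.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Only one child at a time, so the second start has to wait for the first.
my $manager = Parallel::ForkManager->new(1);

# Counts how often the parent was told it is waiting.
my $waits = 0;

$manager->run_on_wait(
    sub {
        $waits++;
        print "Waiting ... \n";
    },
    1    # call the subroutine roughly once per second while waiting
);

foreach my $n ( 1 .. 2 ) {
    $manager->start and next;

    sleep 2;    # child: simulate slow work
    $manager->finish;
}

$manager->wait_all_children;
print "run_on_wait was called $waits times\n";
```

No arguments are passed to the run_on_wait subroutine, so anything it needs has to come from the enclosing scope, as $waits does here.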

Bugs and Limitations

These are straight from the Parallel::ForkManager perldoc, which lists three caveats:
  • "Do not use Parallel::ForkManager in an environment, where other child processes can affect the run of the main program, so using this module is not recommended in an environment where fork() / wait() is already used."
  • "If you want to use more than one copies of the Parallel::ForkManager, then you have to make sure that all children processes are terminated, before you use the second object in the main program."
  • "You are free to use a new copy of Parallel::ForkManager in the child processes, although I don't think it makes sense."

Other Resources

One of the most valuable sources of information on this module is its perldoc-formatted documentation, which is available on systems that have Parallel::ForkManager installed and from CPAN.

In reply to Introduction to Parallel::ForkManager by biosysadmin
