perltutorial
biosysadmin
<h2>Introduction to Parallel::ForkManager</h2>
<h3>Introduction</h3>
<p>
The goal of this tutorial is to demonstrate how to use Parallel::ForkManager,
a simple and powerful Perl module available from <a href="http://cpan.org">CPAN</a>.
It can be used to perform a series of operations in parallel within a single Perl script, and it is especially well suited to running many repetitive operations on a multiprocessor machine. This module uses object-oriented syntax; if that frightens you, then you
should read some of the <a
href="http://perlmonks.org/index.pl?node=Tutorials#objorient">Object
Oriented Perl</a> tutorials.
</p>
<h3>Usage</h3>
<p>
One caveat to using Parallel::ForkManager is that you must instantiate
the Parallel::ForkManager object with a number representing the maximum
number of processes to fork. Here is an example of the syntax:
</p>
<code>
use Parallel::ForkManager;
my $manager = Parallel::ForkManager->new( 20 );
</code>
<p>
In many cases, this maximum will also be the actual number of processes forked by your program. It is therefore very important to choose this number carefully, as forking too many processes is enough to bring even the mightiest of machines to its knees. You can also change this number later in your program as needed with the following method:
</p>
<code>
$manager->set_max_procs( $newMaximumProcs );
</code>
<p>
After instantiating a Parallel::ForkManager object, you can start forking
processes using the start method. It is important to also define the point
at which the child processes will finish. This is usually performed within a
for or while loop, so the syntax will look like this:
</p>
<code>
foreach my $command (@commands) {
    $manager->start and next;
    system( $command );
    $manager->finish;
}
</code>
<p>
The first line within the loop is a common idiom used with
Parallel::ForkManager: it forks a child process to run the command, while
the parent advances to the next command in the @commands array. The start
method takes an optional parameter named $process_identifier, which can be
used in callbacks (see Callbacks section).
</p>
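<p>
To see why the idiom works, it may help to spell it out. The sketch below
relies on the documented behaviour that start returns the child's process id
in the parent and 0 in the child; the item numbers and the @pids array are
placeholders added for illustration.
</p>
<code>
use strict;
use warnings;
use Parallel::ForkManager;

my $manager = Parallel::ForkManager->new( 2 );
my @pids;    # illustrative only: child pids as seen by the parent

foreach my $item ( 1 .. 4 ) {
    # start forks: it returns the child's pid in the parent, 0 in the child
    my $pid = $manager->start;
    if ( $pid ) {
        push @pids, $pid;    # parent: remember the pid, go to the next item
        next;
    }

    # Only the child reaches this point
    print "Child $$ is handling item $item\n";
    $manager->finish;        # the child exits here
}

$manager->wait_all_children;
print "Forked ", scalar @pids, " children\n";
</code>
<p>
Writing "$manager->start and next;" is simply a compact form of the if block
above: a true return value (the parent) triggers the next, and a false one
(the child) falls through to the work.
</p>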
<p>
Another useful method in the Parallel::ForkManager class is the
wait_all_children method. It blocks the parent program until all forked
processes have finished.
</p>
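<p>
As a minimal sketch, here is a complete script that launches a few commands
and then waits for all of them. The "sleep 1" commands are placeholders
(they assume a Unix-like system); substitute your own.
</p>
<code>
use strict;
use warnings;
use Parallel::ForkManager;

# Placeholder commands, assumed to exist on a Unix-like system
my @commands = ( 'sleep 1', 'sleep 1', 'sleep 1' );

my $manager = Parallel::ForkManager->new( 3 );

foreach my $command (@commands) {
    $manager->start and next;    # parent moves on, child continues below
    system( $command );
    $manager->finish;
}

# The parent gets here while the children may still be running
print "All commands launched, waiting ...\n";
$manager->wait_all_children;
my $all_done = 1;                # only reached after every child has exited
print "All commands finished.\n";
</code>
<p>
Without the call to wait_all_children, the parent could reach the end of the
script while some children are still working.
</p>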
<h3>Callbacks</h3>
<p>
It is possible to define callbacks, which are blocks of code that are called in the parent process at various points in the execution of your child processes. There are three forms of callbacks:
<ul>
<li>run_on_start - run when each process is started</li>
<li>run_on_finish - run when each process is finished</li>
<li>run_on_wait - run when a process needs to wait for startup</li>
</ul>
Callbacks are defined using the run_on_start, run_on_finish, and run_on_wait
methods, which take subroutines (or references to subroutines) as arguments.
The arguments provided to the subroutine differ depending on which form of
callback you are defining.
</p>
<p>
Here's an example of the run_on_start method:
</p>
<code>
$manager->run_on_start(
    sub {
        my ( $pid, $ident ) = @_;
        print "Starting process $ident under process id $pid\n";
    }
);
</code>
<p>
The arguments passed to the run_on_start sub are the process id of the
forked process (provided by the operating system), and an identifier for the
process that can be defined in the call to the start method of the
Parallel::ForkManager object. Keep this in mind: if you don't provide an
identifier in the call to start, $ident will be undefined and the Perl
interpreter will complain (if you are <a
href="http://perlmonks.org/index.pl?node_id=111088">using strict and
warnings</a>).
</p>
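<p>
The following sketch shows one way to tie the two together: the command
string is passed to start as the identifier, and the callback falls back to
a label if no identifier was given. The "echo" commands and the @started
array are placeholders added for illustration.
</p>
<code>
use strict;
use warnings;
use Parallel::ForkManager;

$| = 1;    # unbuffered output, so children do not inherit pending text

# Placeholder commands, assumed to exist on a Unix-like system
my @commands = ( 'echo one', 'echo two' );

my $manager = Parallel::ForkManager->new( 2 );
my @started;    # illustrative only: identifiers seen by the callback

$manager->run_on_start(
    sub {
        my ( $pid, $ident ) = @_;
        # Guard against start being called without an identifier
        $ident = '(no identifier)' unless defined $ident;
        push @started, $ident;
        print "Starting process $ident under process id $pid\n";
    }
);

foreach my $command (@commands) {
    # The argument to start becomes $ident in the callbacks
    $manager->start( $command ) and next;
    system( $command );
    $manager->finish;
}

$manager->wait_all_children;
</code>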
<p>
Here's an example of the run_on_finish method:
</p>
<code>
$manager->run_on_finish(
    sub {
        my ( $pid, $exit_code, $ident, $signal, $core ) = @_;
        if ( $core ) {
            print "Process $ident (pid: $pid) core dumped.\n";
        } else {
            print "Process $ident (pid: $pid) exited " .
                  "with code $exit_code and signal $signal.\n";
        }
    }
);
</code>
<p>
This callback prints useful messages upon completion of the process. One
caveat is that $ident must be passed in the call to start for each process
for this to work; otherwise this code needs to be modified.
</p>
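<p>
A run_on_finish callback can also collect results in the parent. The sketch
below passes each command's exit code to finish and records it per
identifier. The commands "true" and "false" are placeholders that assume a
Unix-like system, and %exit_code_for is a name chosen for illustration.
</p>
<code>
use strict;
use warnings;
use Parallel::ForkManager;

# Placeholder commands: on Unix-like systems they exit with 0 and 1
my @commands = ( 'true', 'false' );

my $manager = Parallel::ForkManager->new( 2 );
my %exit_code_for;    # filled in by the parent as children finish

$manager->run_on_finish(
    sub {
        my ( $pid, $exit_code, $ident ) = @_;
        $exit_code_for{$ident} = $exit_code;
    }
);

foreach my $command (@commands) {
    $manager->start( $command ) and next;
    my $status = system( $command );
    # system returns a wait status; the exit code is in the high byte.
    # A status of -1 means the command could not be run at all.
    $manager->finish( $status == -1 ? 255 : $status >> 8 );
}

$manager->wait_all_children;

foreach my $command ( sort keys %exit_code_for ) {
    print "$command exited with code $exit_code_for{$command}\n";
}
</code>
<p>
Because the callback runs in the parent, the hash is complete once
wait_all_children returns.
</p>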
<p>
The run_on_wait subroutine is a bit different. It is called when the
Parallel::ForkManager object needs to wait for something, such as waiting
for a process to start or waiting for processes to exit. It takes both a
subroutine (or subroutine reference) and an optional argument $period, which
defines the number of seconds to wait before calling the subroutine again.
Here's an example of its usage:
</p>
<code>
$manager->run_on_wait(
    sub {
        print "Waiting ... \n";
    },
    3
);
</code>
<p>
This example prints its message about every 3 seconds. The notes for
the <a
href="http://search.cpan.org/author/DLUX/Parallel-ForkManager-0.7.5/ForkManager.pm">latest version of Parallel::ForkManager</a> say that the exact
period of time is not guaranteed and can vary slightly according to system
load. If the second argument is not provided, then the subroutine will be
called once each time the start and wait_all_children methods have to wait
for a child.
</p>
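<p>
To watch run_on_wait in action, the sketch below limits the manager to one
process at a time, so the second call to start has to wait for the first
child. The sleep durations and the $wait_calls counter are placeholders
chosen for illustration; the exact number of calls depends on timing.
</p>
<code>
use strict;
use warnings;
use Parallel::ForkManager;

$| = 1;    # unbuffered output, so children do not inherit pending text

my $manager    = Parallel::ForkManager->new( 1 );    # one child at a time
my $wait_calls = 0;                                   # illustrative counter

$manager->run_on_wait(
    sub {
        $wait_calls++;
        print "Waiting ... \n";
    },
    1    # call again roughly every second while still waiting
);

foreach my $job ( 1 .. 2 ) {
    $manager->start and next;
    sleep 2;    # stand-in for real work
    $manager->finish;
}

$manager->wait_all_children;
print "run_on_wait was called $wait_calls time(s)\n";
</code>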
<h3>Bugs and Limitations</h3>
The Parallel::ForkManager perldoc provides three caveats, quoted here
directly:
<ul>
<li>"Do not use Parallel::ForkManager in an environment, where other child
processes can affect the run of the main program, so using this module is
not recommended in an environment where fork() / wait() is already
used."</li>
<li>"If you want to use more than one copies of the Parallel::ForkManager,
then you have to make sure that all children processes are terminated,
before you use the second object in the main program."</li>
<li>"You are free to use a new copy of Parallel::ForkManager in the child
processes, although I don't think it makes sense."</li>
</ul>
<h3>Other Resources</h3>
One of the most valuable sources of information on this module is its
perldoc-formatted documentation, which is available on systems that have
Parallel::ForkManager installed and <a
href="http://search.cpan.org/author/DLUX/Parallel-ForkManager-0.7.5/ForkManager.pm">from
CPAN</a>.