My First Submission to CPAN (Parallel::ForkControl)

I've been posting here at PerlMonks for a while. Even though I occasionally disappear for weeks or months at a time, I consider this community to be one of the best Perl resources available. As such, I'm looking for feedback on my first submission to CPAN. I'm hoping that it helps people, because that's why I released it. 0.01 is the "stable" version I've used in several production environments for over a year now.

I've labeled this release 0.01 because this is just the start of the functionality I'd like to add to this module. For now, it's only been tested on Linux and FreeBSD. I'd like to add some operating-system-specific logic, i.e. creating Parallel::ForkControl::Linux, ::FreeBSD, ::Solaris, etc. and either encapsulating those behind a factory or having the right one install as Parallel::ForkControl at compile time.

I've been asked a million times why I didn't just contribute to Parallel::ForkManager. The main reason is that I don't want the programmer to be responsible for knowing the structure of a fork(). I've struggled through placing exit()s in the appropriate places, handling the reaping of children, and figuring out "am I the child or the parent process here?", and I don't want people who use this module to struggle with those concepts. Often it's difficult for someone who's just beginning to program and has no Unix background to figure these things out. I fork bombed my servers SEVERAL times and it's not fun. Here's a sample using my module:

use strict;
use Parallel::ForkControl;

my $forker = Parallel::ForkControl->new(
               Name       => 'Test Program',
               MaxKids    => 20,
               MinKids    => 1,   # default btw
               WatchCount => 1,   # keep track by number
               WatchLoad  => 1,   # use '/bin/uptime'
               MaxLoad    => 8.0,
               Code       => \&getStats  # child subroutine
);

my @hosts = map { "host$_" } 1 .. 400;   # host1 .. host400

foreach my $host (@hosts) {
      $forker->run($host);
}
$forker->cleanup();

sub getStats {
       # code me as a regular subroutine with return
       # at the end, or don't, it doesn't matter
       # anything happening in here will be the child
       my $host = shift;
       ........
       return 1;
}
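
For comparison, here's roughly the bookkeeping a hand-rolled fork() loop needs: the exit in the child, the reaping, and the "which process am I" check. This is only a minimal sketch using core Perl, not code from Parallel::ForkControl; run_limited is a made-up name for illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not part of Parallel::ForkControl): run $code once per
# item in its own child, with at most $max children alive at any time.
# Returns the number of children that exited with status 0.
sub run_limited {
    my ($max, $code, @items) = @_;
    my %kids;      # pid => 1 for children still running
    my $ok = 0;    # children that exited cleanly

    my $reap_one = sub {
        my $pid = waitpid(-1, 0);              # block until any child exits
        if ($pid <= 0) { %kids = (); return }  # nothing left to wait for
        delete $kids{$pid};
        $ok++ if $? == 0;
    };

    for my $item (@items) {
        $reap_one->() while keys(%kids) >= $max;   # throttle before forking

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: do the work, then leave. Forgetting this exit is the
            # classic fork bomb, because the child would carry on looping.
            my $rc = eval { $code->($item) } ? 0 : 1;
            POSIX::_exit($rc);
        }
        $kids{$pid} = 1;                           # parent: remember the child
    }

    $reap_one->() while %kids;                     # reap the stragglers
    return $ok;
}
```

Calling run_limited(20, \&getStats, @hosts) would be the by-hand counterpart of the MaxKids => 20 example, minus the load watching.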

Basically, the goal here is to eliminate the possibility of fork bombing the server while letting people spend more time developing the actual processing code. Right now there's support for ProcessTimeout, which lets you automagically reap children that have gone too long without terminating (default is 120 seconds), to safeguard against things taking forever and a day to wrap up. Additionally, the children will die if the parent process no longer exists.
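
To give an idea of what a per-child timeout involves, here is one generic way to do it with alarm(). I'm not claiming this is how ProcessTimeout is implemented inside the module; it's a sketch of the general technique, and run_with_timeout is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a child and let SIGALRM kill the child
# if it is still running after $timeout seconds.
# Returns 'ok', 'failed', or 'timeout'.
sub run_with_timeout {
    my ($timeout, $code, @args) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: with the default handler, SIGALRM terminates the process.
        $SIG{ALRM} = 'DEFAULT';
        alarm $timeout;
        my $rc = eval { $code->(@args) } ? 0 : 1;
        POSIX::_exit($rc);
    }

    waitpid($pid, 0);                       # parent: reap the child
    return 'timeout' if ($? & 127) == POSIX::SIGALRM();
    return $? == 0 ? 'ok' : 'failed';
}
```

The low seven bits of $? hold the signal that killed the child, which is how the parent tells a timeout apart from an ordinary failure.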

I started the module because I wanted to be able to process large lists of routers as quickly as possible. I kept fooling around with what is now the MaxKids arg, but changes in the network and the machine from night to night would either slow everything down to a crawl and affect other cron jobs running on the server, or leave the server short of its full potential. I wanted to be able to use my server to the best of its abilities every night, whatever the phase of the moon allowed. That's where this module really began to diverge from Parallel::ForkManager and take on a life of its own, for instance:


my $forker = Parallel::ForkControl->new(
               Name       => 'Dynamic, Load Based Solution',
               WatchCount => 0,
               WatchLoad  => 1,
               MaxLoad    => 5.00,
               Code       => \&getStats
);

Which basically tells the module to run as many children as possible until the one-minute load average hits 5.00, then wait until it drops below 5.00 before forking again. You can even tell it to keep at least MinKids children running no matter what the load is.
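
The decision being made there can be sketched in a few lines. This is an illustration under my own assumptions rather than the module's actual code: one_minute_load and may_fork are hypothetical names, and the regex assumes uptime output in the usual Linux ("load average:") or FreeBSD ("load averages:") form.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the one-minute load average out of
# uptime-style text. Returns undef if the text doesn't match.
sub one_minute_load {
    my ($text) = @_;
    return $text =~ /load averages?:\s*([\d.]+)/ ? $1 + 0 : undef;
}

# Hypothetical helper: may another child be forked right now?
# Expects load, max_load, kids (currently running) and min_kids.
sub may_fork {
    my (%arg) = @_;
    return 1 if $arg{kids} < $arg{min_kids};      # always keep MinKids going
    return $arg{load} < $arg{max_load} ? 1 : 0;   # otherwise obey the load cap
}
```

A parent loop would then sleep and re-check, for example re-reading `uptime` once a second, until may_fork returns true.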

Anyways, it's something that saved me many keystrokes, headaches, and overheating servers, so I thought I'd share. Future enhancements will include memory/CPU based throttling, as well as finer-grained control of processes, including killing active children to lower the load, caching their args, and restarting them once the load/mem/CPU reach acceptable levels again. Ideally, I'd like to use suspend. But, planning for platforms that don't support it, I'd add a RollBack callback that takes the same args as the Code block and leaves it to the programmer to roll back changes made in the Code block, if they want this level of process control.

btw, 0.02 is going to be solely a documentation and testing upgrade. I distributed with very basic documentation and would like to provide A LOT more detail in the pod so as to make this a VERY usable module. Testing also helps a great deal; again, it's only got very basic tests included.

Any thoughts/suggestions/complaints?

-brad..
