
My First Submission to CPAN (Parallel::ForkControl)

by reyjrar (Hermit)
on Dec 15, 2003 at 23:26 UTC

I've been posting here at PerlMonks for a while. Even though I occasionally disappear for weeks or months at a time, I consider this community to be one of the best Perl resources available. As such, I'm looking for feedback on my first submission to CPAN. I'm hoping that it helps people, because that's why I released it. 0.01 is the "stable" version I've used in several production environments for over a year now.
I've labeled this release 0.01 because this is just the start of the functionality I'd like to add to this module. For now, it's only been tested on Linux and FreeBSD. I'd like to add some operating-system-specific logic, i.e. creating Parallel::ForkControl::Linux, ::FreeBSD, ::Solaris, etc., and either encapsulating those in a Factory model or just having the right one install as Parallel::ForkControl at compile time.

I've been asked a million times why I didn't just contribute to Parallel::ForkManager. The main reason is that I don't want the programmer to have to know the structure of using a fork(). I've struggled through placing exit()s at the appropriate place, handling the reaping of children, and figuring out 'where am I the child/parent process', and I don't want people who use this module to struggle with those concepts. Oftentimes, it's difficult for someone who's beginning programming and has no Unix background to figure these things out. I fork bombed my servers SEVERAL times and it's not fun. Here's a sample using my module:
    use strict;
    use Parallel::ForkControl;

    my $forker = new Parallel::ForkControl(
        Name       => 'Test Program',
        MaxKids    => 20,
        MinKids    => 1,          # default btw
        WatchCount => 1,          # keep track by number
        WatchLoad  => 1,          # use '/bin/uptime'
        MaxLoad    => 8.0,
        Code       => \&getStats  # child subroutine
    );

    my @hosts = map { $_ = qq/host$_/; } (1..400);
    foreach my $host (@hosts) {
        $forker->run($host);
    }
    $forker->cleanup();

    sub getStats {
        # code me as a regular subroutine with return
        # at the end, or don't, it doesn't matter
        # anything happening in here will be the child
        my $host = shift;
        ........
        return 1;
    }
Basically, the goal here is to eliminate the ability of fork bombing the server while allowing people to spend more time developing the actual processing of the code. Right now there's support for ProcessTimeout which allows you to automagically reap children who've gone too long without terminating (default is 120 seconds) to safeguard against things taking forever and a day to wrap up. Additionally, the children will die if the parent process no longer exists.

I started the module because I wanted to be able to process large lists of routers as quickly as possible. I kept fooling around with what is now the MaxKids arg, but changes in the network and the machine from night to night would either slow everything down to a crawl and affect other cron jobs running on the server, or leave the server short of its fullest potential. I wanted to be able to use my server to its best abilities every night, whatever the phase of the moon allowed. That's where this module really began to diverge from Parallel::ForkManager and take on a life of its own, for instance:
    my $forker = new Parallel::ForkControl(
        Name       => 'Dynamic, Load Based Solution',
        WatchCount => 0,
        WatchLoad  => 1,
        MaxLoad    => 5.00,
        Code       => \&getStats
    );
Which basically tells the module to run as many children as possible until the one-minute load average hits 5.00, then wait until it drops below 5.00, and continue forking. You can even tell it to keep at least MinKids children running no matter what the load is.
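
For readers wondering what a load-gated fork loop looks like underneath, here is a minimal sketch. It is not Parallel::ForkControl's actual code: may_fork, current_load and run_all are hypothetical names, the option names are borrowed from the example above with assumed semantics, and the load is read from Linux's /proc/loadavg rather than by calling uptime.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # for WNOHANG

# Hypothetical helper: 1-minute load average, or undef if it cannot be read.
# Assumes a Linux-style /proc/loadavg.
sub current_load {
    open my $fh, '<', '/proc/loadavg' or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    my ($one) = $line =~ /^\s*(\d+(?:\.\d+)?)/;
    return $one;
}

# Decide whether one more child may be forked right now.
# MinKids, MaxKids and MaxLoad mirror the option names in the post;
# their exact semantics here are an assumption.
sub may_fork {
    my ($load, $running, %opt) = @_;
    return 1 if $running < ($opt{MinKids} // 0);                   # keep MinKids going
    return 0 if defined $opt{MaxKids} && $running >= $opt{MaxKids}; # hard cap, if any
    return 0 if defined $load && defined $opt{MaxLoad}
             && $load >= $opt{MaxLoad};                            # too busy, hold off
    return 1;
}

# Run $code->($item) in a child process for every item, throttled by may_fork.
sub run_all {
    my ($code, $items, %opt) = @_;
    my %kids;    # pids of children we believe are still running
    for my $item (@$items) {
        while (1) {
            # Reap any children that have already exited.
            while ((my $done = waitpid(-1, WNOHANG)) > 0) {
                delete $kids{$done};
            }
            my $load = current_load();    # scalar: undef when unknown
            last if may_fork($load, scalar(keys %kids), %opt);
            sleep 1;                      # wait for load or child count to drop
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                  # child: do the work, then leave
            $code->($item);
            exit 0;
        }
        $kids{$pid} = 1;                  # parent: remember the child
    }
    # Wait for whatever is still running.
    while (%kids) {
        my $done = waitpid(-1, 0);
        last if $done <= 0;
        delete $kids{$done};
    }
    return;
}
```

With these definitions, something like run_all(\&getStats, \@hosts, MinKids => 1, MaxLoad => 5.00) would approximate the configuration shown above. Note that the load is sampled before every fork, which is slower than checking occasionally but avoids forking a large batch before the one-minute average has had time to move.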

Anyways, it's something that saved me many keystrokes, headaches, and overheating servers, so I thought I'd share. Future enhancements will include Memory/CPU based throttling, as well as finer grained control of processes, including killing active children to lower a load, caching the args, and restarting them once the load/mem/cpu reach acceptable levels again. Ideally, I'd like to use suspend. But, planning for platforms that don't support it, I'd add a RollBack callback method that would take the same args as the Code block, and would leave the ability to roll back changes made in the Code block to the programmer if they desired this level of process control.

btw, 0.02 is going to be solely a documentation and testing upgrade. I distributed with very basic documentation and would like to provide A LOT more detail in the pod so as to make this a VERY usable module. Testing also helps a great deal; again, it's only got very basic tests included.

Any thoughts/suggestions/complaints?
-brad..

Replies are listed 'Best First'.
Re: My First Submission to CPAN (Parallel::ForkControl)
by Zaxo (Archbishop) on Dec 15, 2003 at 23:46 UTC
    ... tells the module to run as many children as possible until the one minute average load hits 5.00

    Yow! Have you tested that a bunch? You can fork a lot of children in less than a minute.

    After Compline,
    Zaxo

      Yes sir I have! There's a _check() routine in the script which runs by default after every 50th child to check whether all the kids it thinks are alive are in fact still alive, in case one manages to evade the reaper! That process, as well as the overhead of checking the environment for safety, seems to delay things just enough to not fork() a billion processes at once. Of course, I could see that this might pose a problem down the road on some systems. I'm definitely going to address this in a more permanent way by replacing the current call to '/usr/bin/uptime' with something more reliable and more understanding. Granted, there will be a certain overhead associated with determining the current load/mem/cpu usage before every fork() call, but the safeguards it'll provide should more than pay off. Additionally, I'll probably provide a mechanism to forego the safety net, because sometimes, I want enough rope to hang myself.

      It's difficult to really provide a decent temporary solution for this problem, as my children might only use 1% mem and 5% CPU, whereas someone else's might use 54% mem and 70% CPU during processing. I suppose it might help to profile the children as well. If we're processing a big list, we might only fork() 5-10 processes for the first 60 seconds, gather information on their peak resource usage, make an educated guess as to the resources these children will use on the system, and dynamically adjust the number of concurrent processes based on that data.

      I have a ton of ideas to make this the easiest to use and most flexible process controller for Perl. Granted, poor coding in the children will almost always blow up in your face; I'm just aiming to make that harder to do.

      -brad..
Re: My First Submission to CPAN (Parallel::ForkControl)
by etcshadow (Priest) on Dec 15, 2003 at 23:57 UTC
    Well, one thing (not to sound like a jerk), but you're gonna get reamed if you don't change this line in your example:
    my @hosts = map { $_ = qq/host$_/; } (1..400);
    Since you're modifying something in a map, which many people will gripe about on style points... but it's also useless and misleading to boot. Try just:
    my @hosts = map { qq/host$_/ } (1..400);
    Instead. Yeah, I know it's a minor nit-picky point, but, IMHO, when you put something out for the community, you want it to look polished.

    Anyway, good luck!

    ------------ :Wq Not an editor command: Wq
Re: My First Submission to CPAN (Parallel::ForkControl)
by staunch (Pilgrim) on Dec 16, 2003 at 15:26 UTC
    Running /bin/uptime may be a more portable way of determining load average.
    But in case you hadn't seen it: Under Linux there is a proc pseudo file /proc/loadavg that you can read for this purpose.


    man 5 proc
    loadavg
    The load average numbers give the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1, 5, and 15 minutes. They are the same as the load average numbers given by uptime(1) and other programs.


    Staunch
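
staunch's suggestion could be sketched in Perl roughly as follows. This is an illustration only, not code from Parallel::ForkControl: parse_loadavg and read_loadavg are hypothetical helper names, and the file only exists on Linux (or wherever a Linux-compatible procfs is mounted), so the caller still needs a fallback such as uptime elsewhere.

```perl
use strict;
use warnings;

# Pull the 1-, 5- and 15-minute averages out of a /proc/loadavg-style line,
# e.g. "0.42 0.36 0.30 1/123 4567". Returns an empty list if it doesn't match.
sub parse_loadavg {
    my ($line) = @_;
    return unless defined $line;
    my @avg = $line =~ /^\s*(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)/;
    return @avg;
}

# Read the current load averages; returns an empty list when
# /proc/loadavg cannot be opened (for example on a non-Linux system).
sub read_loadavg {
    open my $fh, '<', '/proc/loadavg' or return;
    my $line = <$fh>;
    close $fh;
    return parse_loadavg($line);
}

my ($one, $five, $fifteen) = read_loadavg();
if (defined $one) {
    print "load averages: $one (1 min), $five (5 min), $fifteen (15 min)\n";
} else {
    print "load average not available from /proc/loadavg\n";
}
```

Reading the file avoids spawning an extra process for every check, which matters if the load is sampled before each fork().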

Node Type: perlmeditation [id://314931]
Approved by Zaxo
Front-paged by broquaint