Monks -

I've got a boss/worker design issue that I'd like some suggestions on. I have a script that I've built up over the years, now over 6300 lines (yes, it's an unholy beast). The program's job involves assembling a big hash (%dataset), each member of which is a "work unit" of a few dozen KB. %dataset can contain anywhere from 0 work units (effectively a no-op) up to about 10,000. Work units are farmed out in queue fashion to a set of child worker processes. Each worker processes one work unit at a time (altering it) and returns it to the boss. Although not directly relevant to the problem, I'll mention that processing a work unit is I/O bound and typically takes less than a minute (though in extreme cases it can take up to 10 minutes). I'd estimate the program runs about 100 times a day, often with a dataset consisting of a single work unit.

The boss process currently uses the following algorithm (yes, this is pseudo-code):

$max_children=30
Create IPC socket
$SIG{CHLD}=\&reaper
Assemble %dataset
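$num_children_required = minimum($max_children, num_work_units)
$num_children = 0
foreach (1..$num_children_required) {
    fork() a worker
    $num_children++    # count live workers; a foreach loop variable would not keep its value after the loop
}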
while($num_children > 0) {
    listen for children on socket
    get from child on socket: processed work unit (if any)
    if(num_unprocessed_work_units > 0) {
        tell child on socket: process work unit identified by $key
    } else {
        tell child on socket: quit
    }
}

sub reaper {
    $num_children--;
}
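
A minimal runnable sketch of the fork-and-reap bookkeeping in the pseudo-code, offered under assumptions rather than as the actual script: `spawn_workers` and the empty worker body are hypothetical stand-ins, and the socket traffic is left out. One detail it makes explicit is that several child exits can arrive as a single SIGCHLD, so the handler loops over `waitpid` instead of decrementing once per signal.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # provides WNOHANG

my $num_children = 0;       # live workers, as in the pseudo-code

# Reap every child that has exited; one SIGCHLD may cover several exits.
sub reaper {
    local ($!, $?);         # keep the handler from clobbering these
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $num_children--;
    }
}

# Hypothetical helper: fork $count workers, each running $work->().
sub spawn_workers {
    my ($count, $work) = @_;
    for (1 .. $count) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            $work->();
            POSIX::_exit(0);    # leave without running the boss's END blocks
        }
        $num_children++;
    }
}

$SIG{CHLD} = \&reaper;
spawn_workers(5, sub { });          # stand-in for real worker code
sleep 1 while $num_children > 0;    # the real boss would service its socket here
```

The counter is incremented in the parent only, so it ends at zero regardless of whether a child exits before or after the increment.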

$max_children was initially 50, but in recent months I've had to reduce it to 30 as a workaround for a problem I've started experiencing. The problem is memory -- or rather, a lack of memory caused by an increase in the number of work units. Because %dataset is assembled before the worker processes are forked, each worker inherits a complete copy of the entire dataset. As the boss collects processed work units, it updates %dataset accordingly, which triggers copy-on-write for the part of %dataset that's being updated.

The obvious solution is to create all the worker processes BEFORE assembling %dataset; however, it's only after %dataset is assembled that the boss knows how many worker processes it needs, so I have a chicken-and-egg problem.

So far, I've come up with two solutions, neither of which gives me the warm fuzzies.

SOLUTION 1: Instead of forking all the workers directly, the boss forks off a single child BEFORE assembling %dataset (we'll call this child the SPAWNER). Then, after assembling %dataset and determining how many workers are required, the boss tells SPAWNER to create that many workers. Communication between the boss and workers is more or less the same as above; however, the reaping of children must be handled by SPAWNER, which must convey each reap to the boss.
PROBLEM: The addition of an extra process between the boss and worker processes adds COMPLEXITY in a number of areas: the spawning of workers, the reaping of workers, the requirement for the boss to detect and recover if SPAWNER terminates prematurely, and possibly other areas I haven't thought of.
SOLUTION 2: Fork $max_children workers before assembling %dataset, then, after assembling %dataset and determining how many workers are required, kill off any unneeded workers.
PROBLEM: Creates processes we don't need, which is INEFFICIENT.
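
To make SOLUTION 2 concrete, here is a minimal sketch, not the real program. It assumes one socketpair per worker and a hypothetical line-based protocol (a key per line, a `done <key>` reply, and `quit` to dismiss a worker). Since the workers are forked before %dataset exists, they carry no copy of it; in a real version the boss would have to send the work unit itself over the socket, where this sketch sends only the key.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use POSIX ();

my $max_children = 4;    # small value for the demo
my @workers;

# Fork the full complement of workers BEFORE the dataset exists.
for (1 .. $max_children) {
    socketpair(my $boss_end, my $worker_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair failed: $!";
    $_->autoflush(1) for $boss_end, $worker_end;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $boss_end;
        close $_->{sock} for @workers;    # drop handles inherited from the boss
        while (defined(my $line = <$worker_end>)) {
            chomp $line;
            last if $line eq 'quit';
            print {$worker_end} "done $line\n";    # real processing would go here
        }
        POSIX::_exit(0);
    }
    close $worker_end;
    push @workers, { pid => $pid, sock => $boss_end };
}

# Only now assemble the dataset and work out how many workers are needed.
my %dataset = (a => 'unit a', b => 'unit b');
my $units   = keys %dataset;
my $needed  = $units < $max_children ? $units : $max_children;

# Dismiss the surplus workers straight away.
for my $w (splice @workers, $needed) {
    print { $w->{sock} } "quit\n";
    waitpid($w->{pid}, 0);
}

# Hand one unit to each remaining worker, collect the reply, then dismiss it.
my @keys = sort keys %dataset;
my %result;
for my $w (@workers) {
    my $key = shift @keys;
    print { $w->{sock} } "$key\n";
    my $reply = readline($w->{sock});
    chomp $reply;
    $result{$key} = $reply;
    print { $w->{sock} } "quit\n";
    waitpid($w->{pid}, 0);
}
```

The surplus workers here never do more than block on a read until told to quit, so the inefficiency is limited to the cost of forking them while the process is still small.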

Does anybody see a solution that's better than either of the above in terms of SIMPLICITY and EFFICIENCY -- preferably one that doesn't require a major overhaul?

In reply to Design advice: Classic boss/worker program memory consumption by shadrack
