|Welcome to the Monastery|
Design advice: Classic boss/worker program memory consumptionby shadrack (Acolyte)
|on May 20, 2014 at 17:50 UTC||Need Help??|
shadrack has asked for the
wisdom of the Perl Monks concerning the following question:
I've got a boss/worker design issue that I'd like to get some suggestions on. I have a script that I've built up over the years, now over 6300 lines (yes, it's an unholy beast). The program's job involves assembling a big hash (%dataset), each member of which is a "work unit" of a few dozen KB. %dataset can contain anywhere from 0 (effectively a no-op) up to about 10,000 work units. Work units are farmed out in queue fashion to a set of child worker processes. Each worker processes one work unit at a time (altering it) and returns it to the boss. Although not directly relevant to the problem, I'll go ahead and mention that the processing of a work unit is I/O bound and typically takes less than a minute (though in extreme cases it can take up to 10). I'd estimate the program runs about 100 times a day, often with a dataset consisting of a single work unit.
The boss process currently uses the following algorithm (yes, this is pseudo-code):
$max_children was initially 50, but in recent months, I've had to reduce it to 30 as a workaround for the problem I've started experiencing. This problem is memory -- or rather, a lack of memory caused by an increase in the number of work units. Because %dataset is assembled before forking off all the worker processes, they each have a complete copy of the entire dataset. As the boss collects processed work units it updates %dataset accordingly which triggers copy-on-write for the part of %dataset that's being updated.
The obvious solution is to create all the worker processes BEFORE assembling %dataset, however it's only after %dataset is assembled that the boss knows how many worker processes it needs, so I have sort of a chicken-and-egg problem.
So far, I've come up with two solutions, neither of which gives me the warm fuzzies.
Does anybody see a solution that's better than either of the above in terms of SIMPLICITY and EFFICIENCY -- preferably one that doesn't require a major overhaul?