Converting a parallel-serial shell script
by Corion (Patriarch)
on Sep 18, 2008 at 11:20 UTC ( [id://712235]=perlquestion )
Corion has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have a process pipeline that consists of the following steps:
I have a machine with 4 CPUs and want to maximize CPU utilization, as the data takes a while to convert. Step 1 can be easily parallelized, but step 2 must be serialized because the database bulk loader fails if another bulk loading process is currently loading into the same table. I also want to keep disk space usage low, so I want to import each data file as soon as it is written to disk instead of importing all the data after converting it. So I need a semaphore on step 2.

I have written a set of shell scripts that do this nicely, using the runN utility by Dominus for convenient parallelization:
There are lots of ugly parts to this shell script, but it works. The ugly parts are:
What I'd like is an easy way to write the whole parallelize-then-serialize stuff in Perl. Simple forking doesn't work because I need to pass the data "back up" to the parent process, or downwards in a serial fashion, so that only one DB import runs at a time, preferably still without blocking the overall progress, so that all 4 CPUs keep running. Also, of course, it would be much nicer to pass around Perl data structures instead of having to manually make sure that the number of columns in the converter script is identical to the number of columns expected by the importer script. I envision as an imaginary API something like the following:
In practice, the next step would be to eliminate the wrapping shell scripts and replace them with the real Perl code. Has anybody done something like this? Is there anything that shields me from serializing the data and then deserializing it, like I'd have to do with Parallel::ForkManager?

Update: Just after posting (not after previewing) this, I realized that this would be a prime application for threads, at least under Windows. The target machine runs HP-UX, but at least it's an ActiveState build, so threads should be available there too. Is writing a smallish wrapper around threads and Thread::Queue the way to go then?
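
For illustration, here is a minimal sketch of what such a wrapper around threads and Thread::Queue might look like. It is not tested against the real converter or bulk loader: `convert_file`, `import_rows` and `run_pipeline` are hypothetical names with dummy bodies, the worker count of 4 simply mirrors the 4 CPUs, and it assumes a perl built with thread support plus Thread::Queue 2.x or later (which can queue nested data structures). Several converter threads feed one importer thread, so only one import runs at a time.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 4;    # assumption: one converter thread per CPU

# Hypothetical stand-in for step 1 (the CPU-heavy conversion).
sub convert_file {
    my ($file) = @_;
    return { file => $file, rows => [ [ 1, 'a' ], [ 2, 'b' ] ] };
}

# Hypothetical stand-in for step 2 (the bulk load); returns the row count.
sub import_rows {
    my ($item) = @_;
    return scalar @{ $item->{rows} };
}

sub run_pipeline {
    my @files = @_;
    my $todo  = Thread::Queue->new;    # file names waiting for conversion
    my $done  = Thread::Queue->new;    # converted data waiting for import

    # A single importer thread serializes step 2: imports never overlap.
    my $importer = threads->create( { context => 'list' }, sub {
        my @loaded;
        while ( defined( my $item = $done->dequeue ) ) {
            import_rows($item);
            push @loaded, $item->{file};
        }
        return @loaded;
    } );

    # Converter threads run step 1 in parallel.
    my @workers;
    for ( 1 .. $WORKERS ) {
        push @workers, threads->create( sub {
            while ( defined( my $file = $todo->dequeue ) ) {
                $done->enqueue( convert_file($file) );
            }
            return;
        } );
    }

    $todo->enqueue(@files) if @files;
    $todo->enqueue(undef) for @workers;    # one end marker per converter
    $_->join for @workers;
    $done->enqueue(undef);                 # end marker for the importer
    return $importer->join;                # files in the order they were imported
}

print "imported $_\n" for run_pipeline( map { "file$_.dat" } 1 .. 6 );
```

The converters hand over ordinary Perl hashes and arrays, so there is no column-count contract to keep in sync by hand. Thread::Queue still copies each structure into shared memory when it is enqueued, so the data is not passed for free. The import order depends on thread scheduling and is not guaranteed to match the input order.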