In reply to: Parallel processing with ForkManager
I tend to agree that it would probably be bottlenecked by the capacity of the database server. And let’s face it ... neither a 100MB text file nor a 5GB database is, by today’s standards, all that large. Maybe you could set up some read-only replicas of the database on several machines. Maybe you could optimize the search process inside the database in some useful way, for instance with an index on the column you are looking up. In general, I just think that trying to cluster this thing is going to be a lot of trouble for doubtful benefit.
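Clustering works really well when the workload is primarily CPU-bound and when there is no contention for shared resources. Here, neither condition holds: the work is bound by database I/O, and every worker would be competing for the same database server.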
Edit: BrowserUK’s subsequent recommendation, below, to load the data into a temporary table and run a single join query is in my view unquestionably the best approach in this case. Now nothing but the bulk move-in and the bulk move-out is “happening over the wire.” The database server gets the essential job done in one step, strictly within its own optimized world.
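To make the temporary-table idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for whatever database server is actually involved, and the names `records`, `id`, `payload`, `lookup_keys` and `bulk_lookup` are all hypothetical. The point is the shape of the work: one bulk insert in, one set-based join, one result set out, instead of one round trip per line of the text file.

```python
import sqlite3


def bulk_lookup(conn, keys):
    """Match many keys against a table using a temp table and one JOIN.

    Hypothetical schema (an assumption, not from the original thread):
    records(id TEXT PRIMARY KEY, payload TEXT).
    """
    cur = conn.cursor()
    # Temporary table: visible only to this connection, gone when it closes.
    cur.execute("CREATE TEMP TABLE lookup_keys (id TEXT PRIMARY KEY)")
    # Bulk move-in: send all keys as one batch; duplicates are ignored.
    cur.executemany(
        "INSERT OR IGNORE INTO lookup_keys (id) VALUES (?)",
        ((k,) for k in keys),
    )
    # One set-based join does all the matching inside the database engine.
    cur.execute(
        """
        SELECT r.id, r.payload
        FROM records AS r
        JOIN lookup_keys AS t ON t.id = r.id
        ORDER BY r.id
        """
    )
    rows = cur.fetchall()  # bulk move-out
    cur.execute("DROP TABLE lookup_keys")
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [("a", "1"), ("b", "2"), ("c", "3")],
    )
    print(bulk_lookup(conn, ["c", "a", "z"]))  # keys not in records are simply absent
```

With a real server you would normally use its own bulk-load facility for the move-in step rather than row-by-row inserts, but the overall pattern (load, join, fetch) stays the same.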