http://www.perlmonks.org?node_id=405950


in reply to Parallel::ForkManager, DBI using memory

Based on your conversation with The Mad Hatter, it looks like you'll need a separate $dbh for each child process: a database handle isn't safe to use across a fork, so a child can't just keep using the one it inherited from the parent. At the same time, I can't imagine that each of your child processes genuinely needs its own database connection.
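
For what it's worth, the usual pattern is to connect inside the child, after the fork. A minimal sketch (the DSN and credentials here are placeholders):

    use strict;
    use warnings;
    use DBI;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(5);

    for my $job (1 .. 10) {
        $pm->start and next;    # parent loops on; the child continues below

        # Connect *inside* the child: handles inherited across a fork
        # aren't safe, so each child opens its own connection.
        my $dbh = DBI->connect( 'dbi:mysql:test', 'user', 'pass',
            { RaiseError => 1 } );

        # ... do this child's work with $dbh ...

        $dbh->disconnect;
        $pm->finish;
    }
    $pm->wait_all_children;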

I'm not sure of the exact mechanism for memory management with forking processes (it may even vary across operating systems), but I'm guessing that every time you use Parallel::ForkManager to fork off another process, you end up with another copy of the program's memory. (Even on systems where fork() is copy-on-write, Perl's reference counting writes to nearly every page, so most of them get copied anyway.) If that's the case, then your program would use this much memory regardless of how DBI creates its handles. For a simple test, comment out all of the lines dealing with database handles in your code and see whether memory grows the same way.
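
A quick way to run that test: fork the same number of children with the database work stubbed out, and watch them in top or ps. A rough sketch (the child count and the sleep are arbitrary):

    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(20);

    for my $i (1 .. 100) {
        $pm->start and next;
        # No DBI work at all -- just hold each child open long enough
        # to check its memory size with `ps aux` or `top`.
        sleep 5;
        $pm->finish;
    }
    $pm->wait_all_children;

If memory climbs just as fast, the handles aren't your problem; the forking itself is.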

If this were my problem, I'd try to balance the speedup of using multiple Parallel::ForkManager processes against the need to keep the number of database connections low. Why not divide your @links array into N parts, each of which is processed by a separate Parallel::ForkManager process? I've used that approach in other programs with Parallel::ForkManager, and it's worked very well.
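
A rough sketch of what I mean, with one connection per chunk rather than one per link (fetch_and_store() is a hypothetical stand-in for whatever you do with each link, and the DSN is again a placeholder):

    use strict;
    use warnings;
    use DBI;
    use POSIX qw(ceil);
    use Parallel::ForkManager;

    my @links = map { "http://example.com/page$_" } 1 .. 1000;  # dummy data

    my $N  = 10;                            # number of worker processes
    my $pm = Parallel::ForkManager->new($N);
    my $chunk_size = ceil( @links / $N );

    while ( my @chunk = splice @links, 0, $chunk_size ) {
        $pm->start and next;                # parent keeps carving off chunks

        # One connection per chunk, opened in the child after the fork.
        my $dbh = DBI->connect( 'dbi:mysql:test', 'user', 'pass',
            { RaiseError => 1 } );

        fetch_and_store( $dbh, $_ ) for @chunk;

        $dbh->disconnect;
        $pm->finish;
    }
    $pm->wait_all_children;

    sub fetch_and_store {
        my ( $dbh, $link ) = @_;
        # ... fetch $link and store the results via $dbh ...
    }

That way you open N connections for the entire run instead of one per link, and each child still gets a handle of its own.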

If this is a serious application, then you might even do some benchmarking to determine the limiting factor in your processing. If your process is CPU-bound and your box has multiple CPUs, set N to the number of CPUs. The name @links suggests to me that it might be network-bound, in which case a medium-sized N (10-50, in my mind) might be a good idea. Only benchmarking will tell the whole story. Best of luck with your problem. :)
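
Timing the full run for a few values of N is usually enough to find the sweet spot. A sketch, assuming a hypothetical run_with_workers() that wraps the chunked loop above:

    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    for my $n ( 1, 5, 10, 25, 50 ) {
        my $t0 = [gettimeofday];
        run_with_workers($n);    # hypothetical: the chunked loop, N = $n
        printf "N = %-3d  =>  %.2f seconds\n", $n, tv_interval($t0);
    }

    sub run_with_workers {
        my ($n) = @_;
        # ... fork $n children over @links, as in the sketch above ...
    }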