
Re: Parallel::ForkManager, DBI using memory

by biosysadmin (Deacon)
on Nov 07, 2004 at 22:27 UTC ( #405950 )

in reply to Parallel::ForkManager, DBI using memory

Based on your conversation with The Mad Hatter, it looks like you're creating a separate $dbh in each child process. I can't imagine that each of your child processes actually needs its own database handle.

I'm not sure of the exact mechanism for memory management in forking processes (it may even vary across operating systems), but I'm guessing that every time you use Parallel::ForkManager to fork off another process, you end up making another copy of the program's address space. If that's the case, your program would use this much memory regardless of how DBI implements its database handles. As a simple test, comment out all of the lines dealing with database handles in your code and see whether memory grows in the same way.
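A minimal version of that control experiment might look like the sketch below: fork the same number of children with no DBI work at all and compare the parent's memory before and after. The child count and the /proc trick are my assumptions (Linux-only; on other systems something like `ps -o rss= -p $$` would do), not anything from the original code.

```perl
#!/usr/bin/perl
# Control experiment sketch: fork N do-nothing children and watch
# the parent's resident set size. If memory still grows, DBI is not
# the culprit. ($n_children and the /proc path are illustrative.)
use strict;
use warnings;

sub parent_rss_kb {
    # Linux-only assumption: read VmRSS from /proc for this process.
    open my $fh, '<', "/proc/$$/status" or return -1;
    while (<$fh>) { return $1 if /^VmRSS:\s+(\d+)\s+kB/ }
    return -1;
}

my $n_children = 20;
my $done       = 0;
my $before     = parent_rss_kb();

for (1 .. $n_children) {
    defined(my $pid = fork) or die "fork failed: $!";
    exit 0 if $pid == 0;      # child: no DBI, no work, just exit
    waitpid($pid, 0);         # parent: reap before forking the next
    $done++;
}

my $after = parent_rss_kb();
print "reaped $done children; RSS before: $before kB, after: $after kB\n";
```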

If this were my problem, I'd try to balance the speedup of using multiple Parallel::ForkManager processes along with the need to keep the number of database connections low. Why not divide your @links array into N parts, each of which is processed by a separate Parallel::ForkManager process? I've tried that approach with other programs using Parallel::ForkManager, and it's worked very well.
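The splitting step might look something like this sketch. Only the chunking is real, runnable code (core Perl); the Parallel::ForkManager/DBI part is shown as a comment because those are CPAN modules, and the connect arguments and process() helper there are placeholders, not anything from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);

# Split a list into at most $n roughly equal chunks.
sub chunk_list {
    my ($n, @items) = @_;
    my $size = ceil(@items / $n);
    my @chunks;
    push @chunks, [ splice @items, 0, $size ] while @items;
    return @chunks;
}

my @links  = map { "http://example.com/$_" } 1 .. 10;   # stand-in data
my @chunks = chunk_list(4, @links);

# Each chunk would then go to one forked worker, which opens a single
# $dbh, processes its whole chunk, and disconnects -- N connections
# total instead of one per link. Sketch (CPAN calls elided so this
# file stays runnable with core Perl only):
#
#   my $pm = Parallel::ForkManager->new(scalar @chunks);
#   for my $chunk (@chunks) {
#       $pm->start and next;
#       my $dbh = DBI->connect(...);          # one handle per worker
#       process($dbh, $_) for @$chunk;        # process() is hypothetical
#       $dbh->disconnect;
#       $pm->finish;
#   }
#   $pm->wait_all_children;

printf "%d chunks, sizes: %s\n", scalar @chunks,
       join ',', map { scalar @$_ } @chunks;
```

With 10 links and N=4 this yields chunks of sizes 3, 3, 3, 1.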

If this is a serious application, then you might even do some benchmarking to determine the limiting factor in your processing. If your process is CPU-limited and your box has multiple CPUs, then set N equal to the number of CPUs on your box. The name @links suggests to me that it might be network-limited, in which case a medium-sized N (10 to 50, off the top of my head) might be a good idea. Only benchmarking will tell the whole story. Best of luck with your problem. :)
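A crude way to benchmark candidate values of N is just to time whole runs with the core Time::HiRes module, along these lines. The run_with_n() driver here is hypothetical, a stand-in for "process @links with $n workers"; it only sleeps to simulate work, so the numbers below mean nothing until you plug in the real job.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical driver: stands in for "fork $n workers and process
# @links". Replace the sleep with the real forking job.
sub run_with_n {
    my ($n) = @_;
    select undef, undef, undef, 0.01;   # fake workload
}

my %elapsed;
for my $n (2, 5, 10) {
    my $t0 = [gettimeofday];
    run_with_n($n);
    $elapsed{$n} = tv_interval($t0);
    printf "N=%-2d  elapsed %.3fs\n", $n, $elapsed{$n};
}
```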

Replies are listed 'Best First'.
Re^2: Parallel::ForkManager, DBI using memory
by 2ge (Scribe) on Nov 08, 2004 at 13:36 UTC
    Thanks for the answer, biosysadmin.

    I read some documentation about this, and yes, every child process needs its own db handle. Next, of course I tried commenting out the DBI stuff, and it works fine. So is it a bug in DBI, some memory leak, or what? I don't believe that.

    And that's not the point; it makes no difference whether I run 5 processes at once, or 50, or just 2. Every time a child ends it takes some memory with it, so by the end of the script the perl process consumes the same amount of memory either way. I hope someone can give me an answer to this interesting question.

      Interesting. Another thing you might try is manually undef'ing your database handles at the end of your loop, like this:
      foreach my $x (0..$#links) {
          $pm->start and next;
          my $dbh = DBI->connect("DBI:mysql:database=customers;host=localhost;port=3306",
                                 "root", "")
              or die "Can't connect: ", $DBI::errstr;
          $dbh->{RaiseError} = 1;
          print "$links[$x]\n";
          # do something (read, update, insert from db)
          $dbh->disconnect;
          undef($dbh);
          $pm->finish;
      }
      Best of luck. :)
        Hi biosys!

        Thanks for the next suggestion. I tried it, but unfortunately it doesn't help. Also, you have one little error in your posted script: undef($dbh) should come before $pm->finish. Any more ideas? :)
        I really don't know how to solve this. I have around 15,000+ iterations, so this way I will always run out of memory :(

