PerlMonks  

Parallel::ForkManager, DBI using memory

by 2ge (Scribe)
on Nov 07, 2004 at 18:33 UTC

2ge has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!

I am playing with Parallel::ForkManager, and I have an interesting question. When I run the script below, it uses too much memory; it seems to me that a child thread doesn't free all the memory it used. At the 100th iteration it uses around 18MB, and after the 200th around 29MB! Could anyone explain why that is, and where the error in my script is? If I don't use any DBI stuff in the child threads, everything is OK. Is there some $dbh destructor that I'm missing? I use ActiveState Perl 5.8.4 and the latest ForkManager.
# this is just example script
use strict;
use warnings;
use Parallel::ForkManager;
use DBI;

my @links = (1..1000);
my $pm = new Parallel::ForkManager(5);
foreach my $x (0..$#links) {
    $pm->start and next;
    my $dbh = DBI->connect("DBI:mysql:database=customers;host=localhost;port=3306", "root", "")
        or die "Can't connect: ", $DBI::errstr;
    $dbh->{RaiseError} = 1;
    print "$links[$x]\n";
    # do something (read, update, insert from db)
    $dbh->disconnect;
    $pm->finish;
}

Thanks for any help -- Brano

Replies are listed 'Best First'.
Re: Parallel::ForkManager, DBI using memory
by biosysadmin (Deacon) on Nov 07, 2004 at 22:27 UTC
    Based on your conversation with The Mad Hatter, it looks like you'll need a separate $dbh for each child process. Even so, I can't imagine that every one of your child processes, one per link, actually needs to open its own database handle.

    I'm not sure of the exact mechanism for memory management with forking processes (it may even vary across operating systems), but I'm guessing that every time you use Parallel::ForkManager to fork off another process you need to make another copy of the program's namespace. If this is the case, then your program would be using this much memory regardless of DBI's implementation for generating database handles. For a simple test of this, comment out all of the lines dealing with database handles in your code and see if your memory grows in the same way.

    If this were my problem, I'd try to balance the speedup of using multiple Parallel::ForkManager processes along with the need to keep the number of database connections low. Why not divide your @links array into N parts, each of which is processed by a separate Parallel::ForkManager process? I've tried that approach with other programs using Parallel::ForkManager, and it's worked very well.
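The chunking idea above can be sketched roughly as follows. This is a minimal sketch, not a tested fix for the memory growth: `split_into_chunks` and `process_links` are hypothetical names, the `$connect` and `$work` callbacks are assumptions made for illustration, and the connection settings in the commented call are copied from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split @items into at most $n contiguous chunks
# of near-equal size (assumes $n >= 1).
sub split_into_chunks {
    my ($n, @items) = @_;
    my $size = int(@items / $n) + (@items % $n ? 1 : 0);   # ceiling division
    my @chunks;
    push @chunks, [ splice(@items, 0, $size) ] while @items;
    return @chunks;
}

# Hypothetical driver: one child and one database handle per chunk.
# $connect returns a fresh handle; $work handles a single link.
sub process_links {
    my ($workers, $connect, $work, @links) = @_;
    require Parallel::ForkManager;    # loaded lazily so the helper above stands alone
    my $pm = Parallel::ForkManager->new($workers);
    for my $chunk (split_into_chunks($workers, @links)) {
        $pm->start and next;          # parent: go on to the next chunk
        my $dbh = $connect->();       # child: connect once for the whole chunk
        $work->($dbh, $_) for @$chunk;
        $dbh->disconnect;
        $pm->finish;                  # child exits here
    }
    $pm->wait_all_children;
}

# Example call, reusing the connection settings from the question:
# process_links(
#     5,
#     sub { DBI->connect("DBI:mysql:database=customers;host=localhost;port=3306",
#                        "root", "", { RaiseError => 1 }) },
#     sub { my ($dbh, $link) = @_; print "$link\n" },
#     1 .. 1000,
# );
```

With 1000 links and 5 workers this forks 5 children and opens 5 connections in total, instead of 1000 of each, which is the trade-off the paragraph above describes.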

    If this is a serious application, then you might even do some benchmarking to determine the limiting factor in your processing. If your process is CPU-limited and your box has multiple CPUs, then set N equal to the number of CPUs on your box. The name @links suggests to me that it might be network-limited, in which case having N being medium-sized (10-50 in my mind) might be a good idea. Only benchmarking will tell the whole story. Best of luck with your problem. :)

      Thanks for answer BioSysadmin,

      I read some documentation about this, and yes, every child process needs its own db handle. Next: of course I tried commenting out the DBI stuff, and then it works fine. So is it a bug in DBI, some memory leak, or what? I don't believe that.

      And it doesn't matter whether I run 5 processes at once, or 50, or just 2. Every time one of my threads ends, some memory stays allocated, so by the end of the script the perl process will have consumed the same amount of memory either way. I hope someone can give me an answer to this interesting question.

      Brano
        Interesting. Another thing you might try is manually undef'ing your database handles at the end of your loop, like this:
        foreach my $x (0..$#links) {
            $pm->start and next;
            my $dbh = DBI->connect("DBI:mysql:database=customers;host=localhost;port=3306", "root", "")
                or die "Can't connect: ", $DBI::errstr;
            $dbh->{RaiseError} = 1;
            print "$links[$x]\n";
            # do something (read, update, insert from db)
            $dbh->disconnect;
            undef($dbh);
            $pm->finish;
        }
        Best of luck. :)
Re: Parallel::ForkManager, DBI using memory
by The Mad Hatter (Priest) on Nov 07, 2004 at 19:44 UTC
    It might be a DBI memory management issue, but I'd suspect right off the bat the DB connections you're constantly making and destroying. Is there a reason you can't share one $dbh for all children? (I think this is possible with Parallel::ForkManager.)
      Hi The Mad Hatter,

      Of course I tried that before I posted my question. If you try it, you get an error similar to this:

      DBD::mysql::db prepare_cached failed: handle 2 is owned by thread 274094 not current thread 6fad024 (handles can't be shared between threads and your driver may need a CLONE method added) at...

      Any suggestions?
        Ah, okay. I had an inkling DBI might not like it.
      Is there a reason you can't share one $dbh for all children? (I think this is possible with Parallel::ForkManager.)

      No, it is not possible.

      Reason: Think of two children that both want to make a query:

      # Child 1:
      SELECT * FROM foobar WHERE baz = 3

      # Child 2:
      SELECT grzbaka FROM baz WHERE foo = 5

      Now, both send their queries at the same time. The db server might receive:

      SELECT * FROSELECT grzbaka FROM baM foobar WHERE baz = 3z WHERE foo = 5

      Of course, reality is more complex; this example is just meant as a simple explanation of why one db connection used by many children doesn't work (as expected).

      BTW, if you do cooperative multitasking, like POE does, everything is ok -- there is only one process which sends data to the db server.
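The interleaving described above can be imitated with a toy simulation. This is only a sketch: `interleave` is a hypothetical function, and the fixed chunk size and strict round-robin order are assumptions for illustration, not how a real socket or database driver schedules writes.

```perl
use strict;
use warnings;

# Simulate several writers sharing one connection: each writer in turn
# sends up to $chunk characters, and the "server" sees the pieces in
# arrival order.
sub interleave {
    my ($chunk, @queries) = @_;
    my $stream = '';
    while (grep { length } @queries) {
        for my $q (@queries) {
            # 4-argument substr removes the piece from $q and returns it
            $stream .= substr($q, 0, $chunk, '');
        }
    }
    return $stream;
}

print interleave(
    10,
    "SELECT * FROM foobar WHERE baz = 3",
    "SELECT grzbaka FROM baz WHERE foo = 5",
), "\n";
```

Whatever the chunk size, the server ends up with a single stream that is not a valid statement from either child, which is why each process needs its own connection.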

Re: Parallel::ForkManager, DBI using memory
by perrin (Chancellor) on Nov 08, 2004 at 04:20 UTC
    How many processes are running in parallel at once? 200 Perl interpreters with DBI loaded in each one could easily take that much space.
      Read the documentation for ForkManager: it runs just 5 parallel threads at once, as written here:

      my $pm=new Parallel::ForkManager(5);
