Parallel Fork Manager -- Can't kill zombies!

by expresspotato (Beadle)
on Oct 16, 2009 at 04:54 UTC
expresspotato has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Parallel::ForkManager seems to be causing me some serious headaches. Consider the following example: the block below is supposed to run when a child process finishes.
$pm->run_on_finish(
    sub {
        my ($pid, $exit_code, $ident) = @_;
        print "** $ident just got out of the pool "
            . "with PID $pid and exit code: $exit_code\n";
    }
);
But it never runs. Only when wait_all_children is included does the run_on_finish callback fire. We start a processor thread because other threads are running as well (no need to go into them here).
$thr4 = threads->new(\&processor);
$thr4->detach;
while (1){
    sleep(1);
    #$pm->wait_all_children;
}
sub processor{
    my @row;
    while (1){
        if ($sys_ok){
            @row = &row_sql("select * from pool_process where srv='$server_id';");
            if (!(@row)){
                #print "Nothing to do...\n";
            }else{
                &del_sql("delete from pool_process where id='$row[0]';");
                print "Processing ($row[1],$row[2],$row[3],$row[4])\n";
                #my $thr = threads->create(\&process_start,$row[1],$row[2],$row[3],$row[4],$row[5]);
                $child_processor++;
                my $pid = $pm->start($child_processor) and next;
                &process_start($row[1],$row[2],$row[3],$row[4],$row[5]);
                print "Child about to get out ($$)\n";
                system("kill $$");
                $pm->finish($child_processor);   # Terminates the child process
                print "Finished Child!";
            }
            sleep(2);
        }else{
            sleep(30);
        }
        #$pm->wait_all_children;
    }
}
Because $pm->wait_all_children is blocking, it prevents new items from being picked up from the SQL table, hence the while loop that polls for work. Without the blocking wait_all_children, things function as they should, except that the system is left with TONS of zombie processes because the run_on_finish cleanup never runs. My only guess is that it is because of the while (1){ sleep 1; }, but that loop is required to keep the detached threads running. The system("kill $$"); is an attempt on my part to get the zombie processes to end. No dice; even kill -9 (a supposed workaround) does nothing. Any and all help is really appreciated.
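
For context, a minimal sketch of the symptom, assuming only Parallel::ForkManager itself (hypothetical code, not the code above): the parent keeps looping without ever calling start() again or wait_all_children, so finished children are never reaped and sit in the process table as <defunct>.

use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(5);
$pm->run_on_finish( sub {
    my ($pid, $exit_code, $ident) = @_;
    print "** $ident left the pool with PID $pid, exit code $exit_code\n";
} );

for my $ident (1 .. 3) {
    $pm->start($ident) and next;   # parent takes the 'next'; child falls through
    sleep 1;                       # child "works"
    $pm->finish(0);                # child exits -- the parent must still reap it
}

# The parent idles like the while(1){ sleep 1; } loop above. Run `ps`
# now and the three exited children show up as zombies: start() is
# never called again and wait_all_children is never called, so
# nothing reaps them (and run_on_finish never fires).
sleep 60;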

Re: Parallel Fork Manager -- Can't kill zombies!
by BioLion (Curate) on Oct 16, 2009 at 09:35 UTC

    I don't know much about use threads;, but I do know that mixing threads and forks is a bad idea (perlthrtut). Maybe this is at the root of the problem? It might be better to think things over and use something that gives you better process control (maybe Parallel::Forker?) rather than using dirty tricks to make Parallel::ForkManager do what you want. A sketch of what that might look like is below.
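
    A minimal sketch of that route, based on Parallel::Forker's synopsis (untested here, so check the module's current docs before relying on it):

    use strict;
    use warnings;
    use Parallel::Forker;

    # Let the module reap its children itself via SIGCHLD -- no zombies.
    my $fork = Parallel::Forker->new(use_sig_child => 1);
    $SIG{CHLD} = sub { Parallel::Forker::sig_child($fork); };

    $fork->schedule(
        run_on_start  => sub { print "child $$ doing the work\n" },  # runs in the child
        run_on_finish => sub { print "child finished\n" },           # runs in the parent
    )->run;

    $fork->wait_all;   # blocks until every scheduled child is reaped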

    Just a something something...
Re: Parallel Fork Manager -- Can't kill zombies!
by jakobi (Pilgrim) on Oct 16, 2009 at 11:02 UTC

    This is also an incomplete suggestion, and I'm looking forward to the replies from others, especially given the recent daemon/sysread thread, where a framework for forking/daemon use would also have been useful for the larger picture (avoiding zombies and race conditions by not reinventing (*) an overly complex and fragile process-management layer...).

    Ignoring threads and the module for now:

    A true zombie is fairly lightweight: it consists of little more than kernel data holding the process exit status, which still needs to be returned to the invoking process or init. Check the man pages of perlvar (%SIG, SIGCHLD) and perlipc for details. You have zombies when you

    • do not handle SIGCHLD with a signal handler, and
    • do not wait for exiting children with wait or waitpid. If you have the full list of processes, you could use a non-blocking waitpid call in the parent to check for and reap zombies (be very paranoid to avoid fragile code and races, and be prepared to allow for impossible bugs like this one: job-management vs. waitpid in the Perl Shell - which I still don't understand, despite my work-around).

    Myself, I'd probably take the scrap from perldoc -f waitpid and add it to the parent process in a regularly visited code path.
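
    That scrap looks roughly like this (the standard non-blocking reap from perlipc; note it is a pattern for plain fork -- reaping behind Parallel::ForkManager's back can confuse the module's own bookkeeping, so with that module see the next reply for a module-aware variant):

    use POSIX ":sys_wait_h";   # imports WNOHANG

    # Non-blocking reap: collects every child that has already exited,
    # returning immediately if none have. Call it from a regularly
    # visited code path, or install it as a $SIG{CHLD} handler.
    sub reap_children {
        while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
            print "Reaped child $kid, exit status ", $? >> 8, "\n";
        }
    }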

    Or better yet, grep the docs and source of the fork manager module for interesting methods that use wait/waitpid, to avoid yet another run-in with races -> (*).

    HTH & looking forward to more module-specific replies,
    Peter

Re: Parallel Fork Manager -- Can't kill zombies!
by fbicknel (Sexton) on Feb 09, 2010 at 16:32 UTC

    I think the source of this problem was correctly identified (i.e. only the ->start method reaps children, non-blockingly with &WNOHANG), but I didn't see a solution.

    Here's a suggestion. It uses an undocumented method called ->wait_one_child. If you call it as ->wait_one_child(&WNOHANG), it behaves just like waitpid(), but also calls your run_on_finish callback.

    I'm guessing that calling this method from outside the module wasn't part of the design, but it seems to work OK for me. I, too, had an application where the parent was waiting for queue work, so it didn't call ->start on a periodic basis. With this approach, you can either call ->wait_one_child periodically to reap children and run their callbacks, or design in some other event to do the same thing.

    Call it in a loop (again, just like waitpid()) to get all the zombies reaped; a minimal version is sketched below. See the docs for waitpid() in the Perl documentation for sample code.
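
    A minimal sketch of that loop, assuming $pm is your Parallel::ForkManager instance (and keeping in mind that wait_one_child is undocumented, so a later release may change or remove it):

    use POSIX ":sys_wait_h";   # for WNOHANG

    # Call this from a regularly visited spot in the parent's loop.
    # wait_one_child(&WNOHANG) returns a reaped child's pid, 0 if no
    # child has exited yet, or -1 if there are no children left -- and
    # it fires your run_on_finish callback for each child it reaps.
    while ((my $pid = $pm->wait_one_child(&WNOHANG)) > 0) {
        print "Reaped child $pid\n";
    }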
