
Monitoring Child Process

by Anonymous Monk
on Oct 19, 2011 at 17:53 UTC ( #932455=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a script with the following goals:
  • create child processes
  • under normal circumstances, wait for all child processes to finish
  • but when ctrl-c is pressed, I want it to show how many children are still running

    Below is the script that I currently have. The problem is that it does not work. Without even pressing ctrl-c, sometimes it tells me that some kids finished, and sometimes it does not.
    #!/usr/bin/perl
    use strict;
    use POSIX qw(:signal_h :errno_h :sys_wait_h);

    $SIG{CHLD} = \&REAPER;
    my (@all_kids, @dead_kids);
    my @array = qw(a b c d e f g h);

    for (1 .. 10) {
        my $pid = fork();
        if ($pid) {
            push (@all_kids, $pid);
        }
        elsif ($pid == 0) {
            print "@array\n\n";
            sleep 5;
            exit (0);
        }
        else {
            die "Could not fork: $!\n";
        }
        print "Running another child process - $pid.\n";
    }

    print "All child processes have been ran.\n";
    print "@all_kids\n";
    $SIG{INT} = \&tsktsk;

    foreach (@all_kids) {
        waitpid ($_, 0);
    }
    print "All child processes are finished.\n";

    sub tsktsk {
        print "Ctrl-C Trap\n";
        my %alive_kids;
        for my $i (@all_kids, @dead_kids) {
            $alive_kids{$i}++;
        }
        for (sort keys %alive_kids) {
            print "$_\n" if ( $alive_kids{$_} == 1 );
        }
        exit;
    }

    sub REAPER {
        my $pid;
        $pid = waitpid(-1, &WNOHANG);
        if ($pid == -1) {
        }
        elsif ( WIFEXITED($?) ) {
            print "Process $pid exited.\n";
            push( @dead_kids, $pid )
        }
        else {
            print "False alarm on $pid.\n";
        }
        $SIG{CHLD} = \&REAPER;
    }
    Re: Monitoring Child Process
    by Marshall (Abbot) on Oct 19, 2011 at 19:42 UTC
      First, the reaping code has a few flaws: (1) there are multiple places where you call waitpid(), (2) the reaper routine can miss some SIGCHLDs, and (3) there is no need to reset $SIG{CHLD} within REAPER.

      So I would suggest:

      sub REAPER {
          my $pid;
          while ($pid = waitpid(-1, WNOHANG) > 0) {
              print "Process $pid exited.\n";
              push @dead_kids, $pid;
          }
      }
      You are not guaranteed to get a separate SIGCHLD for each child who exits! Basically interpret SIGCHLD to mean that one or more children have exited. Use a while loop to service everybody who is ready within the signal handler. There is not going to be a "false alarm", so I took that stuff out.

      Also, to "reap" a child basically means to read its exit status from the process table. The function that does this read is waitpid(). When you have this in two places, somebody may get reaped and yet not go through the reaper. I recommend letting REAPER do all of the reaping and do not call waitpid() anywhere else. Maybe this issue is why you had the "false alarm" code, a race condition? Anyway, only reap in one place and you will not miss any and won't wind up in the signal handler with somebody mysteriously disappearing!

      Update: temporarily don't have access to a unix box, so can't test, but here is a suggestion..

      1. I see why you had the waitpid in the parent (to see if all children have finished). You need another test, perhaps (@dead_kids < 10), with some wait delay. I think this test will be OK in Perl without further ado. In C I would need to protect this critical section with some sigprocmask() voodoo. But I think that, due to the way Perl >= 5.8 delivers deferred signals, this is not necessary to prevent the REAPER and the parent main program from tripping over each other.
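For readers curious what the sigprocmask() approach mentioned above might look like in Perl, here is a minimal sketch using the POSIX module. It is not taken from the thread: the helper name with_sigchld_blocked and the single short-lived child are assumptions made for illustration, and, as the post says, Perl's deferred signals may make this unnecessary for a simple array test.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h :sys_wait_h);

my @dead_kids;
$SIG{CHLD} = sub {
    # Reap everyone who is ready; one SIGCHLD may cover several exits.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        push @dead_kids, $pid;
    }
};

# Hypothetical helper: run $code with SIGCHLD blocked, then restore
# the previous signal mask even if $code dies.
sub with_sigchld_blocked {
    my ($code) = @_;
    my $block = POSIX::SigSet->new(SIGCHLD);
    my $old   = POSIX::SigSet->new;
    sigprocmask(SIG_BLOCK, $block, $old)
        or die "Could not block SIGCHLD: $!\n";
    my @result = eval { $code->() };
    my $err = $@;
    sigprocmask(SIG_SETMASK, $old)
        or die "Could not restore signal mask: $!\n";
    die $err if $err;
    return @result;
}

my $pid = fork();
die "Could not fork: $!\n" unless defined $pid;
if ($pid == 0) { exit 0; }    # child exits immediately

# The handler cannot run while this snapshot is taken.
my @snapshot = with_sigchld_blocked(sub { return @dead_kids });

# A SIGCHLD that arrived while blocked is delivered after the mask
# is restored; poll until the child has been reaped.
sleep 1 while @dead_kids < 1;
print "snapshot saw ", scalar(@snapshot), " dead, now ",
      scalar(@dead_kids), "\n";
```

The snapshot may see zero or one dead child depending on timing, which is why the sketch prints both counts rather than promising a particular value.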

        Thank you for your suggestion. I incorporated your suggestion by commenting this out... so that there is only one waitpid in the whole script.
        foreach (@all_kids) { waitpid ($_, 0); }
        I am also using the REAPER subroutine that you gave. But now two things have come up. Under normal conditions (without pressing ctrl-c), the parent script exits without 'waiting for the children to finish'.

        And when I put a 'sleep 20' at the end of the main script, this is the output that I get:
        Process 1 exited.
        Process 1 exited.
          Oh, I guess that was a miscommunication. I updated my post with a more detailed suggestion. This foreach (@all_kids) code is the part that should be deleted!

          The REAPER is part of the parent. So having waitpid() in only one place means having it only in REAPER().

          Some replacement code for this foreach (@all_kids) loop:

          while (@dead_kids < 10) {
              sleep(1);
          }
          print "all kids dead .. @dead_kids\n";
          You can also just sleep(20) and print @dead_kids. The focus right now should be on getting this to work without the Ctrl-C complication, then adding that later.

          Update: Oh, I would also add use warnings; either by that statement, or a -w in the hash bang line. This has nothing to do with your current woes, but there are run time checks with warnings enabled that are useful. I leave them on unless some rare, very rare performance or other reason indicates otherwise.

          Another Update: I was able to test some code. For some reason, when SIGCHLD happens, it causes the sleep to end. I don't know why. So there is a loop to restart the sleep every second. Try this code... I have to run to an appointment. Oh, the "Process 1 exited" output was caused by missing parens in the while statement in the reaper. The sleep issue is the real puzzle here.

          update: added readmore tag - updated code in later post
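The two observations in the update above can be sketched as follows. This is not the code from the later post, only an illustration. Without the extra parentheses, `$pid = waitpid(-1, WNOHANG) > 0` assigns the result of the comparison (1) to $pid, because `>` binds tighter than `=`. As for the sleep puzzle, perlfunc documents that sleep may be interrupted when the process receives a signal, which would explain why it has to be restarted. The child count of 3 and the %started hash are assumptions for the demo.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

my @dead_kids;

sub REAPER {
    # Parenthesize the assignment so $pid holds the PID rather than
    # the result of the "> 0" comparison.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        print "Process $pid exited.\n";
        push @dead_kids, $pid;
    }
}
$SIG{CHLD} = \&REAPER;

my $want = 3;    # hypothetical child count for this demo
my %started;     # PIDs the parent actually forked
for (1 .. $want) {
    my $pid = fork();
    die "Could not fork: $!\n" unless defined $pid;
    if ($pid == 0) { sleep 1; exit 0; }
    $started{$pid} = 1;
}

# A delivered signal can cut sleep short, so keep restarting it
# until every child has gone through REAPER.
while (@dead_kids < $want) {
    sleep 1;
}
print "all kids dead .. @dead_kids\n";
```

REAPER is the only place that calls waitpid(), in line with the advice earlier in the thread to reap in one place only.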

    Re: Monitoring Child Process
    by runrig (Abbot) on Oct 19, 2011 at 18:42 UTC
      It would probably be better to keep your kid pids as keys in a hash, then delete them as they are reaped. You are currently printing all of the pids that were created on ctl-C.
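A minimal sketch of the hash idea, assuming names that do not appear in the thread (%kids, reap_children, live_kids, run_demo). Each PID is stored as a key when the child is forked and deleted when it is reaped, so whatever is left in the hash is the set of children still running when Ctrl-C arrives.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

my %kids;    # pid => 1 for every child not yet reaped

# Reap every child that is ready and forget its PID.
sub reap_children {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $kids{$pid};
    }
}

# PIDs of children still running, in numeric order (call in list context).
sub live_kids { return sort { $a <=> $b } keys %kids }

# Fork $count children that each sleep $seconds, then wait for them.
# Assumes $seconds >= 1 so each PID is recorded before its child exits.
sub run_demo {
    my ($count, $seconds) = @_;
    local $SIG{CHLD} = \&reap_children;
    for (1 .. $count) {
        my $pid = fork();
        die "Could not fork: $!\n" unless defined $pid;
        if ($pid == 0) { sleep $seconds; exit 0; }
        $kids{$pid} = 1;
    }
    # Installed after forking so the children keep the default handler.
    local $SIG{INT} = sub {
        print "Still running: @{[ live_kids() ]}\n";
        exit 1;
    };
    # sleep can return early on a signal, so loop until the hash is empty.
    sleep 1 while %kids;
    return scalar keys %kids;
}

my $left = run_demo(3, 1);
print "Children left: $left\n";
```

Compared with two arrays, the hash avoids comparing @all_kids against @dead_kids in the Ctrl-C handler; the handler only has to print the remaining keys.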
        I'm not sure what you mean, but I incorporated your suggestion and it didn't solve the problem.

        When I run the script (without pressing ctrl-c), it will say something like this at the end...
        Process 2555 exited.
        Process 2558 exited.
        The problem is that I know the script created processes 2556 and 2557, and the REAPER subroutine somehow did not catch them, even though in reality these children have already exited.
