Keeping children alive persistently, intelligently

by Hercynium (Hermit)
on Feb 08, 2008 at 19:30 UTC (#667049=perlquestion)
Hercynium has asked for the wisdom of the Perl Monks concerning the following question:

Magnanimous Monks, myriad musings mottle my moody mind. Maybe many mighty mentors might minister me memorable motions mending my malady? (I've been waiting to post that for months - alliteration is dorky but *fun!*) :)

OK, here's what I came for:

I have a program that forks off a pile of long-running children. Actually, the child processes should probably *never* return or die, in normal operation.

The code I've already written (simplified version below in the readmore section) simply reports info about dead children to a log, but now I need to do more...

The calls to fork are done inside a loop that looks like this:
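    use Carp;

    my %child_info;    # pid => the parameters that child was started with

    foreach my $params (@params_for_children) {
        my $pid = fork();
        croak "Can't fork: $!" unless defined $pid;
        if ($pid) {
            # parent: remember what this child was started with
            $child_info{$pid} = $params;
        }
        else {
            # child: should run forever
            do_childish_things($params);
            croak "Child fork exited - should never happen\n";
        }
    }

    # wait() returns -1 once there are no children left
    while ( my $pid = wait() ) {
        last if $pid == -1;    # comparison, not assignment
        my $exit_status = $?;
        report_dead_child( $pid, $exit_status, $child_info{$pid} );
    }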

Generally, this works very well, but there are some things I need to add and I'm somewhat uncertain about how I would best go about it.

SO... I need advice with the following requirements:
  • If a child dies, a new one needs to be spawned, with the same parameters as its predecessor, but not immediately - there should be about a 1 min delay.
  • If a child dies twice within a short period (say 5 min) it should be restarted after a longer (say 10 min) delay.
  • If a child dies X number of times, the parent should do some pre-defined action (like emailing me) and give up on that child.
  • These delays should not cause the parent to miss opportunities to revive other dead children.
  • If the parent is terminated, all the children need to die with it.
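
The first three requirements boil down to a small decision function over a child's death history. Here is a minimal sketch of that idea, not a finished implementation: the function name restart_decision and the constants are hypothetical, and MAX_DEATHS is only a stand-in for the unspecified "X".

```perl
use strict;
use warnings;

# Hypothetical policy constants, taken from the figures in the list above.
use constant {
    SHORT_DELAY  => 60,         # normal restart delay: about 1 minute
    LONG_DELAY   => 10 * 60,    # delay after two deaths in quick succession
    QUICK_WINDOW => 5 * 60,     # the "short period" between two deaths
    MAX_DEATHS   => 5,          # stand-in for "X"; pick your own limit
};

# Given the epoch times at which one child has died (oldest first),
# return ('give_up') or ('restart', $delay_in_seconds).
sub restart_decision {
    my @deaths = @_;
    return ('give_up') if @deaths >= MAX_DEATHS;
    if ( @deaths >= 2 && $deaths[-1] - $deaths[-2] <= QUICK_WINDOW ) {
        return ( 'restart', LONG_DELAY );
    }
    return ( 'restart', SHORT_DELAY );
}
```

The parent would push time() onto a per-child list each time wait() reports a death and then call restart_decision(@list). Because the function only computes a delay and never sleeps, the parent stays free to notice other dead children in the meantime.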

My main question is this:

Is the code above a bad way for me to do this?
Should I continue writing my own solution(s) or is there a robust CPAN module that will do this for me with less hassle?

Replies are listed 'Best First'.
Re: Keeping children alive persistently, intelligently
by dynamo (Chaplain) on Feb 08, 2008 at 19:45 UTC
    Ok.. well, first off you'll need to save some (persistent) state. You could do this in files, variables in the parent process, environment variables, whatever.

    The first thing that comes to mind is to start off by making a folder for the info, and have each child create a log file named after its pid, containing its arguments and a log of recent restarts (say, the last 5 min).

    I'd also have each process touch its pid file every 10 seconds or so, so that the parent, which is polling the data, knows that a pid file not touched in the last 30 seconds means its child needs to be killed and restarted a minute or so later.
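
The heartbeat idea above could look roughly like the following. This is only a sketch under assumptions: the "<pid>.pid" naming scheme and the function names touch_heartbeat and stale_pids are made up for illustration.

```perl
use strict;
use warnings;

# Child side: create the heartbeat file if needed and set its mtime to now.
sub touch_heartbeat {
    my ( $dir, $pid ) = @_;
    my $file = "$dir/$pid.pid";    # assumed naming scheme
    open my $fh, '>>', $file or die "Can't open $file: $!";
    close $fh;
    utime undef, undef, $file or die "Can't touch $file: $!";
    return $file;
}

# Parent side: return the PIDs whose heartbeat file has not been
# touched within the last $max_age seconds.
sub stale_pids {
    my ( $dir, $max_age, $now ) = @_;
    $now = time unless defined $now;
    my @stale;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    while ( defined( my $name = readdir $dh ) ) {
        next unless $name =~ /\A(\d+)\.pid\z/;
        my $pid   = $1;
        my $mtime = ( stat "$dir/$name" )[9];
        next unless defined $mtime;    # file vanished between readdir and stat
        push @stale, $pid if $now - $mtime > $max_age;
    }
    closedir $dh;
    return sort { $a <=> $b } @stale;
}
```

A child would call touch_heartbeat($dir, $$) every 10 seconds or so, and the parent would poll stale_pids($dir, 30) and kill and reschedule whatever comes back.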

    Once you have the logic in for restarting dead/stopped children, it's just a matter of adding conditionals and changing the sleep time for the 5 minute rule. Similarly, the custom actions will just take an additional conditional that checks the restart history.

    Hope that helps,

    - D

Re: Keeping children alive persistently, intelligently
by kyle (Abbot) on Feb 08, 2008 at 20:13 UTC

    This sounds like a job for POE, but I don't know enough about that module to say for sure.

    If I were writing this, I'd be strongly tempted to make each child into an object. You could stash within it $params and all the other state that you're trying to hold and modularize all the behavior you want them to have. Each child object in the parent would know the PID of the child process, the parameters it started with, how many times it has died, etc.

    • Give the child object a DESTROY method that kills the child process. This way when the parent exits, the children go with it.
    • The parent can still wait for something to die and tell the relevant child to restart itself.
    • The child can fork and then sleep to get the delay you want.

    Something like:

        my @child_info;
        foreach my $params (@params_for_children) {
            my $child = Foo::Child->new( params => $params );
            $child->spawn();
            push @child_info, $child;
        }

        while ( my $pid = wait() ) {
            last if $pid == -1;    # no children left
            my $exit_status = $?;
            my ($poor_dead_child) = grep { $_->{pid} == $pid } @child_info;
            $poor_dead_child->fyi_you_died();
            $poor_dead_child->spawn();
        }

        package Foo::Child;

        sub new {
            my $class = shift;
            my $self  = {@_};
            return bless $self, $class;
        }

        sub DESTROY { kill 'TERM' => shift->{pid} }

        sub fyi_you_died { shift->{death_toll}++ }

        sub spawn {
            my $self = shift;
            my $pid  = fork;
            die "Can't fork: $!" if !defined $pid;
            if ($pid) {
                # parent: remember the new PID
                $self->{pid} = $pid;
                return;
            }
            else {
                # child: optionally delay, then do the real work
                sleep ... if $self->{death_toll} > ...;
                do_child_stuff( $self->{params} );
                die;
            }
        }

    This is just a sketch, but hopefully you get the idea. Having written all this, I'm now guessing that someone will come along with a much better CPAN module I've never heard of.

    Update: Upon further consideration, I'm not sure this is such a hot idea. Each child is a copy of the whole, so each has a copy of all the child objects. As soon as one of them dies, it's going to shoot all the other ones in their destructors. Oops. You could still have them all manage themselves except for the DESTROY methods. In that case, the parent would have to kill them all manually in an END {} block.

    Update 2: Another thought. You could write DESTROY this way:

        sub DESTROY {
            my $self = shift;
            if ( $$ == $self->{parent_pid} ) {
                kill 'TERM' => $self->{pid};
            }
        }

    Then you have spawn note the parent PID before it forks:

        sub spawn {
            my $self = shift;
            $self->{parent_pid} = $$;
            my $pid = fork;
            # ...
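
The alternative mentioned in the first update, having the parent kill everything from an END {} block, might be sketched as follows. The names spawn_child and kill_all_children are hypothetical, and the signal handler is an addition of this sketch: END blocks do not run when a process dies from an untrapped signal, so the handler turns INT and TERM into a normal exit.

```perl
use strict;
use warnings;

my $parent_pid = $$;    # recorded before any fork
my %children;           # pid => 1, maintained by the parent only

# Turn these signals into a normal exit so that END blocks get a chance to run.
$SIG{INT} = $SIG{TERM} = sub { exit 1 };

# Fork a child that runs $code; the parent records and returns its PID.
sub spawn_child {
    my ($code) = @_;
    my $pid = fork;
    die "Can't fork: $!" unless defined $pid;
    if ( $pid == 0 ) {
        $code->();
        exit 0;
    }
    $children{$pid} = 1;
    return $pid;
}

# Send TERM to every known child and reap it. Returns the number reaped.
sub kill_all_children {
    my $reaped = 0;
    for my $pid ( keys %children ) {
        kill 'TERM', $pid;
        $reaped++ if waitpid( $pid, 0 ) == $pid;
        delete $children{$pid};
    }
    return $reaped;
}

# END blocks also run in forked children, so act only in the original parent.
END {
    if ( $$ == $parent_pid ) {
        my $status = $?;    # waitpid overwrites $?, so save our exit code
        kill_all_children();
        $? = $status;
    }
}
```

The $$ == $parent_pid test plays the same role as the guard in the DESTROY method: a child that exits carries a copy of %children but never acts on it.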
      Your OO approach appeals to me - I've been on somewhat of an OO-FP kick lately - and in your code I think I see a potentially clean, maintainable way to implement the required features.

      BTW, regarding memory worries: Look for my response to zentara. One of the reasons I'm using fork() is because the OS can do some wonderful tricks with it to keep memory usage down.

      I appreciate all the thought you've put into this. :)
Re: Keeping children alive persistently, intelligently
by zentara (Archbishop) on Feb 08, 2008 at 19:53 UTC
    Fellow friar, frequent flights of fancy flip my fallow folds. But I would consider using threads. Why? It seems you need a lot of interprocess communication, and threads make it easy. You can set up a lot of shared variables and a loop or timer mechanism to track them. You can have a set of shared variables in each thread to store its current parameters, so you can restart it. You can reuse threads too. In a threaded app, if any thread runs "exit" it will kill all threads.

    But exactly how you set up your threads depends on things like what they are doing, whether you want to reuse them, and how many simultaneously running threads you want. A SuperSearch here for threads will yield many examples.

    Generally I like to run threads and have the main parent thread just act as a controller (watching, reading shared vars, etc). It is convenient to do it with an event-loop system like POE, Tk, Gtk2, or even GLib (my current commandline favorite).

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
      Well, I almost did do this with threads... but ran into some issues. The primary issue is that some of the code called from do_childish_things() is not thread-safe.

      Another issue (though I'm not *certain* it's really an issue) would have been with memory usage. From what I understand, each thread is a copy of the current interpreter's state, in the same process space as the original invocation. Due to the amount of code that makes up this app, that would be one heck of a big process.

      Now, a fork() is supposed to be the same sort of thing, except that the spawned child has its own address space... but due to the efficiency of fork() on most *nix platforms, it can use copy-on-write against the parent and thus is much more memory efficient. (Indeed, this seems to be the case from my limited inspection of the current running version)

      Anyhow - there is no shared state or communication between the processes, except that they all write their output to the same file-handle (opened and initialized by the parent), using flock() to avoid stepping on each other's toes. However, all children are running the same code, and the amount of data resident in any particular child is only 10-20KB *at most*. This seems to work *very* well with fork.
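
For readers following along, a flock()-guarded append might look like the sketch below. It is not the code from the app described above, and the name append_record is made up. One caveat, as far as I understand it: where Perl's flock is backed by the system's flock(2), a lock belongs to the open file description, so a handle inherited across fork() shares a single lock between parent and children. For that reason this sketch has each writer open its own handle.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one record to a log file that several processes write to.
# Each call opens its own handle, so each writer holds its own lock.
sub append_record {
    my ( $path, $line ) = @_;
    open my $fh, '>>', $path or die "Can't open $path: $!";
    flock( $fh, LOCK_EX ) or die "Can't lock $path: $!";
    print {$fh} $line or die "Can't write to $path: $!";
    close $fh or die "Can't close $path: $!";    # flushes, then releases the lock
    return 1;
}
```

Opening in append mode means every write lands at the current end of the file, and closing the handle both flushes the buffer and drops the lock, so no separate unlock step is needed.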

      I do appreciate the advice, but I'm not convinced that using threads would make any of my requirements easier to meet.
        Non-thread-safe code means that it cannot be shared between threads, but it does not mean that it cannot be used in a detached child thread.
        The memory usage of threaded applications can be as small as that of forked applications, as threads can (and should) be reused if done properly.
        I usually work with queues with threads and detach all child threads right at the beginning. All requests and results are then sent to queues. The main thread waits until the results have been received.
        Yeah, the memory issue with threads can be bad, and forking is so clean in that respect. The way to get around it, though, is to pre-make a set of empty threads right at the top of the script, and reuse them. That way, there is no overlapping useless code in the thread, and reusing them prevents memory growth. You can then load data to process into each thread through shared variables. Yeah, it's a lot of juggling to do; fork is probably better.

        I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: Keeping children alive persistently, intelligently
by peterdragon (Beadle) on Feb 09, 2008 at 09:21 UTC
    POE is a good choice for the parent. It makes it easy to write event-driven code that handles state changes and timed events.

    POE is single threaded, however, so you definitely need to fork() any workers that might block. There's some example start/stop code at Re: Perl Background processes in Windows. Remove the setsid() call from that and the children will stay in the parent process group and die when it does. You should also consider what you're doing with Stdin/Stdout/Stderr in the child processes - going to the parent's tty, or to a log file or /dev/null? Also, if you interrupt with CTRL-C you probably need tidy-up code.

    Following is an example of using POE to control start/stop. You would need to make it more sophisticated to handle your rules for the list of parameters and the respawn intervals. E.g. stash the date/time of last exit against entries in your list and take account of that in the spawner.

        #!/usr/bin/perl
        use strict;
        use warnings;

        # POE checking/debug levels
        sub POE::Kernel::ASSERT_DEFAULT { 0 }    # DATA, EVENTS, FILES, RETVALS, USAGE
        sub POE::Kernel::ASSERT_EVENTS  { 1 }
        sub POE::Kernel::ASSERT_USAGE   { 1 }
        sub POE::Session::ASSERT_STATES { 1 }

        use POE;
        use POSIX;

        my $num_kids = 8;
        my %kid_pids;

        main();

        END { do_end(); }

        sub do_end {
            # FILL HERE
            exit(0);
        }

        sub main {
            # catch CTRL-C interrupt to tidy up cleanly
            $SIG{QUIT} = $SIG{INT} = $SIG{HUP} = \&do_end;

            POE::Session->create(
                inline_states => {
                    _start => sub {
                        print "Starting\n";
                        $_[KERNEL]->sig( 'INT', 'signal_handler' );
                        $_[KERNEL]->delay( reaper  => 1 );
                        $_[KERNEL]->delay( spawner => 1 );
                    },
                    _stop          => sub { },
                    signal_handler => sub {
                        my ( $kernel, $sig ) = @_[ KERNEL, ARG0 ];
                        print "caught SIG$sig\n";
                        do_end();
                    },
                    reaper => sub {
                        $_[KERNEL]->delay( reaper => 1 );
                        # waitpid returns 0 while children are still running
                        # and -1 when there are none, so stop on either
                        while ( ( my $pid = waitpid( -1, POSIX::WNOHANG ) ) > 0 ) {
                            my $exit_status = $? >> 8;
                            delete $kid_pids{$pid};
                            print "child $pid died status $exit_status\n";
                        }
                    },
                    spawner => sub {
                        $_[KERNEL]->delay( spawner => 1 );
                        # start new kid if slot free
                        if ( scalar keys %kid_pids < $num_kids ) {
                            my $pid = start_child();
                            if ( $pid > 0 ) {
                                $kid_pids{$pid} = 1;
                                print "child $pid started\n";
                            }
                        }
                    },
                },
            );

            $poe_kernel->run();
        }

        sub start_child {
            # FILL HERE
            return $pid;
        }
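
The suggestion to stash the time of last exit and have the spawner take it into account could be sketched like this. It assumes a plain hash keyed by some identifier for each parameter set; the names schedule_respawn and due_for_respawn are hypothetical and not part of POE.

```perl
use strict;
use warnings;

# %$schedule maps a child name to the epoch time at which it may be restarted.
sub schedule_respawn {
    my ( $schedule, $name, $exit_time, $delay ) = @_;
    $schedule->{$name} = $exit_time + $delay;
    return $schedule->{$name};
}

# Return the names whose restart time has arrived and remove them
# from the schedule.
sub due_for_respawn {
    my ( $schedule, $now ) = @_;
    my @due = grep { $schedule->{$_} <= $now } sort keys %$schedule;
    delete @{$schedule}{@due};
    return @due;
}
```

The reaper handler would call schedule_respawn(\%schedule, $name, time, $delay) when it sees a death, and the spawner handler, which already fires once a second, would start whatever due_for_respawn(\%schedule, time) returns. Nothing ever sleeps, so one child's long delay cannot hold up another child's restart.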

    Regards, Peter
