by biosysadmin (Deacon)
on May 06, 2004 at 18:22 UTC ( #351245=sourcecode )
Category: Utility Scripts
Author/Contact Info Tex Thompson <>
Description: This is a quick program that I wrote for a friend who had some flakiness on his server. It parses the output of `ps ax`, checks for vital processes, and restarts any processes that are stopped.
#!/usr/bin/perl -w

use strict;
use MIME::Lite;

# - a program for monitoring critical processes on a Unix s
# Copyright 2004, Tex Thompson <>

my $emergency_email = '';
my $time_to_sleep   = 60;
my %vital_processes = (
   'httpd'           => '/usr/local/apache/bin/apachectl start',
   'pop-before-smtp' => '/etc/init.d/pop-before-smtp.init start',
   'postfix'         => '/etc/init.d/postfix start',
   'mysql'           => '/etc/init.d/mysql start',
   'ssh'             => '/etc/init.d/sshd start',
   'syslog'          => '/etc/init.d/syslog start'
);

while (1) {
   my @process_listing = `ps ax`;
   foreach my $process ( keys %vital_processes ) {
      my @running = grep /$process/, @process_listing;
      my $num_processes = scalar @running;

      if ( $num_processes == 0 ) {
         # try to fix the problem
         my $time = localtime();
         print "Process $process not found at $time!\n";
         print "Executing command ",$vital_processes{ $process },"\n";
         my $command = $vital_processes{ $process };
         my $output  = `$command`;

         # send a notification e-mail
         my $data = "$process not running at $time!\n";
         $data   .= "Tried to restart with command\n$command\n";
         $data   .= "Output:\n$output\n";

         my $msg = MIME::Lite->build(
            From    => '',
            To      => $emergency_email,
            Subject => "Emergency: $process down",
            Type    => 'TEXT',
            Data    => $data
         );
         # The listing is cut off here; sending the message is the assumed next step.
         $msg->send;
      }
   }
   sleep( $time_to_sleep );
}
Replies are listed 'Best First'.
by Juerd (Abbot) on May 06, 2004 at 20:10 UTC

    perl -e'sleep' httpd

    Run that, and your code will not detect that httpd is gone: the `ps ax` line for the sleeping perl process still matches /httpd/.

    Besides that, your code only notices when a process is gone. Far more often, in my experience, processes are still there, but don't work properly. So instead, test the functionality. For example, I use this hack to restart my apache when needed:

    #!/usr/bin/perl
    use strict;
    use LWP::Simple;

    exit 0 if -e "/etc/nouptest";

    eval {
        local $SIG{ALRM} = sub { die "Alarm\n" };
        alarm 10;
        my $p = get '';
        $p =~ /xyzzy/ or die "Down\n";
        alarm 0;
    };
    if ($@) {
        if ($@ =~ /Alarm|Down/) {
            system qw[/etc/init.d/apache stop];
            sleep 3;
            system qw[killall -9 apache];
            sleep 3;
            system qw[/etc/init.d/apache start];
        }
    }

    Cron runs this every minute and it arranges for me to get mail when Apache was restarted (because the init scripts have output). As a nice side effect, this way I get lots of mail when the nameserver is broken :)
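
The eval/alarm idiom in the script above could be pulled into a small helper so the same timeout logic wraps any functional check, not just the Apache one. This is only a sketch: `check_ok` is a hypothetical name, not something from the post, and the checks passed to it here are trivial stand-ins for a real test such as fetching a page and looking for a known string.

```perl
use strict;
use warnings;

# check_ok is a hypothetical helper name, not part of the original post.
# Run $check (a code ref) and give up after $seconds.
# Returns 1 if the check returned a true value in time, 0 if it
# returned false, died, or timed out.
sub check_ok {
    my ( $seconds, $check ) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "Alarm\n" };
        alarm $seconds;
        my $result = $check->();
        alarm 0;
        $result ? 1 : 0;
    };
    alarm 0;    # clear any alarm left pending when the check itself died
    return $ok ? 1 : 0;
}

# Stand-in checks; a real one would talk to the service being monitored.
print check_ok( 2, sub { 1 } )             ? "up\n" : "down\n";
print check_ok( 2, sub { die "Down\n" } )  ? "up\n" : "down\n";
```

The first line prints "up" and the second "down". The restart commands would then hang off a false return value, the same way they hang off `$@` in the script above.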

    Juerd # { site => '', plp_site => '', do_not_use => 'spamtrap' }

      Thanks for the tips. I'd definitely like to test on a more accurate basis, but the problem was specific enough for this to work. At least, it has worked so far. :)

      A good idea might be to anchor the regex to match at the beginning and end of the line; this would lessen the non-specific matching problem that you mention.
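
One way to tighten the match, sketched here under some assumptions: rather than anchoring a regex against the whole line, split each `ps ax` line into its columns and compare the basename of the program in the COMMAND column exactly. `is_running` is a hypothetical helper, the sample lines are made up, and the column layout "PID TTY STAT TIME COMMAND" is assumed. Note that exact matching means the keys of `%vital_processes` would have to be real program names (for example `sshd` rather than `ssh`), and daemons that rewrite their own command line would still need special handling.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# is_running is a hypothetical helper, not part of the original script.
# Return 1 if some line of `ps ax` output runs a program whose basename
# is exactly $name, otherwise 0.
sub is_running {
    my ( $name, @ps_lines ) = @_;
    foreach my $line (@ps_lines) {
        # First four columns are split off; the fifth keeps the full command line.
        my ( $pid, $tty, $stat, $time, $command ) = split ' ', $line, 5;
        next unless defined $command and $pid =~ /^\d+$/;    # skips the header row
        my ($program) = split ' ', $command;
        return 1 if basename($program) eq $name;
    }
    return 0;
}

# Made-up sample output, for illustration only.
my @sample = (
    "  PID TTY      STAT   TIME COMMAND\n",
    "  412 ?        Ss     0:03 /usr/sbin/sshd\n",
    " 9731 pts/0    S+     0:00 perl -esleep httpd\n",
);

print is_running( 'httpd', @sample ) ? "httpd up\n" : "httpd down\n";
print is_running( 'sshd',  @sample ) ? "sshd up\n"  : "sshd down\n";
```

With the sample lines this reports httpd as down and sshd as up, so the `perl -e'sleep' httpd` trick no longer hides a missing httpd.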

      I was actually thinking of writing Nagios plugins to test all of these services, but that's a task for another day, while this was simply a half hour of scripting.

by Roger (Parson) on May 07, 2004 at 17:02 UTC
by dpavlin (Friar) on May 15, 2004 at 21:19 UTC
    ps-watcher might be another interesting alternative for this problem. It's very configurable (and perl too :-)
