http://www.perlmonks.org?node_id=351245
Category: Utility Scripts
Author/Contact Info: Tex Thompson <tex@biosysadmin.com>
Description: This is a quick program that I wrote for a friend whose server had been flaky. It parses the output of `ps ax`, checks for vital processes, and restarts any that are no longer running.
#!/usr/bin/perl -w

use strict;
use MIME::Lite;

# monitor.pl - a program for monitoring critical processes on a Unix server
# Copyright 2004, Tex Thompson <tex@biosysadmin.com>

my $emergency_email = 'contact@mail.com';
my $time_to_sleep   = 60;
my %vital_processes = (
   'httpd'           => '/usr/local/apache/bin/apachectl start',
   'pop-before-smtp' => '/etc/init.d/pop-before-smtp.init start',
   'postfix'         => '/etc/init.d/postfix start',
   'mysql'           => '/etc/init.d/mysql start',
   'ssh'             => '/etc/init.d/sshd start',
   'syslog'          => '/etc/init.d/syslog start'
);


while (1) {
   my @process_listing = `ps ax`;
   foreach my $process ( keys %vital_processes ) {
      my @running = grep /$process/, @process_listing;
      my $num_processes = scalar @running;

      if ( $num_processes == 0 ) {
         # try to fix the problem
         my $time = localtime();
         print "Process $process not found at $time!\n";
         my $command = $vital_processes{ $process };
         print "Executing command $command\n";
         # 2>&1 so that init-script errors also end up in the e-mail
         my $output  = `$command 2>&1`;

         # send a notification e-mail
         my $data = "$process not running at $time!\n";
         $data   .= "Tried to restart with command\n$command\n";
         $data   .= "Output:\n$output\n";

         my $msg = MIME::Lite->build(
            From => 'root@biosysadmin.com',
            To   => $emergency_email,
            Subject => "Emergency: $process down",
            Type => 'TEXT',
            Data => $data
         );
         $msg->send;
      }
   }
   sleep( $time_to_sleep );
}
Re: intelli-monitor.pl
by Juerd (Abbot) on May 06, 2004 at 20:10 UTC

    perl -e'sleep' httpd

    Run that, and your code will not detect that httpd is gone: /httpd/ also matches the command line of the perl process.

    Besides that, your code only notices when a process is gone. Far more often, in my experience, processes are still there but don't work properly. So instead, test the functionality. For example, I use this hack to restart my Apache when needed:

    #!/usr/bin/perl
    use strict;
    use LWP::Simple;

    exit 0 if -e "/etc/nouptest";

    eval {
        local $SIG{ALRM} = sub { die "Alarm\n" };
        alarm 10;
        my $p = get 'http://uptest.convolution.nl/';
        $p =~ /xyzzy/ or die "Down\n";
        alarm 0;
    };
    if ($@) {
        if ($@ =~ /Alarm|Down/) {
            system qw[/etc/init.d/apache stop];
            sleep 3;
            system qw[killall -9 apache];
            sleep 3;
            system qw[/etc/init.d/apache start];
        }
    }
    Cron runs this every minute, and because the init scripts produce output, cron mails it to me whenever Apache has been restarted. As a nice side effect, I get lots of mail when the nameserver is broken :)
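The same idea, checking that a service actually answers rather than that a process with the right name exists, carries over to the non-HTTP daemons in the original list. This is an illustrative sketch under assumptions, not code from the thread: the `service_answers` helper is hypothetical, and the ports and greeting patterns in the comments are the usual defaults for SMTP, SSH and POP3, which may differ on a given server. Like Juerd's script, it wraps the network I/O in `eval` with `alarm`, so a daemon that accepts the connection but never speaks counts as down.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: connect to $host:$port and check that the first
# line the server sends matches $banner_re. Returns 1 if it does, and 0
# on a refused connection, a timeout, or an unexpected greeting.
sub service_answers {
    my ($host, $port, $banner_re, $timeout) = @_;
    $timeout ||= 10;
    my $line;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "Alarm\n" };
        alarm $timeout;
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => $timeout,
        ) or die "Connect\n";
        $line = <$sock>;    # read the server's greeting line
        alarm 0;
        close $sock;
        1;
    };
    alarm 0;    # make sure no alarm is left pending after a die
    return 0 unless $ok && defined $line;
    return $line =~ $banner_re ? 1 : 0;
}

# Example checks (typical defaults, assumed rather than taken from the
# original script):
#   service_answers('localhost', 25,  qr/^220/)     # SMTP (postfix)
#   service_answers('localhost', 22,  qr/^SSH-/)    # sshd
#   service_answers('localhost', 110, qr/^\+OK/)    # POP3
```

A failed check could then trigger the matching restart command from %vital_processes, instead of or in addition to the `ps ax` test.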

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Thanks for the tips. I'd definitely like to test more thoroughly, but the problem was specific enough for this approach to work. At least, it has worked so far. :)

      A good idea might be to anchor the regex so that it matches at the beginning and end of the line; this would lessen the non-specific matching problem that you mention.
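Each `ps ax` line begins with the PID, TTY, STAT and TIME columns, so anchoring at the start of the line only helps if the pattern skips those columns and then compares the program name in the COMMAND column exactly, with any leading path stripped. With that, Juerd's `perl -e'sleep' httpd` no longer counts as httpd. The trade-off is that the keys of %vital_processes would have to be exact daemon names, such as `sshd` or `mysqld` instead of `ssh` or `mysql`, and these vary between systems; Postfix's main daemon, for instance, typically shows up as `master`. A minimal sketch, using a hypothetical `is_running` helper:

```perl
use strict;
use warnings;

# Hypothetical helper: true if any line of `ps ax` output has a COMMAND
# column whose program name (leading path stripped) is exactly $name.
sub is_running {
    my ($name, @listing) = @_;
    for my $line (@listing) {
        # Skip PID, TTY, STAT and TIME, then an optional path, then the
        # name itself, followed by whitespace, a colon or end of line.
        return 1
            if $line =~ m{^\s*\d+\s+\S+\s+\S+\s+\S+\s+(?:\S*/)?\Q$name\E(?:\s|:|$)};
    }
    return 0;
}

# In the monitor loop this would replace the grep:
#   my @process_listing = `ps ax`;
#   ...
#   if ( !is_running( $process, @process_listing ) ) { ... }
```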

      I was actually thinking of writing Nagios plugins to test all of these services, but that's a task for another day, while this was simply a half hour of scripting.
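For reference, a Nagios plugin is an ordinary program that prints a single status line and reports its result through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A rough sketch of a process check in that style follows. The `check_process` function, the script name `check_vital_process.pl`, the output wording and the use of `ps -A -o comm=` (command names only, no header line) are assumptions rather than part of the original script. Comparing exact names also sidesteps the substring matching criticised earlier in the thread, at the cost of needing each daemon's real name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard Nagios plugin return codes.
my %EXIT = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical check: given a process name and a list of running command
# names, return a Nagios state and a one-line message.
sub check_process {
    my ($name, @commands) = @_;
    # Count commands whose basename equals $name exactly.
    my $count = grep { (my $base = $_) =~ s{.*/}{}; $base eq $name } @commands;
    return $count
        ? ('OK',       "$count $name process(es) running")
        : ('CRITICAL', "no $name process running");
}

# Invoked as e.g. `check_vital_process.pl sshd`; without an argument the
# sketch does nothing.
if (@ARGV) {
    chomp(my @commands = `ps -A -o comm=`);
    if ($?) {
        print "UNKNOWN - could not run ps\n";
        exit $EXIT{UNKNOWN};
    }
    s/^\s+|\s+$//g for @commands;    # trim any column padding
    my ($state, $text) = check_process($ARGV[0], @commands);
    print "PROCS $state - $text\n";
    exit $EXIT{$state};
}
```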

Re: intelli-monitor.pl
by Roger (Parson) on May 07, 2004 at 17:02 UTC
Re: intelli-monitor.pl
by dpavlin (Friar) on May 15, 2004 at 21:19 UTC
    ps-watcher might be another interesting alternative for this problem. It's very configurable (and written in Perl, too :-)