http://www.perlmonks.org?node_id=555386

chrism01 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have 2 programs that run as daemons ie 24 x 7 and need to keep doing so, even if MySQL goes away.
Conceptually, this didn't seem too hard at first, but I've discovered that I need to trap perl's internal die signal or it'll break anyway.
Unfortunately, I haven't been able to find a way to do that permanently. All the techniques I've tried have died after about the 3rd time an error occurs.
See code and results below.

#!/usr/bin/perl -w use locale; use DBI; use POSIX; use strict; # Enforce declarations # Declare cfg pkg so we can refer to it anywhere { package cfg; # Database handle $cfg::dbh = ''; $cfg::db_retry_cnt = 0; $cfg::db_cxn_msg_sent = 0; } # Connect to Database db_connect(); if( !$cfg::dbh ) { print "startup connect failed\n"; exit(1); } # Install SIG Handlers $SIG{__DIE__} = \&do_sig_die; # trap & ignore #local $SIG{__DIE__} = \&do_sig_die; # trap & ignore #$SIG{__DIE__} = 'IGNORE'; # trap & ignore #local $SIG{__DIE__} = 'IGNORE'; # trap & ignore # Call main sub to process msgs process_msgs(); #********************************************************************* +********* sub process_msgs { my ( $packet_interval, # interval between keep alive $error_msg # error msg (if any) ); while(1) { # Check for DB errors as we go $error_msg = ""; ($packet_interval, $error_msg) = db_get_pkt_interval_type("xxx\@x.com.au"); if( !$error_msg ) { #DEBUG print "no error caught!\n"; } sleep(5); } } #********************************************************************* +********* sub db_get_pkt_interval_type { my ( $sql, # SQL code $sth, # SQL statement handle $error_msg, # sql error msg $userlogin, # user's login eg xxx@x.com.au $username, $isp, $interval ); # Get input params $userlogin = $_[0]; # Extract ISP name ($username, $isp) = split(/@/, $userlogin); # Set SQL cmd $error_msg = ""; $sql = "SELECT 1 ". "FROM sometable ". "WHERE somecol = ".$cfg::dbh->quote($isp); print "here 1\n"; # Run the select $sth = $cfg::dbh->prepare($sql); print "here 2\n"; $sth->execute(); print "here 3\n"; if ( $sth->errstr ) { print "here 4\n"; # SQL failure; set err msg from DB $error_msg = "db_get_pkt_interval(): ".localtime()." ". $sth->errstr; # Check if real SQL error or DB gone away if( $sth->errstr =~ /MySQL server has gone away/ ) { print "here 5\n"; # Try to re-connect for later # logging done in called sub db_connect(); } else { # Other SQL error, so log print "$error_msg\n"; } } else { # Collect values ( $interval ) = $sth->fetchrow_array; } # Finished txn $sth->finish; # Return user's interval value return($interval, $error_msg); } #********************************************************************* +********* # # Function : db_connect # # Description : Connect to the Database, so we can insert the exchan +ge info # # Some SQL calls specifically check for # /MySQL server has gone away/ before checking for any + other # SQL error. If found, call here, where a count is kep +t of # the num of times this happens. An email & sms is # only sent if retry limit has been reached and no pre +v msg # has been sent. If a successful connection to the DB +is made, # the err msg flag and retry cnt are reset. # # Params : none # # Returns : none dbh stored as global # #********************************************************************* +********* sub db_connect { my ( $dsn, # data source name $error_msg # if any ); # Set up connection string... $dsn = "DBI:mysql:database=somedb;host=somehost;port=nnnn"; # ... and connect # NB: Set RaiseError => 0 so we can trap errors manually # without breaking the program $cfg::dbh = DBI->connect( $dsn, "user", "passwd", {RaiseError => 0, AutoCommit => 1}); if( $DBI::errstr) { # FAIL $error_msg = "db_connect(): $DBI::errstr"; print "$error_msg\n"; $cfg::db_retry_cnt++; if( !$cfg::db_cxn_msg_sent && $cfg::db_retry_cnt > 3 ) { print "would have sent email/sms\n"; $cfg::db_cxn_msg_sent = 1; } } else { # SUCCESS $cfg::db_cxn_msg_sent = 0; $cfg::db_retry_cnt = 0; } } #********************************************************************* +********* # # Function : do_sig_die # # Description : If we get SIGDIE, trap, log, ignore, re-install. # Basically, some internal process keeps breaking this # prog with SIGDIE. We are not using this signal ourse +lves; # sigh ... # # Params : none # # Returns : none # #********************************************************************* +********* sub do_sig_die { # Log it for analysis print "SIGDIE caught\n"; # Defensive programming ... $SIG{__DIE__} = \&do_sig_die; }
Results:
here 1 here 2 here 3 no error caught! here 1 here 2 here 3 DBD::mysql::st fetchrow_array failed: fetch() without execute() at ./p +d.pl line 119. no error caught! here 1 here 2 DBD::mysql::st execute failed: MySQL server has gone away at ./pd.pl l +ine 91. here 3 here 4 here 5 DBI connect('database=ispdata;host=localhost;port=3306','radius',...) +failed: Can't connect to local MySQL server through socket '/var/lib/ +mysql/mysql.sock' (2) at ./pd.pl line 162 db_connect(): Can't connect to local MySQL server through socket '/var +/lib/mysql/mysql.sock' (2) SIGDIE caught Can't call method "quote" on an undefined value at ./pd.pl line 83. prompt$>
I'm also curious that the first error msg is 'fetch() without execute()', yet I don't get an error msg about the preceding execute() failing...
I'd prefer a soln that doesn't involve eval'ing everything, as it's not only messy (lots of this type of code to fix), but also these daemons need to be as fast as reasonably possible.

I've researched some options (as you can see), but they all have more or less the same result.
Also, most articles I've read only intercept the die long enough to do eg print a msg, then they die anyway.
My daemons must not die at all.
I saw a short example about redefining perl's internal die mechanism, but couldn't understand it or at least couldn't make it work: http://www.perlmonks.com/?node_id=554552
The eval option there had the weird effect of not going into the eval block again after the first failure, which is not what I want.
Any simple solns greatly appreciated
Chris

Readmore tags added by GrandFather