Re: Once AGAIN perl saved my bacon

by sierpinski (Chaplain)
on Sep 08, 2009 at 14:25 UTC (#794146)

in reply to Once AGAIN perl saved my bacon

We have multi-million dollar applications running on Solaris servers, and one day one of the systems went haywire. Normally it has something like 16 CPUs with 128 cores and 96GB of RAM. Half of the system boards died (or so we thought) and the system just screeched to a halt. Heavy swapping and a flood of involuntary context switches (icsw) brought processing almost to a standstill. We got the parts replaced and the system back up, but nowhere near soon enough to avoid hefty (in the millions) fines from the government for not having the data available.

It turned out that several of the CPUs had already gone offline earlier, and it took the loss of only 1 system board to cause our outage. If we had replaced the failed parts as they occurred, losing that board would never have hurt the system as badly as it did. We didn't have any monitoring in place to detect failed components, but now we do. I wrote this massive monitoring script (in Perl of course) that uses the Expect module to connect to each server, run a battery of checks, and then email a report to our group twice a day. Now we can find and fix these minor problems before they escalate into major ones, and several of the upper level executives have been briefed on my work. It's still a work in progress; they are always finding new things for me to check!
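For anyone wanting to try something similar, here is a minimal sketch of what one check in such a battery might look like. It is not the actual script. It assumes the output of the Solaris `psrinfo` command has already been captured as text (for example over the Expect session, which is left out here), and it flags every processor whose state is anything other than `on-line`. The sub name `find_unhealthy_cpus` and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the text printed by Solaris `psrinfo`,
# return a list of [cpu_id, state] pairs for every processor whose
# state is anything other than "on-line".
# Assumption: any other state is worth reporting; a real check might
# choose to treat some states (such as "no-intr") as healthy.
sub find_unhealthy_cpus {
    my ($psrinfo_output) = @_;
    my @bad;
    for my $line (split /\n/, $psrinfo_output) {
        # Expected line shape: "<id>  <state>  since <date> <time>"
        next unless $line =~ /^\s*(\d+)\s+(\S+)\s+since\b/;
        my ($id, $state) = ($1, $2);
        push @bad, [$id, $state] unless $state eq 'on-line';
    }
    return @bad;
}

# Sample text standing in for real command output (invented values).
my $sample = <<'END';
0       on-line   since 09/01/2009 08:00:00
1       off-line  since 09/03/2009 02:14:07
2       on-line   since 09/01/2009 08:00:00
3       faulted   since 09/05/2009 17:40:31
END

for my $cpu (find_unhealthy_cpus($sample)) {
    printf "CPU %d is %s\n", @$cpu;
}
```

Keeping the parsing in its own sub, separate from the code that talks to the server, means each check can be tested against canned output without touching a production box.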
  /\/\ Sierpinski
