We have multimillion-dollar applications running on Solaris servers, and one day one of the systems went haywire. Normally it has something like 16 CPUs with 128 cores and 96GB of RAM. Half of the system boards died (or so we thought) and the system just screeched to a halt. Heavy swapping and involuntary context switches (icsw) brought processing almost to a complete stop. We got the parts replaced and the system back up, but nowhere near soon enough to avoid hefty fines (in the millions) from the government for not having this data available.
It turns out several of the CPUs had already gone offline earlier, and it took the loss of only one system board to cause our issue. If we had replaced the failed parts as they failed, the outage would never have been as bad as it was. We didn't have any monitoring in place to detect failed components, but now we do. I wrote this massive monitoring script (in Perl of course) that uses the Expect module to connect to each server, run a battery of checks, and then email a report to our group twice a day. Now we can find and fix these minor problems before they escalate into major ones, and several of the upper-level executives have been briefed on my work. It's still a work in progress; they are always finding new things for me to check!
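To give a flavor of one such check, here is a minimal sketch, not the actual script. It assumes the remote command is Solaris `psrinfo` and that each output line looks like `<id> <state> since <date> <time>`. The function name `failed_cpus` and the sample text are made up for illustration, and the Expect connection and the emailing are left out so the example runs with core Perl only.

```perl
use strict;
use warnings;

# Return "id (state)" for every processor whose state is not "on-line".
# ASSUMPTION: psrinfo-style lines of the form "<id> <state> since <date> <time>".
sub failed_cpus {
    my ($output) = @_;
    my @failed;
    for my $line (split /\n/, $output) {
        # Skip anything that does not start with a processor ID and a state.
        next unless $line =~ /^\s*(\d+)\s+(\S+)/;
        my ($id, $state) = ($1, $2);
        push @failed, "$id ($state)" unless $state eq 'on-line';
    }
    return @failed;
}

# Made-up sample; in a real run this text would come from the remote host.
my $sample = <<'END';
0       on-line   since 01/10/2011 08:00:00
1       on-line   since 01/10/2011 08:00:00
2       off-line  since 02/03/2011 14:22:10
3       faulted   since 02/03/2011 14:22:11
END

my @failed = failed_cpus($sample);
my $report = @failed
    ? "WARNING: processors not on-line: " . join(', ', @failed) . "\n"
    : "OK: all processors on-line\n";
print $report;
```

Keeping each check as a small function that takes command output as a string makes it easy to test against saved output before pointing it at a live server.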