Re: Monolith: A Clever Tool For Monitoring Regularly Scheduled Tasks

Someone asked in a different thread:

This does sound like something that would be good for monitoring automated scripts and processes that now send emails where I work. Could you expand on how this system differs from Nagios and related tools? Nagios uses (perhaps completely custom) scripts and tools to provide a status, and I'm pretty sure it has the ability to store historical data in MySQL. Its default display also looks similar to your display board, with indicators of green/yellow/red. Understand, I'm not trying to be one of those people saying "why did you do this when you could have used X". I'm trying to think how your system differs, so that if I can get time to do an implementation at my own work, I don't end up recreating Nagios (badly).

We looked at Nagios initially, and were convinced pretty quickly that it was unmanageably obtuse. What kind of sealed it for us was a flowchart we found, part of the Nagios documentation, that explained the rat's nest of configuration files that needed to be tweaked in order to accomplish even the most basic monitoring tasks. Nagios may have been a good solution when it first came out, but by virtue of trying to be all things to all people, it seems to have grown to the point where it ceases to be effective at its core task. Nagios has become the iTunes of monitoring. Sometimes you just want to play a song, not manage your iPad firmware and shop for gift cards.

Monolith trumps Nagios in several areas. First and foremost is ease of deployment. Suppose I have a script that's being called by cron somewhere. All I need to do is add a single line to that script, and that's it. The call-home script takes care of informing Monolith that it should be watched. Usually, you want to place this single call-home line at the end of your script, or at the point in the script where operational success versus operational failure is determined. More on that in a moment. Literally, all you do is add one line:

system("/usr/local/bin/monolith.pl myscript");
When invoked, monolith.pl looks up the hostname of the machine it's running on. It uses this in conjunction with the argument you supply ("myscript" in this case) to check whether it has called home before. If it hasn't, it adds "myscript on {hostname}" to the list of entities whose "drumbeat" is to be monitored. If the database already has mentions of "myscript on {hostname}", it simply adds a new row to the table saying "Hi, I'm myscript on {hostname}. Just checking in. It's currently {time}." And that's it.

As I described above, the dashboard piece of the solution looks at this table and, by virtue of the track record created by a script calling home repeatedly, can deduce when the script is noticeably overdue. It's like a parent with a kid in college. They expect their child to call home on Sunday nights. The kid has called home every Sunday night at 8:00 PM for the past 6 months. 8 PM Sunday rolls around, and the phone doesn't ring. After about 8:30, the parents become concerned. After 10 PM, they get worried and start thinking something's wrong.

Monolith works on the same premise. It looks at who's calling home and how frequently they do it, and if the thing calling home strays far enough from that established pattern, it throws a notification that something is wrong. (BTW, Monolith will only begin actively monitoring an entity after that entity has called home at least 4 or 5 times, so that a reliable call-home frequency can be calculated.)
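To make the call-home and overdue logic concrete, here is a rough sketch of how the two halves might fit together. This is not Monolith's actual code. The in-memory %checkins hash stands in for the database table, and the names call_home and status_for, the MIN_CHECKINS value, and the 1.5x / 3x thresholds are all assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);
use List::Util qw(sum);

# Assumed value: number of check-ins needed before monitoring starts.
use constant MIN_CHECKINS => 5;

# Hypothetical stand-in for the database table:
# entity name => list of check-in times (epoch seconds), oldest first.
our %checkins;

# Call-home side: build "myscript on {hostname}" and record the time.
# $now and $host are optional overrides so the sketch can be exercised
# without waiting for real time to pass.
sub call_home {
    my ($script, $now, $host) = @_;
    $host = hostname() unless defined $host;
    $now  = time()     unless defined $now;
    my $entity = "$script on $host";
    push @{ $checkins{$entity} }, $now;
    return $entity;
}

# Dashboard side: deduce a status from the track record alone.
sub status_for {
    my ($entity, $now) = @_;
    my @t = @{ $checkins{$entity} || [] };
    return 'learning' if scalar(@t) < MIN_CHECKINS;

    # Average gap between consecutive check-ins.
    my @gaps = map { $t[$_] - $t[$_ - 1] } 1 .. $#t;
    my $avg  = sum(@gaps) / scalar(@gaps);

    # How long has it been since the last call home?
    my $silence = $now - $t[-1];
    return 'red'    if $silence > 3   * $avg;   # illustrative threshold
    return 'yellow' if $silence > 1.5 * $avg;   # illustrative threshold
    return 'green';
}

# Five hourly check-ins, then look at the entity at various later times.
my $entity;
$entity = call_home('myscript', $_ * 3600, 'web01') for 0 .. 4;
print status_for($entity, 4 * 3600 + 1800),  "\n";   # 30 min of silence: green
print status_for($entity, 4 * 3600 + 7200),  "\n";   # 2 hours: yellow
print status_for($entity, 4 * 3600 + 14400), "\n";   # 4 hours: red
```

A real version would presumably want something smarter than a plain average (the Sunday-night example implies a tolerance much tighter than a multiple of the interval), but the shape is the same: the script only ever says "I'm here", and the watcher does all the deducing.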

This is the second area where Monolith trumps Nagios: the model/method of monitoring. In Monolith, the process of monitoring entities is no longer reliant upon a given script's ability to inform you of its own status. It is deductive, versus reactive. In a reactive model, you can't always guarantee that the thing responsible for communicating its status will do so, or be capable of doing so. In a deductive model, you can determine whether something is running successfully or not, completely independently of the condition of the network, the host, or the script itself. Nagios won't be able to help you much if the thing responsible for reporting is unable to call home for any of a variety of reasons: network outage, broken modules/libraries, unforeseen conditions, bugs. These sorts of things potentially stand in the way of the script notifying you of trouble. By moving the point of responsibility up the chain, the script is relieved of having to report its own failures to the user. All it has to do is check in when things are fine.

Anything which can be expressed as a Good/Bad, On/Off, Up/Down, Present/Not Present, Success/Failure state can be conveyed in Monolith simply by instructing a script to call home in positive conditions, and not to call home in negative conditions. When the script stops calling home, Monolith notices it, and informs you.
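In a script, that pattern might look something like the sketch below. Only the /usr/local/bin/monolith.pl path comes from the one-liner above. The run_backup job and the report_if_ok helper are hypothetical names invented for the example, and the optional third argument exists only so the helper can be pointed at a different command when testing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical job: returns true on success. Here "success" just means
# the source directory exists; substitute the real work.
sub run_backup {
    my ($source) = @_;
    return -d $source ? 1 : 0;
}

# Call home only when the job succeeded.
# Returns true if the call-home command was run and exited cleanly.
sub report_if_ok {
    my ($ok, $name, $call_home) = @_;
    $call_home //= '/usr/local/bin/monolith.pl';
    return 0 unless $ok;                     # failure: stay silent
    return system($call_home, $name) == 0;   # success: check in
}

# Silence on failure is the signal; there is no "else" branch to write.
report_if_ok(run_backup('/var/data'), 'nightly-backup');
```

Note that nothing here tries to report the failure. If the backup fails, or the box is down, or the script never starts at all, the result is the same missing check-in, and the dashboard picks up on it.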

In my experience, deductive monitoring is way, way better than reactive monitoring. Nagios, at least as far as I understand it, is incapable of anything other than reactive monitoring; it can only tell you about information it receives, not information that it has deduced on its own.

(Fun side note: I read a book recently, written by a guy named Bill Bruford, the drummer for "Yes" from 1969-1972 or so. His approach to drumming is kind of the same approach that Monolith takes toward monitoring. Bruford considers drumming to be the management of the time in between drum beats, rather than the execution of the drum beats themselves. It's sort of an inverse view of the same activity, and one that opens the door to all sorts of different creative possibilities.)
