http://www.perlmonks.org?node_id=560967

jimbus has asked for the wisdom of the Perl Monks concerning the following question:

I'm pretty sure I can figure out what packages to use by searching CPAN... what I'm really looking for is advice, best practices and such, so I can get it right the first time. There's a bit of pressure.

A county worker apparently didn't call before he/she dug and took out a whole bunch of our fiber, which took down voice and data for Oklahoma and parts of Texas... which understandably made the Veep of networks a bit cranky, especially since our expensive alarming system never paged us.

So the idea is to set up a dedicated server that will ping all of our network elements and send SNMP traps to our alarming server when there is an issue. I'm currently reading about Net::Ping and Net::SNMP and should be able to figure something out on this. But the problem is I really don't have any experience with monitoring networks. What I need is advice from people who have done something like this.

  • Do I just set up a loop with a sleep in it and just ping each machine?
  • Do I send a trap each time I get a failed ping or should there be a so-many second tolerance?
  • Are the Modules listed the best for the job?
  • Are there any patterns or best practices or tutorials on this subject I should be checking into?

    Thanks


    --Jimbus aka Jim Babcock
    Wireless Data Engineer and Geek Wannabe
    jim-dot-babcock-at-usa-dot-com
    Re: Pinging network devices and setting SNMP traps
    by McDarren (Abbot) on Jul 13, 2006 at 15:00 UTC
      If all you want to do is ping some hosts on a regular basis, then sure it'll be easy to whip something up fairly quickly. But if you want a decent Network Monitoring System, then you're better off going for something out of the box. My personal favourite is Big Brother. It's open source (although there is a commercial version available), well supported and dead easy to get up and running quickly. As well as all the standard NM tests that come with it (including ping), it's fairly trivial to write your own customised plug-ins (in Perl, of course :D). I'm currently using it to monitor around 600 separate hosts worldwide, and it works like a dream :)

      Nagios is another Open Source NMS that is highly regarded. I've not used it, but I do understand that it is a cow to get configured.

      Hope this helps,
      Darren :)

    Re: Pinging network devices and setting SNMP traps
    by Herkum (Parson) on Jul 13, 2006 at 14:49 UTC

      There are two problems with ping,

      1. On some networks they are filtered; so you cannot get a response
      2. On a busy network it can be dropped in favor of higher priority traffic, which is annoying because it can generate numerous false positives

      That being said, I would start implementing some basic monitoring using ping or an ssh/telnet session to check running processes on a box. I have looked at SNMP and I find that it is overly complicated and was not really worth the time invested; it was easier to just capture the output from a ps -ef command and parse that to see what was going on.
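
      A minimal sketch of the "capture ps -ef and parse it" idea, for illustration only. The helper names commands_from_ps and is_running are made up here, the sample output is invented, and the column layout is an assumption: eight whitespace-separated columns as printed by Linux procps. Other systems may print STIME with an embedded space, which would break this split.

```perl
use strict;
use warnings;

# Hypothetical helper: return the CMD column of every process line
# in a block of `ps -ef` output.
sub commands_from_ps {
    my ($ps_output) = @_;
    my @commands;
    for my $line (split /\n/, $ps_output) {
        next if $line =~ /^\s*UID\b/;    # skip the header row
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD.
        # The limit of 8 keeps spaces inside CMD intact.
        my @fields = split ' ', $line, 8;
        push @commands, $fields[7] if @fields == 8;
    }
    return @commands;
}

# Hypothetical helper: number of processes whose command matches $pattern.
sub is_running {
    my ($ps_output, $pattern) = @_;
    my $count = grep { /$pattern/ } commands_from_ps($ps_output);
    return $count;
}

# Invented sample output, standing in for what a remote box would return.
my $sample = <<'END_PS';
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jul13 ?        00:00:02 /sbin/init
www        812     1  0 09:14 ?        00:00:10 /usr/sbin/httpd -k start
END_PS

for my $name (qw(httpd sshd)) {
    my $status = is_running($sample, qr/\b\Q$name\E\b/) ? 'running' : 'NOT running';
    print "$name: $status\n";
}
```

      In practice the text passed in would come from the remote box, for example by capturing the output of an ssh command, rather than from a hard-coded sample.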

        On some networks they are filtered; so you cannot get a response

        ICMP filtering is not as commonplace as it was, say, 2-3 years ago. There was one particular worm (can't recall exactly which one.. MSBlaster, perhaps?) that used an ICMP probe to search for vulnerable hosts. It caused havoc for a short time, and many ISPs started filtering ICMP in response. The problem with ICMP filtering is that it can have unwanted side-effects, such as breaking Path MTU Discovery, and so many of these filters have gradually been removed.

        In my experience, most ISPs will remove an ICMP filter, or at least allow ICMP for a specific host, if asked nicely.

        Cheers,
        Darren :)

          most ISPs will remove an ICMP filter

          That is assuming that you are working with an ISP; large businesses tend to have their own network staff.

          God forbid they don't have anyone looking over them. If that is the case you will end up with a monster, because most of them will implement any security idea they read in a magazine. They end up making the whole thing so complicated they don't even know how it all works; it is like spaghetti code with network devices.

          I am not fond of our networking staff, can you tell... :)

    Re: Pinging network devices and setting SNMP traps
    by sgifford (Prior) on Jul 13, 2006 at 16:54 UTC
      A few random thoughts:
      • Think about the situation where the network fails on the monitoring machine, so all hosts appear to be off the network, when really just that host is off the network.
      • Consider also what happens if the network connection is down between the monitoring host and the alarm host.
      • Think about what to do when something goes down that causes a lot of other things to go down. For example, if your fiber link gets cut, you don't want dozens of messages every 5 seconds saying that each of the hosts is down, unless you have something in place to prevent that from being extremely annoying.
      • Definitely don't sound any kind of alarm for just one lost packet. Packet loss rates of 1% are fairly normal, and higher loss rates are normal over the open internet or over wireless networks.
      • Think about what will happen if your script crashes. One possibility is to start it up from init, which will restart it if it dies. daemontools is another useful tool to keep your daemons running.
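
      Two of the points above (no alarm for a single lost packet, and no flood of repeated messages while something stays down) can be sketched as a small per-host state tracker. This is only an illustration under stated assumptions: the helper name record_result, the threshold of 3 consecutive failures, and the host name router1 are all made up, and the ping results are simulated rather than taken from the network.

```perl
use strict;
use warnings;

# Hypothetical helper: record one poll result for $host and return
# 'down' or 'up' when the host changes state, or undef otherwise.
# $state is a hash ref that the caller keeps between polling rounds.
sub record_result {
    my ($state, $host, $reachable, $threshold) = @_;
    my $s = $state->{$host} ||= { failures => 0, down => 0 };

    if ($reachable) {
        my $was_down = $s->{down};
        $s->{failures} = 0;
        $s->{down}     = 0;
        return $was_down ? 'up' : undef;    # report recovery once
    }

    $s->{failures}++;
    if (!$s->{down} && $s->{failures} >= $threshold) {
        $s->{down} = 1;
        return 'down';    # report once, not on every failed poll
    }
    return;
}

# Simulated poll results for one host (1 = reply, 0 = no reply).
my %state;
my @events;
for my $reachable (1, 0, 1, 0, 0, 0, 0, 1) {
    my $event = record_result(\%state, 'router1', $reachable, 3);
    push @events, $event if defined $event;
}
print "@events\n";    # prints "down up"
```

      In a real poller the $reachable value could come from something like Net::Ping's ping($host, $timeout), and the returned 'down' or 'up' event is the point at which a trap or page would be sent. The isolated lost packet early in the simulated sequence produces no event at all.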