http://www.perlmonks.org?node_id=688677

5mi11er has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I find myself spinning at the moment. I want to create a nice report about ~40 servers that includes ip address usage, memory free/avail, drive size, and partition size/used/free information.

To this end I have a quick and dirty little shell script, along with some sudoers entries spread across the servers such that I can kick off a 'go gather the info' script and have it finish less than 1 minute later.

Already I hear the cries of "Why are you rolling your own? MRTG, RRDTool, Nagios, etc. do that and more for you!" You're absolutely right, and I'm heading down that path already. However, that doesn't teach me any cool new skills that could come in handy down the road.

So, my question is this. How does one parse this type of data in a way that is readable, maintainable, and extensible? A sample of my datafile:

-------------------------------------------------------------
[jboss-box1]
$ echo `hostname -i; hostname -f`
10.1.9.183 jboss-box1.domain.com
$ /sbin/ifconfig | grep -E '(encap|addr)'
eth0      Link encap:Ethernet  HWaddr 00:0C:29:3D:6B:CF  
          inet addr:10.1.9.183  Bcast:10.1.9.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3d:6bcf/64 Scope:Link
          Interrupt:177 Base address:0x1424 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host

$ uname -a
Linux jboss-box1 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux

$ cat /etc/redhat-release
CentOS release 4.6 (Final)

$ free -b
             total       used       free     shared    buffers     cached
Mem:    1059393536 1035771904   23621632          0   92483584  718749696
-/+ buffers/cache:  224538624  834854912
Swap:   2146754560          0 2146754560

$ sudo /sbin/fdisk -l | grep : 
Disk /dev/sda: 42.9 GB, 42949672960 bytes

$ df -aP -t ext2 -t ext3
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda2             36638508   2750768  32026612       8% /
/dev/sda1               505604     22994    456506       5% /boot
/dev/sda3              2063536   1291852    666860      66% /var/log

-------------------------------------------------------------
[jboss-box2]
$ echo `hostname -i; hostname -f`
10.1.9.182 jboss-box2.domain.com

$ /sbin/ifconfig | grep -E '(encap|addr)'
eth0      Link encap:Ethernet  HWaddr 00:0C:29:D6:56:4C  
          inet addr:10.1.9.182  Bcast:10.1.9.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fed6:564c/64 Scope:Link
          Interrupt:177 Base address:0x1424 
eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:D6:56:4C  
          inet addr:10.1.9.187  Bcast:10.1.9.255  Mask:255.255.255.0
          Interrupt:177 Base address:0x1424 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host

$ uname -a
Linux jboss-box2 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux

$ cat /etc/redhat-release
CentOS release 4.6 (Final)

$ free -b
             total       used       free     shared    buffers     cached
Mem:    1059393536 1037778944   21614592          0   88453120  686907392
-/+ buffers/cache:  262418432  796975104
Swap:   2146754560     212992 2146541568

$ sudo /sbin/fdisk -l | grep : 
Disk /dev/sda: 42.9 GB, 42949672960 bytes

$ df -aP -t ext2 -t ext3
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda2             36638508   2789720  31987660       9% /
/dev/sda1               505604     22994    456506       5% /boot
/dev/sda3              2063536   1196952    761760      62% /var/log

-------------------------------------------------------------
[db-box1]
$ echo `hostname -i; hostname -f`
10.1.9.181 db-box1.domain.com

$ /sbin/ifconfig | grep -E '(encap|addr)'
eth0      Link encap:Ethernet  HWaddr 00:0C:29:98:96:41  
          inet addr:10.1.9.181  Bcast:10.1.9.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe98:9641/64 Scope:Link
          Interrupt:177 Base address:0x1424 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host

$ uname -a
Linux db-box1 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux

$ cat /etc/redhat-release
CentOS release 4.6 (Final)

$ free -b
             total       used       free     shared    buffers     cached
Mem:    2124685312 2087546880   37138432          0   41418752 1901887488
-/+ buffers/cache:  144240640 1980444672
Swap:   2146754560  430018560 1716736000

$ sudo /sbin/fdisk -l | grep : 
Disk /dev/sda: 107.3 GB, 107374182400 bytes

$ df -aP -t ext2 -t ext3
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda2             98564748  28245568  65312124      31% /
/dev/sda1               505604     22991    456509       5% /boot
/dev/sda3              2063536     42116   1916596       3% /var/log

I've done lots of simple parsing before, but rolling my own here seems like a lot of work and the wrong approach. I've searched for parsing examples, but so far they are just beyond my grasp, and I can't see how to bend them to work with this type of data. Little utilities each parsing one set of information is how the "tools" do it, but that doesn't help me learn more about parsing larger things such as this.

Can any of the parsing gurus help me get up to speed?

Thank you,

-Scott

Replies are listed 'Best First'.
Re: parsing system info
by grep (Monsignor) on May 27, 2008 at 16:22 UTC
    My 2 hints:

    UPDATE: minor updates

    grep
    One dead unjugged rabbit fish later...
Re: parsing system info
by moritz (Cardinal) on May 27, 2008 at 16:26 UTC
    $ echo `hostname -i; hostname -f`
    10.1.9.183 jboss-box1.domain.com

    Either run them separately, or split the output at whitespaces.

    eth0      Link encap:Ethernet  HWaddr 00:0C:29:3D:6B:CF
              inet addr:10.1.9.183  Bcast:10.1.9.255  Mask:255.255.255.0
              inet6 addr: fe80::20c:29ff:fe3d:6bcf/64 Scope:Link
              Interrupt:177 Base address:0x1424
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host

    The interface name is simply m/^(\w+)/. What other information do you want to extract?

    $ uname -a
    Linux jboss-box1 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux

    Run uname several times with more specific options.

    And so on - there's not one all-empowering parsing paradigm - just choose what you see fit. For fixed-width records you could use unpack btw - see perlpacktut.
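To illustrate the two suggestions above (split on whitespace, and a regex anchored at the start of the line for the interface name), here is a minimal sketch. It runs on sample lines copied from the question. The variable names are made up for the example, and it uses \S+ rather than \w+ so that an alias such as eth0:1 is kept whole; that is a choice made for this sketch, not something the thread prescribes.

```perl
use strict;
use warnings;

# Sample lines copied from the data file in the question.
my $host_line = '10.1.9.182 jboss-box2.domain.com';
my @ifconfig  = (
    'eth0      Link encap:Ethernet  HWaddr 00:0C:29:D6:56:4C',
    '          inet addr:10.1.9.182  Bcast:10.1.9.255  Mask:255.255.255.0',
    '          inet6 addr: fe80::20c:29ff:fed6:564c/64 Scope:Link',
    'eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:D6:56:4C',
    '          inet addr:10.1.9.187  Bcast:10.1.9.255  Mask:255.255.255.0',
    'lo        Link encap:Local Loopback',
    '          inet addr:127.0.0.1  Mask:255.0.0.0',
);

# Split the combined hostname output at whitespace.
my ( $ip, $fqdn ) = split ' ', $host_line;

# A line starting in column 0 names an interface;
# the indented lines that follow belong to that interface.
my ( %addr_of, $iface );
for my $line (@ifconfig) {
    if ( $line =~ m/^(\S+)/ ) {
        $iface = $1;
    }
    elsif ( defined $iface && $line =~ m/inet addr:(\S+)/ ) {
        $addr_of{$iface} = $1;
    }
}

print "$fqdn has main address $ip\n";
print "$_ => $addr_of{$_}\n" for sort keys %addr_of;
```

The same "remember the last heading, attach indented lines to it" pattern should carry over to any output where continuation lines are indented.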

Re: parsing system info
by NetWallah (Canon) on May 27, 2008 at 17:33 UTC
Re: parsing system info
by Herkum (Parson) on May 27, 2008 at 17:11 UTC

    I have found that one of the most useful skills you can develop as a programmer is learning how to make it easier to manage data rather than parse data.

    You should focus on trying to find the information you need in as simple and as focused context as possible. In other words, only get what you need and your programming will be much easier.

    If you dump everything out and then try to parse, you might get what you need but you have to sort through a bunch of garbage to find what you need (you get it: Garbage, Dump?! it's funny!... *sigh*).

Re: parsing system info
by 5mi11er (Deacon) on May 27, 2008 at 16:54 UTC
    Excellent suggestion to use the /proc system, grep.

    But as a training exercise for me (I wasn't very clear initially), I'm looking for how to parse the different areas. Pseudo-code time:

    while(<>) {
       # search for lots of dashes, that separates the server records
           #once we have found that, we're at step 1
       # Step 1
           # gather server name/main ip address
           # if we find "ifconfig" go to step 2
       # Step 2
           # gather IP address info
           # if we find "uname" go to step 3
       # Etc...
    }
    
    That's my current, probably bad, idea of how to tackle the problem. I'm pretty sure there are much better ways of going about it...
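    For what it's worth, the step-by-step idea above can be written as a small state machine with one handler per command. This is only a rough sketch under a few assumptions: every section starts with a "$ command" line, server names appear as [name], and the names %handler_for and parse_report are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical handlers, one per command of interest. Each receives the
# hash for the current server plus one line of that command's output.
my %handler_for = (
    hostname => sub {
        my ( $rec, $line ) = @_;
        @{$rec}{qw(ip fqdn)} = split ' ', $line;
    },
    uname => sub {
        my ( $rec, $line ) = @_;
        $rec->{kernel} = ( split ' ', $line )[2];
    },
    fdisk => sub {
        my ( $rec, $line ) = @_;
        $rec->{disk_bytes}{$1} = $2
            if $line =~ m{^Disk (\S+): .*, (\d+) bytes};
    },
);

sub parse_report {
    my ($fh) = @_;
    my ( %info_for, $server, $handler );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^-{10,}$/ ) {         # separator: forget current state
            ( $server, $handler ) = ();
        }
        elsif ( $line =~ /^\[(.+)\]$/ ) {    # [server-name] starts a record
            $server = $1;
            $info_for{$server} = {};
        }
        elsif ( $line =~ /^\$ (.*)/ ) {      # "$ command" switches the step
            my $cmd = $1;
            my ($name) = grep { $cmd =~ /\b\Q$_\E\b/ } sort keys %handler_for;
            $handler = defined $name ? $handler_for{$name} : undef;
        }
        elsif ( defined $server && $handler && $line =~ /\S/ ) {
            $handler->( $info_for{$server}, $line );
        }
    }
    return \%info_for;
}

# A cut-down copy of the sample data from the question.
my $sample = <<'END_SAMPLE';
-------------------------------------------------------------
[jboss-box1]
$ echo `hostname -i; hostname -f`
10.1.9.183 jboss-box1.domain.com
$ /sbin/ifconfig | grep -E '(encap|addr)'
eth0      Link encap:Ethernet  HWaddr 00:0C:29:3D:6B:CF

$ uname -a
Linux jboss-box1 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux

$ sudo /sbin/fdisk -l | grep :
Disk /dev/sda: 42.9 GB, 42949672960 bytes
END_SAMPLE

open my $fh, '<', \$sample or die "cannot read sample: $!";
my $info = parse_report($fh);
close $fh;

for my $name ( sort keys %{$info} ) {
    print "$name: $info->{$name}{ip} kernel $info->{$name}{kernel}\n";
}
```

    Commands without a handler (ifconfig here) are simply skipped, so a new step can be added by adding one entry to the hash.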

    -Scott

      Little utilities each parsing one set of information is how the "tools" do it, but that doesn't help me learn more about parsing larger things such as this.

      I'd say develop little utilities each parsing one set of information.

      You are planning a big utility that opens your data file once and reads everything. Instead, let each little utility open the file, scan through for what it needs, and ignore everything it doesn't know about. That way you can build and test one little piece at a time.

      After you've written several little utilities, you'll find they all use some of the same tricks, e.g. the regex for reading the server name. You will want to put these repeated snippets into a module that each little utility can use. Do that module later. Start by building some little thing that works right now.

      For example, the df request returns a table with column headings. Have a little utility that handles this.

      # regexes referenced below should be defined or imported here

      # hash of hashes summarizing df results:
      # server name => filesystem => { column heading => value }
      my %partitions_on;

      LINE:
      while (<>) {

          # pull the server name from a line like [jboss-box1]
          my ( $server ) = /$SERVER_PATTERN/;
          next LINE if !$server;

          # skip what doesn't relate to the df command
          NON_DF_INFO:
          while (<>) {
              last NON_DF_INFO if /$DF_PATTERN/;
          }

          # the line after the df command holds the column headings
          my $header = <>;
          last LINE if !defined $header;
          my @cols = split ' ', $header;

          DATA:
          while (<>) {
              last DATA if /$BLANK/;

              # one hash per partition, keyed by column heading
              my %value_of;
              @value_of{@cols} = split;
              $partitions_on{$server}{ $value_of{Filesystem} } = \%value_of;
          }

          # go back to hunting for another server
          next LINE;
      }
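      The advice about moving repeated snippets into a shared module might end up looking something like the sketch below. It is only an illustration: records_in, output_of and memory_of are hypothetical names, and it assumes the report format shown in the question (a line of dashes between servers, "$ command" before each block of output).

```perl
use strict;
use warnings;

# Hypothetical shared helper: split the whole report into one chunk of
# text per server, keyed by the name found in the [server] line.
sub records_in {
    my ($text) = @_;
    my %record_for;
    for my $chunk ( split /^-{10,}\n/m, $text ) {
        my ($name) = $chunk =~ /^\[([^\]]+)\]\n/;
        next if !defined $name;
        $record_for{$name} = $chunk;
    }
    return \%record_for;
}

# Hypothetical shared helper: the non-blank output lines that follow the
# "$ command" line matching $command_re, up to the next "$ " line.
sub output_of {
    my ( $record, $command_re ) = @_;
    my ( $in_section, @lines );
    for my $line ( split /\n/, $record ) {
        if ( $line =~ /^\$ / ) {
            $in_section = $line =~ $command_re;
            next;
        }
        push @lines, $line if $in_section && $line =~ /\S/;
    }
    return @lines;
}

# One "little utility": memory figures from the output of free -b.
sub memory_of {
    my ($record) = @_;
    for my $line ( output_of( $record, qr/\bfree\b/ ) ) {
        if ( $line =~ /^Mem:\s+(\d+)\s+(\d+)\s+(\d+)/ ) {
            return { total => $1, used => $2, free => $3 };
        }
    }
    return;
}

# A cut-down copy of the sample data from the question.
my $report = <<'END_REPORT';
-------------------------------------------------------------
[jboss-box1]
$ uname -a
Linux jboss-box1 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux

$ free -b
             total       used       free     shared    buffers     cached
Mem:    1059393536 1035771904   23621632          0   92483584  718749696
-/+ buffers/cache:  224538624  834854912
Swap:   2146754560          0 2146754560

-------------------------------------------------------------
[db-box1]
$ free -b
             total       used       free     shared    buffers     cached
Mem:    2124685312 2087546880   37138432          0   41418752 1901887488
END_REPORT

my $records = records_in($report);
for my $server ( sort keys %{$records} ) {
    my $mem = memory_of( $records->{$server} ) or next;
    print "$server: $mem->{free} of $mem->{total} bytes free\n";
}
```

      Each further utility (df, fdisk, ifconfig) would then only need its own regex plus a call to output_of, which keeps every piece small enough to test on its own.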
Re: parsing system info
by andreas1234567 (Vicar) on May 27, 2008 at 18:58 UTC
    I suggest you take a good look at cfengine while you're at it. The article Intro to cfengine for system administration (ibm.com/developerworks) is from 2002 but it still stands.
    --
    No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
Re: parsing system info
by bloonix (Monk) on May 28, 2008 at 10:45 UTC
    Hello Scott,

    maybe you would like to use Sys::Statistics::Linux for different things like memory and disk usage.

    There are many other Linux::* modules on CPAN that could be very useful for you.

    Cheers,
    Jonny
Re: parsing system info
by 5mi11er (Deacon) on May 30, 2008 at 14:34 UTC
    Thanks for the various suggestions on how to better gather system information; they're useful.

    But I was attempting to learn about parsing best practices. I'll eventually start up a new thread attempting to ask the question better.

    -Scott