
Resizing MRTG (RRDTool) logs en-masse

by McDarren (Abbot)
on Nov 29, 2010 at 15:34 UTC

MRTG is a very widely-used and popular tool for graphing data. Most typically, it's used for graphing bandwidth utilisation, but it can be (and is) used to graph just about anything.

When used together with RRDTool, MRTG by default creates RRD files holding approximately 2 days of 5-minute data, 1 week of 30-minute data, 2 months of 2-hour data and 2 years of 1-day data.

At my $workplace, we decided that we needed to change these defaults. Fortunately, recent versions of MRTG provide a suite of RRDRowCount configuration options for this purpose, and RRDTool provides a resize command for resizing existing RRDs.

My issue was that we had more than 2000 existing RRDs, and some of them had already been resized at an earlier stage. So I needed to iterate through the lot, examine them one by one, and resize each accordingly.

Perl to the rescue :-)
The script below ran through the whole lot in just a few minutes.

Disclaimer: If you decide to use the script below on your own RRDs, I strongly recommend a dry run on a copy of your files first. You have been warned :-p

Update: Made a couple of small changes as per comments from jwkrahn.

Update 2012-01-29: Now on GitHub.

Cheers,
Darren

#!/usr/bin/perl
use strict;
use warnings;
use RRDs;
use Time::HiRes qw/time/;

my $rrdtool = '/usr/bin/rrdtool';
my $logsdir = 'logs'; # Where all the rrd files live

# The number of data sources in each rrd file
# Typically, for mtrg-generated rrds this will be 8
my $datasources = 8;

my %wanted = (
    1   => 8640,  # 30 days of 5 minute data
    6   => 17520, # 365 days of 30 min data
    24  => 13140, # 3 years of 2 hour data
    288 => 3650,  # 10 years of 1 day data
    );

opendir(DIR, $logsdir) or die "Cannot open $logsdir:$!\n";
my @rrds = grep { /.rrd$/ && -f "$logsdir/$_" } readdir DIR;
closedir DIR;
my $numfiles = scalar @rrds;
print "Starting, found $numfiles rrd files\n\n";

my $start = time;
for my $rrd (sort @rrds) {
    print "\nProcessing $rrd\n";
    my $info = RRDs::info "$logsdir/$rrd";
    # Check to ensure we actually have a valid rrd file
    unless ($info->{filename}) {
        print qq|"$logsdir/$rrd" doesn't appear to be a valid rrd log, skipping\n|;
        next;
    }
    for (0 .. $datasources -1) {
        my $cmd = qq|$rrdtool resize $logsdir/$rrd |;
        my $pdp = $info->{"rra[$_].pdp_per_row"};
        my $rows = $info->{"rra[$_].rows"};
        my $cf = $info->{"rra[$_].cf"};
        my $diff = $rows - $wanted{$pdp};
        printf("\tCurrent DS => PDP per row:%.f Rows:%.f CF:%s\n", $pdp, $rows, $cf);
        if ($diff < 0) {
            $diff = abs($diff);
            $cmd .= qq|$_ GROW $diff|;
        }
        elsif ($diff > 0) {
            $cmd .= qq|$_ SHRINK $diff|;
        }
        else {
            print "\tNo change to this DS\n\n";
            next;
        }
        print "\tResizing to $wanted{$pdp} rows, executing $cmd\n";
        system($cmd) == 0 or die "Could not execute $cmd:$!\n";
        print "\tRenaming resized file\n";
        rename 'resize.rrd', "$logsdir/$rrd";
        print "\tDone.\n";
    }
}
my $end = time;
my $dur = sprintf("%.2f", $end - $start);
print "Finished, processed $numfiles files in $dur seconds\n\n";
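
For anyone adapting the %wanted table in the script above, each row count is the retention period divided by the time one row covers. The following is a minimal sketch of that arithmetic, assuming the usual MRTG step of 300 seconds (check the step that RRDs::info reports for your own files). The helper rows_needed and the %retention_days table are hypothetical names used only for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a 300-second (5-minute) base step, the usual MRTG polling interval.
my $step = 300;

# rows_needed() is a hypothetical helper, for illustration only.
# rows = retention in seconds / (PDPs per row * step in seconds)
sub rows_needed {
    my ($days, $pdp_per_row) = @_;
    return int( $days * 86_400 / ($pdp_per_row * $step) );
}

# Retention targets taken from the comments in %wanted, in days,
# keyed by PDPs per row
my %retention_days = (
    1   => 30,        # 5 minute data
    6   => 365,       # 30 minute data
    24  => 3 * 365,   # 2 hour data
    288 => 10 * 365,  # 1 day data
);

for my $pdp (sort { $a <=> $b } keys %retention_days) {
    printf "%3d => %d\n", $pdp, rows_needed($retention_days{$pdp}, $pdp);
}
```

With a 300-second step this reproduces the four values in %wanted (8640, 17520, 13140 and 3650), so a different retention target only needs a different number of days.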

Replies are listed 'Best First'.
Re: Resizing MRTG (RRDTool) logs en-masse
by jwkrahn (Abbot) on Nov 29, 2010 at 23:19 UTC
    my @rrds = grep { /.*rrd$/ && -f "$logsdir/$_" } readdir DIR;

    Why match zero or more characters in /.*rrd$/ when /rrd$/ would match the same thing with less work? Perhaps you meant /\.rrd$/?



    $cmd = qq|$mv resize.rrd $logsdir/$rrd|;
    print "\tRenaming resized file, executing $cmd\n";
    system($cmd) == 0 or die "Could not execute $cmd:$!\n";

    Why not just use Perl's built-in rename function?

    print "\tRenaming resized file, executing $cmd\n";
    rename 'resize.rrd', "$logsdir/$rrd" or die "Could not rename resize.rrd:$!\n";

      Thanks for the feedback, both valid points.

      heh... I completely forgot about the rename function ;-)

        Also, /.*rrd$/ will match both of the strings "rrd" and "rrd\n", so perhaps you should use /.*rrd\z/ instead.

        Or perhaps even: 'rrd' eq substr( $_, -3 )
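
The difference between the patterns discussed in this sub-thread is easy to see on a few sample names. This is a small sketch; the file names are made up for illustration, and is_rrd_name is a hypothetical helper showing the strictest form (escaped dot, \z anchor).

```perl
use strict;
use warnings;

# Hypothetical sample names, for illustration only
my @names = ('router.rrd', 'myrrd', "router.rrd\n", 'router.rrd.bak');

# The pattern from the script, the escaped-dot form, and the \z-anchored form
my @patterns = (
    [ '/.rrd$/'   => qr/.rrd$/   ],
    [ '/\.rrd$/'  => qr/\.rrd$/  ],
    [ '/\.rrd\z/' => qr/\.rrd\z/ ],
);

# Strictest check: a literal dot, and nothing (not even a newline) after "rrd"
sub is_rrd_name {
    my ($name) = @_;
    return $name =~ /\.rrd\z/ ? 1 : 0;
}

for my $name (@names) {
    (my $shown = $name) =~ s/\n/\\n/g;   # make the newline visible when printing
    for my $p (@patterns) {
        my ($label, $re) = @$p;
        printf "%-18s %-10s %s\n", qq{"$shown"}, $label,
            ($name =~ $re ? 'match' : 'no match');
    }
}
```

The unescaped dot accepts "myrrd" because "." matches any character, and both $-anchored forms accept a name with a trailing newline; only the \z form rejects it.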

Re: Resizing MRTG (RRDTool) logs en-masse
by droid385902 (Initiate) on Jan 27, 2012 at 21:07 UTC

    If you're interested, I've tweaked the code so that it moves all of the options to the command-line, and adds a couple of extra features:
    All DS values are examined (it's not hard-coded to 8)
    rows can be re-mapped on a per-DS basis
    rows can be re-mapped on a per-PDP basis
    the rrd files to be edited are now specified on the command-line, instead of on a per-directory basis (works better with tools like find/xargs)

    Attached is a "diff -u" against the current code:

    --- rrdtoolresize.pl.orig  2012-01-27 13:22:08.448124100 -0700
    +++ rrdtoolresize.pl  2012-01-27 14:06:09.298551700 -0700
    @@ -1,66 +1,175 @@
     #!/usr/bin/perl
    +#
    +my $USAGE = "#
    +# Usage: $0 [ -v verbosity ] [ -f ] \
    +    [ -R rranum:rows[;rranum:rows]* | -P pdp:rows[;pdp:rows]* ] RRDs(s)
    +# Where:
    +#   -v verbosity    Specify the verbosity level (default = 10)
    +#   -f              Fake (dry) run (assumed if -R not specified)
    +#   -R X:Y[;X:Y]*   Resize rra X to have Y rows
    +#   -M X:Y[;X:Y]*   Remap every RRA with X pdps to Y rows
    +#
    +# File(s)
    +#   These are RRD files that need to be 're-shaped'
    +#
    +# If neither -M nor -P is specified, then info about the RRD
    +#   will be printed
    +# If both -M and -R are specified, then -R takes precedence
    +#
    +";
    +#
    +
     use strict;
     use warnings;
    +
     use RRDs;
    +use Getopt::Std;
    +use Data::Dumper;
     use Time::HiRes qw/time/;
     
    -my $rrdtool = '/usr/bin/rrdtool';
    -my $logsdir = 'logs'; # Where all the rrd files live
    +my $rrdtool = $ENV{'RRDTOOL'} || 'rrdtool';
    +
    +my %opt;
    +getopts('fP:R:v:', \%opt) || die $USAGE;
    +
    +my $verbosity = $opt{'v'} || 10;
    +my $rrastr = $opt{'R'};
    +my $pdpstr = $opt{'P'};
    +my $dryrun = (defined $opt{'f'}) ||
    +    ((!defined $rrastr) && (!defined $pdpstr)) ? 1 : 0;
    +my $dumpinfo = ($verbosity >= 20) ||
    +    (!defined $pdpstr && !defined $rrastr) ? 1 : 0;
    +
    +my %rramap;
    +if (defined $rrastr) {
    +    my @rows = split(/\s*;\s*/, $rrastr);
    +    foreach my $redo (@rows) {
    +        my @info = split(/\s*:\s*/, $redo);
    +        if ($#info != 1) {
    +            die "Bad rra resize specification ($redo) in -R $rrastr! Died";
    +        }
    +        elsif ($info[1] < 1) {
    +            die "Invalid rra row count ($info[1]) in -R $rrastr! Died";
    +        }
    +        elsif ($info[0] !~ /^\d+$/) {
    +            die "Invalid rra number ($info[0]) in -R $rrastr! Died";
    +        }
    +        else {
    +            $rramap{$info[0]} = int($info[1]);
    +        }
    +    }
    +}
    +
    +my %pdpmap;
    +if (defined $pdpstr) {
    +    my @rows = split(/\s*;\s*/, $pdpstr);
    +    foreach my $redo (@rows) {
    +        my @info = split(/\s*:\s*/, $redo);
    +        if ($#info != 1) {
    +            die "Bad rra resize specification ($redo) in -P $pdpstr! Died";
    +        }
    +        elsif ($info[1] < 1) {
    +            die "Invalid rra pdp count ($info[1]) in -P $pdpstr! Died";
    +        }
    +        elsif ($info[0] !~ /^\d+$/) {
    +            die "Invalid rra number ($info[0]) in -P $pdpstr! Died";
    +        }
    +        else {
    +            $pdpmap{$info[0]} = int($info[1]);
    +        }
    +    }
    +}
     
    -# The number of data sources in each rrd file
    -# Typically, for mtrg-generated rrds this will be 8
    -my $datasources = 8;
    -
    -my %wanted = (
    -    1   => 8640,  # 30 days of 5 minute data
    -    6   => 17520, # 365 days of 30 min data
    -    24  => 13140, # 3 years of 2 hour data
    -    288 => 3650,  # 10 years of 1 day data
    -    );
    -
    -opendir(DIR, $logsdir) or die "Cannot open $logsdir:$!\n";
    -my @rrds = grep { /.rrd$/ && -f "$logsdir/$_" } readdir DIR;
    -closedir DIR;
    -my $numfiles = scalar @rrds;
    -print "Starting, found $numfiles rrd files\n\n";
    +
    +my @rrds = @ARGV;
     
     my $start = time;
    +my $numfiles = 0;
     for my $rrd (sort @rrds) {
         print "\nProcessing $rrd\n";
    -    my $info = RRDs::info "$logsdir/$rrd";
    +    my $info = RRDs::info $rrd;
         # Check to ensure we actually have a valid rrd file
    -    unless ($info->{filename}) {
    -        print qq|"$logsdir/$rrd" doesn't appear to be a valid rrd log, skipping\n|;
    +    if ($info->{filename}) {
    +        printf "DEBUG: RRD %s info: %s\n",
    +            $rrd, join("\n", sort split(/\n/, Dumper($info)))
    +            if ($dumpinfo);
    +    }
    +    else {
    +        print "$rrd isn't a valid rrd log, skipping\n";
             next;
         }
    -    for (0 .. $datasources -1) {
    -        my $cmd = qq|$rrdtool resize $logsdir/$rrd |;
    -        my $pdp = $info->{"rra[$_].pdp_per_row"};
    -        my $rows = $info->{"rra[$_].rows"};
    -        my $cf = $info->{"rra[$_].cf"};
    -        my $diff = $rows - $wanted{$pdp};
    -        printf("\tCurrent DS => PDP per row:%.f Rows:%.f CF:%s\n", $pdp, $rows, $cf);
    +
    +    $numfiles++;
    +    my @rras = sort map { substr($_, 4, index($_, ']', 4)-4) }
    +        grep { /rra\[\d+\].pdp_per_row/ } keys %{$info};
    +
    +    ## Debug:
    +    # printf "Found:\n %s\n", join("\n ",
    +    #     grep { /rra\[\d+\].pdp_per_row/ } keys %{$info});
    +    # printf "RRAs: %s\n", join(" ", @rras);
    +
    +    foreach my $rra (sort { $a <=> $b } @rras) {
    +        my $cmd = qq|$rrdtool resize $rrd |;
    +        my $rows = $info->{"rra[$rra].rows"};
    +        my $cf = $info->{"rra[$rra].cf"};
    +        my $pdp = $info->{"rra[$rra].pdp_per_row"};
    +        printf "\tDS %s => PDP per row:%.f Rows:%.f CF:%s\n",
    +            $rra, $pdp, $rows, $cf;
    +        my $wanted = (defined $rramap{$rra}) ? $rramap{$rra} :
    +            (defined $pdpmap{$pdp}) ? $pdpmap{$pdp} : -1;
    +        if ($wanted <= 0)
    +        {
    +            printf "DEBUG: Skipping RRA %s (no map found)\n", $rra
    +                if ($verbosity >= 15);
    +            next;
    +        }
    +
    +        my $diff = $rows - $wanted;
             if ($diff < 0) {
                 $diff = abs($diff);
    -            $cmd .= qq|$_ GROW $diff|;
    +            $cmd .= qq|$rra GROW $diff|;
             }
             elsif ($diff > 0) {
    -            $cmd .= qq|$_ SHRINK $diff|;
    +            $cmd .= qq|$rra SHRINK $diff|;
             }
             else {
    -            print "\tNo change to this DS\n\n";
    +            print "\tNo change to DS $rra\n\n";
                 next;
             }
    -        print "\tResizing to $wanted{$pdp} rows, executing $cmd\n";
    -        system($cmd) == 0 or die "Could not execute $cmd:$!\n";
    -        print "\tRenaming resized file\n";
    -        rename 'resize.rrd', "$logsdir/$rrd";
    -        print "\tDone.\n";
    +        print "\tResizing to $wanted rows, executing $cmd\n";
    +        if (!$dryrun) {
    +            system($cmd) == 0 or die "\tCould not execute $cmd: $!";
    +            print "\tRenaming resized file\n";
    +
    +            # We jump through a number of hoops because the RRD may not
    +            # be in the current directory (but the created "resize.rrd"
    +            # IS in the current directory!)
    +            unlink $rrd.'.bk';
    +
    +            # Do this in case one of the steps below fails
    +            rename $rrd, $rrd.'.bk' ||
    +                die "\tUnable to move the old $rrd way! Stopping";
    +
    +            if (!link('resize.rrd', $rrd)) {
    +                print "\tNOTICE: link(resize.rrd, $rrd) failed.".
    +                    " Trying 'mv' instead!\n";
    +                if (system("mv resize.rrd $rrd")) {
    +                    # Try to put the original RRD back
    +                    rename $rrd.'.bk', $rrd;
    +                    die "\tFailed to link/move resize.rrd to $rrd! Died";
    +                }
    +            }
    +            else {
    +                unlink 'resize.rrd';
    +                unlink $rrd.'.bk';
    +            }
    +            print "\tDone.\n";
    +        }
         }
     }
     my $end = time;
    -my $dur = sprintf("%.2f", $end - $start);
    +my $dur = sprintf('%.2f', $end - $start);
     print "Finished, processed $numfiles files in $dur seconds\n\n";
      Hey, thanks for that :-)

      I've applied your patch and thrown this on GitHub

      Cheers,
      Darren
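
One detail in the patch above deserves a second look before reuse: in `rename $rrd, $rrd.'.bk' || die ...`, the `||` binds more tightly than the comma, so it tests the string `$rrd.'.bk'` (always true) rather than the result of rename, and the die can never fire. A minimal sketch of the difference follows; the file name is hypothetical and assumed not to exist, and survived is a helper invented for this demonstration.

```perl
use strict;
use warnings;

# Hypothetical name; assumed NOT to exist in the current directory
my $missing = 'no-such-file-for-demo.rrd';

# Hypothetical helper: returns 1 if the code ran without dying, 0 if it died
sub survived {
    my ($code) = @_;
    return eval { $code->(); 1 } ? 1 : 0;
}

# Parsed as rename($missing, ("$missing.bk" || die ...)): die is never reached
my $with_pipes = survived(sub {
    rename $missing, $missing.'.bk' || die "rename failed: $!";
});

# Low-precedence "or" applies to the return value of rename itself
my $with_or = survived(sub {
    rename $missing, $missing.'.bk' or die "rename failed: $!";
});

print "with ||: ", ($with_pipes ? "failure went unnoticed" : "died"), "\n";
print "with or: ", ($with_or    ? "failure went unnoticed" : "died"), "\n";
```

Using `or` (or parenthesising the rename call) is what makes the backup step in the patch actually stop on failure.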

Node Type: CUFP [id://874274]
Approved by Corion
Front-paged by Arunbear