http://www.perlmonks.org?node_id=300372
Category: Utility Scripts
Author/Contact Info Bart Vetters | robartes
Description: This module implements a way of taking directory structure snapshots using the rsync/hardlink method from Hack #74 in Linux Server Hacks. It's fairly basic for the moment, and limited to Unix platforms. Future versions will become more universal through the use of link and File::Find.
#!/usr/local/bin/perl
#
# Snapshot.pm
#
# Module containing methods relating to the backup and restore of stuf
+f
# using the LSH style snapshots

=pod

=head1 Snapshot.pm

=head1 NAME

Snapshot.pm

Module to take snapshots using Linux Server Hacks rsync / cp -al metho
+d

=head1 SYNOPSIS

  use Snapshot qw/do_snapshot/;

  my $return=do_snapshot({ source => '/home/robartes/',
                           snapshot_dir => '/snapshots/home',
                         }) or die $Snapshot::error;

=head1 DESCRIPTION

=head2 Introduction

Snapshot.pm provides a method to manage snapshots of directory trees. 
+It uses
the method given in Hack #74 in the excellent O'Reilly "Linux Server H
+acks"
book (don't be put off by the Linux part - most hacks are applicable t
+o any
Unix, and so is this module). More information about this method can b
+e
found in that book and in this document under L</SNAPSHOT METHOD>.

'Snapshots' are defined as a number of copies of a directory tree,
reflecting the state of said tree I<at the moment the copy was taken>.
+ This
means that a file in a snapshot of five days old will be the version o
+f that
file that existed five days ago, even if the file was modified or even
deleted since then.

=head2 Series and MRS

Snapshot.pm keeps a configurable number of snapshots in so called I<se
+ries>.
A series is defined as a number of snapshots using a common naming sch
+eme
and following each other chronologically. One could for example imagin
+e a
series to consist of daily snapshots. Multiple series can be defined a
+nd
handled by Snapshot.pm.

The most recent snapshot (MRS) is treated specially. It is the same fo
+r all
series and is named seperately from the series. It is also not include
+d in
the count of series snapshots to keep.


                  +------+    +------+    +------+
               +--| S1_1 | -- | S1_2 | -- | S1_3 |
              /   +------+    +------+    +------+
  +-----+    /
  | MRS | --+
  +-----+    \
              \   +------+    +------+    +------+    +------+
               +--| S2_1 | -- | S2_2 | -- | S2_3 | -- | S2_4 |
                  +------+    +------+    +------+    +------+

The figure represents a snapshot scheme consisting of two series - S1 
+and S2
(the names are configurable). S1 is defined to keep 3 snapshots, S2 ke
+eps 4.
When the do_snapshot() routine is called with these parameters, it wil
+l
first rotate the series snapshots:

  S1_3  ==>  Deleted
  S1_2  ==>  S1_3
  S1_1  ==>  S1_2
  MRS   ==>  S1_1
  
... and analogously for S2. After this, the MRS will be updated to ref
+lect
the current state of the directory tree from which the snapshot is tak
+en
(using rsync).

The rotation of MRS to first series member is done by hardlinking file
+s.
This, combined with the fact that rsync creates new versions of modifi
+ed
files instead of modifying them in place, is where all the magic of th
+is
snapshot method lies.

After the rotations and the snapshot are done, a timetag file is writt
+en in
the MRS. This timetag file has a timestamp of just before the rsync st
+ep was
started, and contains that time formatted as by C<ctime(3)> (or, more
precisely, as the output from C<localtime> in scalar context). This is
+ done
to ensure that one has a reference point as to the age of the snapshot
+: the
versions of the files in that snapshot are guaranteed to be those of t
+he
time recorded in the timetag (or later, to be precise). The timetags a
+re not
changed by the series rotation process.

=head2 do_snapshot()

The core routine of Snapshot.pm, and the only one exported is
C<do_snapshot()>. This has to be imported explicitely:

  use Snapshot qw/do_snapshot/;

C<do_snapshot()> is called with a hash or reference to a hash as argum
+ent.
This hash lists configuration keys and values. Available keys and valu
+es
are listed below. Mandatory options are marked with *.

=over 2

=item source *

The directory tree to take a snapshot from. Trailing slashes
are optional.


=item snapshot_dir *

The directory where the snapshots are located. MRS and
series snapshots are located in this directory.

=item series      

A reference to an array containing hashrefs for series
configuration. Should contain one hashref per series, with mandatory k
+eys
'stub' and 'keep'. stub is the name of the series (number will be appe
+nded),
keep is how many snapshots should be kept in the series, excluding MRS
+.

Default: C<[ { 'stub' => 'daily_', 'keep' => 6 } ]>

=item verbose

When set to a true value, C<Snapshot.pm> becomes talkative. Currently 
+this
only means that you get the output of the rsync step on STDOUT.

Default: false

=item MRS     

The name of the Most Recent Snapshot.

Default: current

=item exclude

A reference to an array listing file patterns to exclude or include. T
+his is
passed on to the rsync stage, and goes straight through to L<File::Rsy
+nc>.

Default: empty

=item rsync_args

A hash reference of extra arguments to L<File::Rsync>. Any valid optio
+n to
File::Rsync will work, and will override options set through other mea
+ns,
notably C<verbose> and C<delete>. For the sake of the integrity of you
+r
snapshots, you are advised to know what you are doing when you fiddle 
+with
these.

Default: empty

=item fatal_errors

When set to true, any error encountered will result in an exception be
+ing
thrown. You can catch this with C<eval> and examine C<$@> in the usual
+ way.

Default: false

=item timetag

The filename of the timetag file.

Default: .snaptime

=back

=head1 RETURN VALUE

C<Snapshot.pm> returns the output from the rsync step in a scalar if a
+ll
goes well.

In case of an error, false is returned and $Snapshot::error will conta
+in a
more or less informative message loosely related to the cause of the
error.

Unless asked to do so, C<Snapshot.pm> should never throw exceptions.

=head1 PORTABILITY

This version is restricted to platforms that have cp -al. That probabl
+y
means Unix. Later versions will cater for additional platforms, namely
+ those
that support the link function and the L</File::Find> module.

=head1 DEPENDENCIES

Snapshot.pm uses these non standard modules:

  File::Rsync

=head1 SNAPSHOT METHOD

C<Snapshot.pm> uses a method from Hack #74 in Linux Server Hacks. The 
+method
is based on hard links and the fact that rsync creates new versions of
+ files
it modifies, as opposed to modifying them in place.

The magic is in the copy step from MRS to the first series member. Thi
+s step
is done using C<cp -al>, which creates hardlinks (C<-l>) of the files
instead of straight copies. If a file F is later changed in the MRS, t
+he
hardlink still points to the I<original> version of it: rsync does an 
+unlink
on it, which decreases the reference count of the inode. The inode doe
+s not
wink out of existence, as there still is a reference to it created by 
+hard
linking to it during the copy step.

Rotating the series elements is done with a simple C<move> (there is n
+o need
to take more hard links).

The beauty of this method is that the copy step from MRS to S1 is a tr
+ue
snapshot: only directories are actually created in S1, the files are s
+imply
hard links. When files in the MRS are subsequently changed, only the c
+hanged
files will take up extra space on the disk. So for each snapshot, you 
+only
need extra space for the files that have changed compared with the MRS
+.

=head1 SEE ALSO

L<File::Rsync> -- Documentation for the File::Rsync module
L<http://www.oreilly.com/catalog/linuxsvrhack/> -- Linux Server Hacks 
+book

=head1 AUTHOR

Bart Vetters, L<robartes@nirya.be>

=head1 Copyrights

Copyright (c) 2003 Bart Vetters.  All rights reserved.

This program is free software; you can redistribute it and/or modify i
+t under the same terms as Perl itself.


=cut

use strict;
use warnings;

package Snapshot;
require Exporter;
our @ISA = qw/Exporter/;
use File::Rsync;
use File::Copy;
use File::Path;

our @EXPORT_OK=qw/do_snapshot/;
our $VERSION="0.02";

my $globals;
our $error;
my %rsync_out;

sub do_snapshot {

  my $config = parse_args(@_) or return 0;

  rotate_series($config->{'snapshot_dir'},
                $config->{'MRS'},
                $config->{'series'},
                $config->{'num_series'},
               ) or return 0;
  my $now=localtime();
  my $retval=do_sync($config->{'source'},
                     $config->{'snapshot_dir'},
                     $config->{'MRS'},
                     $config->{'exclude'},
                     $config->{'rsync_args'},
                    ) or return 0;
  chdir $config->{'snapshot_dir'}.$config->{'MRS'} or return _croak("T
+his is weird: MRS is not accessible after snapshot: $!\n");
  unlink $config->{'timetag'} if (-f $config->{'timetag'});
  if (open TIMETAG, ">".$config->{'timetag'} ) {
    print TIMETAG $now."\n" or _warn("Could not write snapshot time to
+ timetag file. Snapshot has been taken successfully though.\n");
    close TIMETAG;
  } else {
    _warn("Could not write snapshot time to timetag file. Snapshot has
+ been taken successfully though.\n");
  }
  return $retval;
}

sub parse_args {

  my %allowed_args = ( 'source' => undef,
                       'snapshot_dir' => undef,
                       'exclude' => undef,
                       'MRS' => 'current',
                       'series' => [ { 'stub' => 'daily_',
                                       'keep' => 6,
                                   } ],
                       'rsync_args' => undef,
                       'fatal_errors' => undef,
                       'verbose' => undef,
                       'timetag' => '.snaptime'
                     );
  my @mandatory_args = qw /source snapshot_dir/;
  my @global_flags=qw/fatal_errors verbose/;

  my @args=@_;
  my %config;

  eval {
    if ( ref($args[0]) ) { 
      my $hashref=$args[0];
      %config = %$hashref;
    } else {
      %config=@args;
    }
  };

  if ($@) {return _croak("Please give a hashref or hash as arguments.\
+n") };

  foreach my $key (keys(%config)) {
    return _croak("Unknown argument: $key.\n") unless exists $allowed_
+args{$key};
    $allowed_args{$key}=$config{$key};
  }

  foreach (@mandatory_args)  {
    return _croak("Missing arg: $_\n") unless defined ( $allowed_args{
+$_} );
  }

  foreach (@global_flags) {
    $globals->{$_}=$allowed_args{$_};
  }

  $allowed_args{'source'} =~ s|([^/])$|$1/|;
  $allowed_args{'snapshot_dir'} =~ s|([^/])$|$1/|;

  return \%allowed_args;

}

sub rotate_series {

  my ($directory, $current, $series, $num_series) = @_;

  chdir $directory or return _croak("Problem accessing snapshot direct
+ory: $!\n");
  
  if (defined ($num_series)) {
    return _croak("Invalid number of series: $num_series\n") if ( $num
+_series < 1 );
    return _croak("Not enough series data.\n") if ( scalar @$series < 
+$num_series );
  } else {
    $num_series = @$series;
  }

  for (0 .. ($num_series - 1)) {
    my $stub=$series->[$_]{'stub'};
    my $keep=$series->[$_]{'keep'};
    rotate_files($stub,$keep,$current) or return 0;
  }

  return 1;
  
}

sub rotate_files {

  my ($stub, $max, $current) = @_;

  return _croak("Invalid number of snapshots to retain: $max\n") if ($
+max < 1);
  _warn("Only 1 version to keep. No rotating of files will be done.\n"
+) if $max == 1;

  my $maxdir="$stub$max";
  if ( -d $maxdir ) {
    rmtree($maxdir,0,1) or return _croak("Could not delete $stub$max\n
+");
  };
  if ($max > 2 ) {
    my $x=($max - 1);
    while  ($x) {
      if ( -d $stub.$x ) {
        move( $stub.$x, $stub.( $x + 1 ) ) or return _croak("A move we
+nt wrong: $!\nI'm aborting to avoid possible data corruption.\nThe mo
+ve that failed is $stub$x to $stub".( $x + 1 )."\n");
      }
      $x--;
    }
  }

  my $target=$stub."1";
  if ( -d $current) {
    return _croak("Problem with copy step: ".($? >> 8) ."\n") if syste
+m("cp -al $current/ $target");
  } else {
    _warn("No MRS found. Will be created in rsync step.\n");
  }
  return 1; 
}

sub do_sync {

  my ($source, $snapshot_dir, $current, $exclude, $rsync_args) = @_;
  my $dest=${snapshot_dir}.$current;
  my %extra_args;

  if (defined($rsync_args)) {
    eval { %extra_args = %$rsync_args };
    return _croak("Problem with extra rsync args: $@\n") if ($@);
  }

  unless (defined($extra_args{'exclude'})) {
    $extra_args{'exclude'}=$exclude if defined($exclude);
  }

  my $rsync = File::Rsync -> new(archive => 1,
                                 update => 1,
                                 compress => 1,
                                 verbose => 1,
                                 delete => 1,
                                 outfun => \&rsync_output,
                                 errfun => \&rsync_output,
                                 %extra_args,
                                );
  $rsync->exec ( { 'src' => $source, 'dest' => $dest } ) or return _cr
+oak("Rsync failed: ".$rsync_out{'err'}."\n");
  return $rsync_out{'out'};

}

sub rsync_output {

  my ($message,$type)=@_;

  $rsync_out{$type}.="$message\n";
  print $message if ( defined($globals->{'verbose'}) && $globals->{'ve
+rbose'});

}

sub _warn {

  my $message=shift;
  print STDERR $message;

}

sub _croak {

  $error=shift;
  die $error if $globals->{'fatal_errors'};
  return 0;

}

# I am a good little module
1;