Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Snapshot.pm

by robartes (Priest)
on Oct 19, 2003 at 11:53 UTC ( #300372=sourcecode: print w/ replies, xml ) Need Help??

Category: Utility Scripts
Author/Contact Info Bart Vetters | robartes
Description: This module implements a way of taking directory structure snapshots using the rsync/hardlink method from Hack #74 in Linux Server Hacks. It's fairly basic for the moment, and limited to Unix platforms. Future versions will become more universal through the use of link and File::Find.
#!/usr/local/bin/perl
#
# Snapshot.pm
#
# Module containing methods relating to the backup and restore of stuf
+f
# using the LSH style snapshots

=pod

=head1 Snapshot.pm

=head1 NAME

Snapshot.pm

Module to take snapshots using Linux Server Hacks rsync / cp -al metho
+d

=head1 SYNOPSIS

  use Snapshot qw/do_snapshot/;

  my $return=do_snapshot({ source => '/home/robartes/',
                           snapshot_dir => '/snapshots/home',
                         }) or die $Snapshot::error;

=head1 DESCRIPTION

=head2 Introduction

Snapshot.pm provides a method to manage snapshots of directory trees. 
+It uses
the method given in Hack #74 in the excellent O'Reilly "Linux Server H
+acks"
book (don't be put off by the Linux part - most hacks are applicable t
+o any
Unix, and so is this module). More information about this method can b
+e
found in that book and in this document under L</SNAPSHOT METHOD>.

'Snapshots' are defined as a number of copies of a directory tree,
reflecting the state of said tree I<at the moment the copy was taken>.
+ This
means that a file in a snapshot of five days old will be the version o
+f that
file that existed five days ago, even if the file was modified or even
deleted since then.

=head2 Series and MRS

Snapshot.pm keeps a configurable number of snapshots in so called I<se
+ries>.
A series is defined as a number of snapshots using a common naming sch
+eme
and following each other chronologically. One could for example imagin
+e a
series to consist of daily snapshots. Multiple series can be defined a
+nd
handled by Snapshot.pm.

The most recent snapshot (MRS) is treated specially. It is the same fo
+r all
series and is named seperately from the series. It is also not include
+d in
the count of series snapshots to keep.


                  +------+    +------+    +------+
               +--| S1_1 | -- | S1_2 | -- | S1_3 |
              /   +------+    +------+    +------+
  +-----+    /
  | MRS | --+
  +-----+    \
              \   +------+    +------+    +------+    +------+
               +--| S2_1 | -- | S2_2 | -- | S2_3 | -- | S2_4 |
                  +------+    +------+    +------+    +------+

The figure represents a snapshot scheme consisting of two series - S1 
+and S2
(the names are configurable). S1 is defined to keep 3 snapshots, S2 ke
+eps 4.
When the do_snapshot() routine is called with these parameters, it wil
+l
first rotate the series snapshots:

  S1_3  ==>  Deleted
  S1_2  ==>  S1_3
  S1_1  ==>  S1_2
  MRS   ==>  S1_1
  
... and analogously for S2. After this, the MRS will be updated to ref
+lect
the current state of the directory tree from which the snapshot is tak
+en
(using rsync).

The rotation of MRS to first series member is done by hardlinking file
+s.
This, combined with the fact that rsync creates new versions of modifi
+ed
files instead of modifying them in place, is where all the magic of th
+is
snapshot method lies.

After the rotations and the snapshot are done, a timetag file is writt
+en in
the MRS. This timetag file has a timestamp of just before the rsync st
+ep was
started, and contains that time formatted as by C<ctime(3)> (or, more
precisely, as the output from C<localtime> in scalar context). This is
+ done
to ensure that one has a reference point as to the age of the snapshot
+: the
versions of the files in that snapshot are guaranteed to be those of t
+he
time recorded in the timetag (or later, to be precise). The timetags a
+re not
changed by the series rotation process.

=head2 do_snapshot()

The core routine of Snapshot.pm, and the only one exported is
C<do_snapshot()>. This has to be imported explicitely:

  use Snapshot qw/do_snapshot/;

C<do_snapshot()> is called with a hash or reference to a hash as argum
+ent.
This hash lists configuration keys and values. Available keys and valu
+es
are listed below. Mandatory options are marked with *.

=over 2

=item source *

The directory tree to take a snapshot from. Trailing slashes
are optional.


=item snapshot_dir *

The directory where the snapshots are located. MRS and
series snapshots are located in this directory.

=item series      

A reference to an array containing hashrefs for series
configuration. Should contain one hashref per series, with mandatory k
+eys
'stub' and 'keep'. stub is the name of the series (number will be appe
+nded),
keep is how many snapshots should be kept in the series, excluding MRS
+.

Default: C<[ { 'stub' => 'daily_', 'keep' => 6 } ]>

=item verbose

When set to a true value, C<Snapshot.pm> becomes talkative. Currently 
+this
only means that you get the output of the rsync step on STDOUT.

Default: false

=item MRS     

The name of the Most Recent Snapshot.

Default: current

=item exclude

A reference to an array listing file patterns to exclude or include. T
+his is
passed on to the rsync stage, and goes straight through to L<File::Rsy
+nc>.

Default: empty

=item rsync_args

A hash reference of extra arguments to L<File::Rsync>. Any valid optio
+n to
File::Rsync will work, and will override options set through other mea
+ns,
notably C<verbose> and C<delete>. For the sake of the integrity of you
+r
snapshots, you are advised to know what you are doing when you fiddle 
+with
these.

Default: empty

=item fatal_errors

When set to true, any error encountered will result in an exception be
+ing
thrown. You can catch this with C<eval> and examine C<$@> in the usual
+ way.

Default: false

=item timetag

The filename of the timetag file.

Default: .snaptime

=back

=head1 RETURN VALUE

C<Snapshot.pm> returns the output from the rsync step in a scalar if a
+ll
goes well.

In case of an error, false is returned and $Snapshot::error will conta
+in a
more or less informative message loosely related to the cause of the
error.

Unless asked to do so, C<Snapshot.pm> should never throw exceptions.

=head1 PORTABILITY

This version is restricted to platforms that have cp -al. That probabl
+y
means Unix. Later versions will cater for additional platforms, namely
+ those
that support the link function and the L</File::Find> module.

=head1 DEPENDENCIES

Snapshot.pm uses these non standard modules:

  File::Rsync

=head1 SNAPSHOT METHOD

C<Snapshot.pm> uses a method from Hack #74 in Linux Server Hacks. The 
+method
is based on hard links and the fact that rsync creates new versions of
+ files
it modifies, as opposed to modifying them in place.

The magic is in the copy step from MRS to the first series member. Thi
+s step
is done using C<cp -al>, which creates hardlinks (C<-l>) of the files
instead of straight copies. If a file F is later changed in the MRS, t
+he
hardlink still points to the I<original> version of it: rsync does an 
+unlink
on it, which decreases the reference count of the inode. The inode doe
+s not
wink out of existence, as there still is a reference to it created by 
+hard
linking to it during the copy step.

Rotating the series elements is done with a simple C<move> (there is n
+o need
to take more hard links).

The beauty of this method is that the copy step from MRS to S1 is a tr
+ue
snapshot: only directories are actually created in S1, the files are s
+imply
hard links. When files in the MRS are subsequently changed, only the c
+hanged
files will take up extra space on the disk. So for each snapshot, you 
+only
need extra space for the files that have changed compared with the MRS
+.

=head1 SEE ALSO

L<File::Rsync> -- Documentation for the File::Rsync module
L<http://www.oreilly.com/catalog/linuxsvrhack/> -- Linux Server Hacks 
+book

=head1 AUTHOR

Bart Vetters, L<robartes@nirya.be>

=head1 Copyrights

Copyright (c) 2003 Bart Vetters.  All rights reserved.

This program is free software; you can redistribute it and/or modify i
+t under the same terms as Perl itself.


=cut

use strict;
use warnings;

package Snapshot;
require Exporter;
our @ISA = qw/Exporter/;
use File::Rsync;
use File::Copy;
use File::Path;

our @EXPORT_OK=qw/do_snapshot/;
our $VERSION="0.02";

my $globals;
our $error;
my %rsync_out;

sub do_snapshot {

  my $config = parse_args(@_) or return 0;

  rotate_series($config->{'snapshot_dir'},
                $config->{'MRS'},
                $config->{'series'},
                $config->{'num_series'},
               ) or return 0;
  my $now=localtime();
  my $retval=do_sync($config->{'source'},
                     $config->{'snapshot_dir'},
                     $config->{'MRS'},
                     $config->{'exclude'},
                     $config->{'rsync_args'},
                    ) or return 0;
  chdir $config->{'snapshot_dir'}.$config->{'MRS'} or return _croak("T
+his is weird: MRS is not accessible after snapshot: $!\n");
  unlink $config->{'timetag'} if (-f $config->{'timetag'});
  if (open TIMETAG, ">".$config->{'timetag'} ) {
    print TIMETAG $now."\n" or _warn("Could not write snapshot time to
+ timetag file. Snapshot has been taken successfully though.\n");
    close TIMETAG;
  } else {
    _warn("Could not write snapshot time to timetag file. Snapshot has
+ been taken successfully though.\n");
  }
  return $retval;
}

sub parse_args {

  my %allowed_args = ( 'source' => undef,
                       'snapshot_dir' => undef,
                       'exclude' => undef,
                       'MRS' => 'current',
                       'series' => [ { 'stub' => 'daily_',
                                       'keep' => 6,
                                   } ],
                       'rsync_args' => undef,
                       'fatal_errors' => undef,
                       'verbose' => undef,
                       'timetag' => '.snaptime'
                     );
  my @mandatory_args = qw /source snapshot_dir/;
  my @global_flags=qw/fatal_errors verbose/;

  my @args=@_;
  my %config;

  eval {
    if ( ref($args[0]) ) { 
      my $hashref=$args[0];
      %config = %$hashref;
    } else {
      %config=@args;
    }
  };

  if ($@) {return _croak("Please give a hashref or hash as arguments.\
+n") };

  foreach my $key (keys(%config)) {
    return _croak("Unknown argument: $key.\n") unless exists $allowed_
+args{$key};
    $allowed_args{$key}=$config{$key};
  }

  foreach (@mandatory_args)  {
    return _croak("Missing arg: $_\n") unless defined ( $allowed_args{
+$_} );
  }

  foreach (@global_flags) {
    $globals->{$_}=$allowed_args{$_};
  }

  $allowed_args{'source'} =~ s|([^/])$|$1/|;
  $allowed_args{'snapshot_dir'} =~ s|([^/])$|$1/|;

  return \%allowed_args;

}

sub rotate_series {

  my ($directory, $current, $series, $num_series) = @_;

  chdir $directory or return _croak("Problem accessing snapshot direct
+ory: $!\n");
  
  if (defined ($num_series)) {
    return _croak("Invalid number of series: $num_series\n") if ( $num
+_series < 1 );
    return _croak("Not enough series data.\n") if ( scalar @$series < 
+$num_series );
  } else {
    $num_series = @$series;
  }

  for (0 .. ($num_series - 1)) {
    my $stub=$series->[$_]{'stub'};
    my $keep=$series->[$_]{'keep'};
    rotate_files($stub,$keep,$current) or return 0;
  }

  return 1;
  
}

sub rotate_files {

  my ($stub, $max, $current) = @_;

  return _croak("Invalid number of snapshots to retain: $max\n") if ($
+max < 1);
  _warn("Only 1 version to keep. No rotating of files will be done.\n"
+) if $max == 1;

  my $maxdir="$stub$max";
  if ( -d $maxdir ) {
    rmtree($maxdir,0,1) or return _croak("Could not delete $stub$max\n
+");
  };
  if ($max > 2 ) {
    my $x=($max - 1);
    while  ($x) {
      if ( -d $stub.$x ) {
        move( $stub.$x, $stub.( $x + 1 ) ) or return _croak("A move we
+nt wrong: $!\nI'm aborting to avoid possible data corruption.\nThe mo
+ve that failed is $stub$x to $stub".( $x + 1 )."\n");
      }
      $x--;
    }
  }

  my $target=$stub."1";
  if ( -d $current) {
    return _croak("Problem with copy step: ".($? >> 8) ."\n") if syste
+m("cp -al $current/ $target");
  } else {
    _warn("No MRS found. Will be created in rsync step.\n");
  }
  return 1; 
}

sub do_sync {

  my ($source, $snapshot_dir, $current, $exclude, $rsync_args) = @_;
  my $dest=${snapshot_dir}.$current;
  my %extra_args;

  if (defined($rsync_args)) {
    eval { %extra_args = %$rsync_args };
    return _croak("Problem with extra rsync args: $@\n") if ($@);
  }

  unless (defined($extra_args{'exclude'})) {
    $extra_args{'exclude'}=$exclude if defined($exclude);
  }

  my $rsync = File::Rsync -> new(archive => 1,
                                 update => 1,
                                 compress => 1,
                                 verbose => 1,
                                 delete => 1,
                                 outfun => \&rsync_output,
                                 errfun => \&rsync_output,
                                 %extra_args,
                                );
  $rsync->exec ( { 'src' => $source, 'dest' => $dest } ) or return _cr
+oak("Rsync failed: ".$rsync_out{'err'}."\n");
  return $rsync_out{'out'};

}

sub rsync_output {

  my ($message,$type)=@_;

  $rsync_out{$type}.="$message\n";
  print $message if ( defined($globals->{'verbose'}) && $globals->{'ve
+rbose'});

}

sub _warn {

  my $message=shift;
  print STDERR $message;

}

sub _croak {

  $error=shift;
  die $error if $globals->{'fatal_errors'};
  return 0;

}

# I am a good little module
1;

Comment on Snapshot.pm
Download Code

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://300372]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (14)
As of 2014-07-11 14:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (227 votes), past polls