
bkpprog - prepare full backups in reasonably sized tarballs

by ambrus (Abbot)
on Aug 24, 2011 at 07:35 UTC

NAME

bkpprog - prepare full backups in reasonably sized tarballs

SYNOPSIS

    cd /home/bkp/bkpcur
    cat > bkprule <<'ENDRULES'
    cmd tar -cz --null --no-recursion -T - -f /home/bkp/bkpcur/$v.tgz
    cd /
    vz 256m
    i . roo
    i ./usr/local loc
    e ./home/bkp
    ENDRULES
    perl bkpprog.pl bkprule > bkpplan # scans files
    perl bkpprog.pl -e bkpplan # creates tarballs
    # then write the tarballs from /home/bkp/bkpcur to the backup media

DESCRIPTION

This program helps you make full backups split into smaller tarballs. It is recommended for home computers only, not for production systems. In particular, it does not give you a way to retrieve the backups.

In the first pass, the program recursively scans the part of the filesystem you mark to be backed up and tries to split it into chunks of a preset size, each of which will be archived separately. For this, the program needs a configuration file, usually called bkprule, which tells it which parts of the filesystem to include and exclude. It writes the chunks found to a plan file, usually called bkpplan, which tells which parts of the filesystem go into which chunk. The actual tarballs are then created in the second pass. You have a chance to review the chunks and change the plan between the two passes.

You normally use this program to create the backup tarballs on a hard disk. You will then usually burn these tarballs to optical disks, but this program does not help with that. You will have to group the tarballs into larger groups, each of which fits on a disk, and then use some external program to burn them to disks. This program also does not aid you in restoring your system from the backups.
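Grouping the finished tarballs into disk-sized sets is left to you. The following is a minimal sketch of one way to do it, using first-fit decreasing bin packing; it is not part of bkpprog. The function name group_for_disks, the script name groupdisks.pl and the capacity argument are all hypothetical, and the sketch ignores any filesystem overhead on the disk.

```perl
use strict;
use warnings;

# Group files into disk-sized sets with first-fit decreasing:
# take the files from largest to smallest and put each one on the
# first disk that still has room, opening a new disk when none does.
# $capacity is bytes per disk; $sizes maps file name => size in bytes.
sub group_for_disks {
    my ($capacity, $sizes) = @_;
    my @disks;    # each entry: { free => bytes left, files => [names] }
    my @names = sort { $sizes->{$b} <=> $sizes->{$a} || $a cmp $b } keys %$sizes;
    for my $name (@names) {
        my $size = $sizes->{$name};
        $size <= $capacity
            or die "$name ($size B) does not fit on an empty disk\n";
        my ($disk) = grep { $_->{free} >= $size } @disks;
        if (!$disk) {
            $disk = { free => $capacity, files => [] };
            push @disks, $disk;
        }
        push @{ $disk->{files} }, $name;
        $disk->{free} -= $size;
    }
    my @groups = map { $_->{files} } @disks;
    return @groups;
}

# Hypothetical usage: perl groupdisks.pl 4700000000 /home/bkp/bkpcur/*.tgz
if (@ARGV) {
    my $capacity = shift @ARGV;
    my %sizes;
    for my $file (@ARGV) {
        my $size = -s $file;
        defined $size or die "cannot stat $file: $!\n";
        $sizes{$file} = $size;
    }
    my $n = 0;
    for my $group (group_for_disks($capacity, \%sizes)) {
        printf "disk %d: %s\n", ++$n, join(" ", @$group);
    }
}
```

First-fit decreasing is not optimal, but it is simple and usually wastes little space when the tarballs are much smaller than a disk.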

The sizes of chunks are only approximately equal, so don't use this program if you want to write one file per disk. However, the grouping tries to keep directories in one chunk when practical, so each chunk will usually have a set of related files. This can aid in selectively retrieving files from the backup, but more importantly it lets you see in advance which directories will take the most space in the backup.

Though the default is to create all tarballs at once, there is a way to create only some of the tarballs at once, e.g. if you are low on disk space. In this case, you will run the first pass only once, but run the second pass multiple times, each time editing the plan file first (see later) to select which tarballs to actually produce.
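Editing the plan by hand is easy for a few chunks, but it can also be scripted. The sketch below is not part of bkpprog; it assumes the plan file format described in the documentation (v lines start a chunk, ena and dis switch processing on and off, and bkpprog accepts several of these switches in one plan). The function name select_chunks and the script name selectchunks.pl are hypothetical.

```perl
use strict;
use warnings;

# Return a copy of the plan lines in which only the named chunks are
# enabled. Existing ena/dis lines are dropped, and a fresh ena or dis
# line is written before a v line whenever the state has to change.
# All p lines are kept, because bkpprog still needs the paths of
# disabled chunks to keep them out of the enabled ones.
sub select_chunks {
    my ($lines, @wanted) = @_;
    my %wanted = map { ($_ => 1) } @wanted;
    my $state = "";
    my @out;
    for my $line (@$lines) {
        next if $line =~ /^\s*(?:ena|dis)\s*$/;
        if ($line =~ /^\s*v\s+(\S+)/) {
            my $want = $wanted{$1} ? "ena" : "dis";
            if ($want ne $state) {
                push @out, "$want\n";
                $state = $want;
            }
        }
        push @out, $line;
    }
    return @out;
}

# Hypothetical usage: perl selectchunks.pl bkpplan roo3 roo4 > bkpplan.part
if (@ARGV) {
    my ($planfile, @chunks) = @ARGV;
    open my $in, "<", $planfile or die "cannot open $planfile: $!\n";
    my @lines = <$in>;
    close $in;
    print select_chunks(\@lines, @chunks);
}
```

You would then run the second pass on the edited copy, and repeat with a different set of chunk names once the first batch of tarballs has been written to the backup media.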

In the first pass, you call the program with the name of the configuration file as its sole command-line argument. The plan file is written to standard output. In the second pass, you call the program with the switch -e and the name of the plan file.

WARNINGS

This program is just a tool. Running it alone does not ensure that you have backups. It is entirely your responsibility that you do not lose data.

SOURCE CODE

The source code, including all documentation in POD format, follows.
#!perl

=head1 NAME

bkpprog - prepare full backups in reasonably sized tarballs

=head1 SYNOPSIS

    cd /home/bkp/bkpcur
    cat > bkprule <<'ENDRULES'
    cmd tar -cz --null --no-recursion -T - -f /home/bkp/bkpcur/$v.tgz
    cd /
    vz 256m
    i . roo
    i ./usr/local loc
    e ./home/bkp
    ENDRULES
    perl bkpprog.pl bkprule > bkpplan # scans files
    perl bkpprog.pl -e bkpplan # creates tarballs
    # then write the tarballs from /home/bkp/bkpcur to the backup media

=head1 DESCRIPTION

This program helps you make full backups split into smaller tarballs.
It is recommended for home computers only, not for production systems.
In particular, it does not give you a way to retrieve the backups.

In the first pass, the program recursively scans the part of the
filesystem you mark to be backed up and tries to split it into chunks
of a preset size, each of which will be archived separately.  For this,
the program needs a configuration file, usually called B<bkprule>,
which tells it which parts of the filesystem to include and exclude.
It writes the chunks found to a plan file, usually called B<bkpplan>,
which tells which parts of the filesystem go into which chunk.  The
actual tarballs are then created in the second pass.  You have a chance
to review the chunks and change the plan between the two passes.

You normally use this program to create the backup tarballs on a hard
disk.  You will then usually burn these tarballs to optical disks, but
this program does not help with that.  You will have to group the
tarballs into larger groups, each of which fits on a disk, and then use
some external program to burn them to disks.  This program also does
not aid you in restoring your system from the backups.

The sizes of chunks are only approximately equal, so don't use this
program if you want to write one file per disk.  However, the grouping
tries to keep directories in one chunk when practical, so each chunk
will usually have a set of related files.  This can aid in selectively
retrieving files from the backup, but more importantly it lets you see
in advance which directories will take the most space in the backup.

Though the default is to create all tarballs at once, there is a way to
create only some of the tarballs at once, e.g. if you are low on disk
space.  In this case, you will run the first pass only once, but run
the second pass multiple times, each time editing the plan file first
(see later) to select which tarballs to actually produce.

In the first pass, you call the program with the name of the
configuration file as its sole command-line argument.  The plan file is
written to standard output.  In the second pass, you call the program
with the switch B<-e> and the name of the plan file.

=head1 CONFIGURATION FILE

The following commands can be used in the configuration file.
Each command must be on a separate line.

=over

=item B<i> I<pathname> I<prefix>

Gives the pathname of a file or directory to include in the backup,
with all its contents recursively.  You should have at least one such
command in the configuration file to do anything useful.

The prefix should be an alphabetic string that will be used in the name
of the tarball (a number is appended to distinguish chunks).  It is
okay to use the same prefix in multiple B<i> commands.  The recursion
does not descend to subdirectories that are mount points, nor to
directories that are specifically forbidden with the B<e> command, but
descends to anywhere else.  If we descend to a directory mentioned in
another B<i> command, it is included only once, and chunks made from
that subdirectory will use the prefix from the latter command.

=item B<e> I<pathname>

Excludes a pathname and anything under it recursively, except for
subpaths specifically included with another B<i> command.

You should probably exclude the directory where the backup tarballs are
created, if they would normally be included.

=item B<cmd> I<command>

Gives the shell command to use in the second phase to create a tarball.
In this line, C<$v> is replaced by the name of the chunk (a prefix and
a serial number), C<$b> is replaced by the expected uncompressed size
of this chunk in bytes, and C<$d> is replaced by a literal dollar sign.

The pathname of each individual file is written to the standard input
of this command, separated with NUL characters.  It is the
responsibility of this command to decide on the pathname of the tarball
(probably using the chunk name plus some extension and directory), and
to actually write to it.  The command must not recurse to
subdirectories.

=item B<cd> I<pathname>

Gives a directory to change to after reading the configuration file but
before doing anything else.  All pathnames to be backed up are then
relative to this directory, and this is also the working directory when
the external command is invoked.

=item B<vz> I<size>

Gives the target size of volumes.  The size is in bytes but suffixes
like B<M> can be used.

=item B<#> I<anything>

A comment.

=item B<vtba>, B<vtbg>, B<vtbr>

These commands are undocumented, and are used to fine-tune the
algorithm that breaks the files into chunks.

=back

=head1 THE PLAN FILE

The plan file is output by the first pass and used by the second pass.
It can be read and edited by hand.  It starts with a copy of the
contents of the configuration file, both for documentation, and because
some commands are reused.

The commands are as follows.

=over

=item B<#> I<anything>

=item B<cmd> I<command>

=item B<cd> I<pathname>

See the L</CONFIGURATION FILE> section above.

=item B<e> I<pathname>

Exclude this pathname and files under it, except for subpaths
specifically included by a B<p> command.

=item B<v> I<chunkname-int> I<chunkname-ext> I<size>

Create a chunk.  The first two parameters give the name of the chunk
and are usually identical.  If not, the first parameter tells the name
used in B<p> statements, but the second is passed to the external
command.

The last argument is the expected uncompressed size of this chunk in
bytes, which can be passed to the command for information, but is
otherwise for documentation only.

=item B<p> I<chunkname> I<pathname>

Include the pathname and everything under it in a given chunk.  Files
mentioned in B<e> commands or other B<p> commands are excluded though.

=item B<ena>

=item B<dis>

Process only chunks between an B<ena> statement and a B<dis> statement.
The command to create a tarball is not run for other chunks.  This can
be used to create the backup in multiple passes if you're low on disk
space.

The B<v> statements for disabled chunks are still read to know that
files marked by them should not be included in other (enabled) chunks.

=item B<endplan>

Marks the end of this file.

=back

=head1 NOTES

The plan file only contains enough directory and file names to separate
the chunks from each other, but the actual full list of files is
rediscovered in the second pass, thus any files created between the
first and the second pass will get into the backup.

Computing the sizes is approximate for the following reasons.

=over

=item *

You will probably create compressed tarballs, but the program counts
the uncompressed size.

=item *

The size of meta-data in the tarball is not counted precisely.

=item *

If a file has multiple names, the size is counted once for each (they
might not actually be stored multiple times in the tarball, because if
two names of the same file would get into the same tarball, tar will
only store them once).

=item *

The algorithm actively favors making boundaries of chunks simpler,
especially making chunks that consist of all files under a single
directory but nothing else, even if this means making smaller chunks.

=item *

Individual files are never broken into multiple chunks, so backing up a
very large file will imply having a very large tarball (though as you
will have few of these files, you can just exclude them and back them
up separately without this program).

=item *

Files may get created or changed between the two passes.

=back

The program tries to exclude any mount points so only one file system
is backed up.  To override this, use an B<i> command for each mount
point.

=head1 EXAMPLES & HINTS

See L</SYNOPSIS> for a full example of a (short) configuration file.

=head2 What cmd to use

Here are a few examples for the B<cmd> command.

A minimal example that uses the C<tar> program is the following.

    cmd tar -cz --null --no-recursion -T - -f /home/bkp/bkpcur/$v.tgz

Here, you should replace the pathname with whatever directory you want
the tarballs to be placed in.  A more complicated example follows,
which also prints an approximate progress percentage while creating
each tarball.

    cmd tar -cz --null --no-recursion -T - -f /home/bkp/bkp1107/all/$v.tgz -b 20 --checkpoint=100 --checkpoint-action=exec='printf " %s %3d%%\e[K\r" $v $[(TAR_CHECKPOINT*TAR_BLOCKING_FACTOR*512*100+$b/2)/$b] >&2' ; echo >&2

=head2 Su

If you want to back up a whole system, you will typically run bkpprog
as root (in both phases), so it can read all directories and stat all
files you will back up.  You can then run the cd-writing program as a
user, provided you make sure it can read the tarballs you have created.

In principle, however, there is no need to run the program as
super-user.  If all the files and directories you want to back up are
readable to a user, you can run this program as that user.

=head2 Bind mount / to read /dev

The following trick is not specific to bkpprog.

If you want to back up a whole Linux system in such a way that you can
restore it more easily, you will probably want to read the contents of
directories that are hidden by a mount.  This is most important for the
directory C</dev>, because a filesystem is often mounted on it, yet the
files under it on the root partition (especially C</dev/console> and
C</dev/null>) might be necessary to boot your system.

If you want to access the contents of such directories, here's what you
do.  You create a bind mount of the root filesystem that only root can
read, and which is not a recursive bind mount, so it won't copy other
mounts under it.  For example, run the following commands as root.

    mkdir -m 700 /mnt/safe
    mkdir /mnt/safe/root
    mount --bind / /mnt/safe/root

Then, add the command

    cd /mnt/safe/root

to the bkpprog configuration file so it works from the bind mount
instead of the filesystem root.

Don't forget, however, that bkpprog does not descend to mount points by
default, so if you wish to back up multiple filesystems, you will need
to add explicit B<i> statements for them.

=head1 BUGS

The author and maintainer of this program is Zsban Ambrus
L<mailto:ambrus@math.bme.hu>.  You may try to write to him about any
further bugs you have found.

=over

=item *

The archive names seriously must not contain strange characters,
it is a security bug if they do.  (The files that are backed up may
contain any character though.)

=item *

May assign hardlinks to two different volumes.  The size of linked
files is counted multiple times.

=item *

May get confused by directories with identical dev-ino pair, such as
caused by bind mounts.

=item *

This program is tailored for my needs, instead of being a general
solution.

=item *

The algorithm for breaking to chunks could be made yet more
intelligent.

=item *

I have never tested restoring from backups made with this program.  I
have never needed it, luckily.  That's probably okay for me as a home
user, but if you are running a more important system, you should
probably test restoring, and also probably shouldn't use this program.

=back

=head1 WARNING

This program is just a tool.  Running it alone does not ensure that you
have backups.  It is entirely your responsibility that you do not lose
data.

=head1 COPYING

Copyright (C) Zsban Ambrus 2010

This program is free software: you can redistribute it and/or modify
it under terms of either the GNU General Public License version 3,
as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

A copy of the GNU General Public License can be found at
"L<http://www.gnu.org/licenses/>".

=cut

use warnings; use strict;
use Carp qw"cluck";
use Fcntl ":mode";
use IO::Handle;
use Getopt::Long;
#use Data::Dump::Streamer;

sub percdecode {
    # should use the functions in coreutils/lib/xstrtol.{c,h}
    my($s) = @_;
    defined($s) or return undef;
    $s =~ s/\+/ /g;
    $s =~ s/\%([0-9a-fA-F][0-9a-fA-F])/chr(hex($1))/ge;
    $s;
}

sub percencode {
    my($s) = @_;
    $s =~ s/([^\$\&\,\-\.\/0-9\:\;\<\=\>\@A-Z\^\_\`a-z\~])/sprintf("%%%02x", ord($1))/ge;
    $s;
}

our %fromhuman_prefix = ("", 0, qw"k 1 m 2 g 3 t 4 p 5 e 6");
sub fromhuman {
    my($s) = @_;
    $s =~ /\A\s*(-?\d+)([kMGTPE]?)\s*\z/i or die "error: invalid number ($s)";
    defined(my $r = $fromhuman_prefix{lc $2}) or die "internal error: fromhuman_prefix not found";
    my $n = $1;
    $n * 1024**$r;
}

our(
    $execute, @includep, @excludep, $startcd,
    $volsiz, $voltres_ba, $voltres_bg, $voltres_br,
    $enabled, $found_endplan, @volumee, %volume, @assignp, $packcmd,
    %include, %exclude, %met, %genname_count, %metf,
    $DEBUG_TRAVERSE, $DEBUG_PIPEPERCENT,
);

sub init {
    $volsiz = 6e8;
    $startcd = ".";
    $DEBUG_TRAVERSE = 0;
    $DEBUG_PIPEPERCENT = 0;
    Getopt::Long::Configure "bundling", "gnu_compat", "prefix_pattern=(--|-)";
    GetOptions(
        "e|execute!", \$execute,
    ) or exit(2);
}

sub openconf {
    1 == @ARGV or die "Usage: perl bkpprog.pl bkprule > bkpplan";
    open RULEF, "<", $ARGV[0] or die;
    if (!$execute) {
        *PLANF = *STDOUT;
        autoflush PLANF, 1;
    }
}

sub readconf {
    while (<RULEF>) {
        if (!$execute) {
            print PLANF $_; # required for e and cmd statements
        }
        /^#/ and next;
        /\S/ or next;
        my($cmd, $rest) = split " ", $_, 2;
        my($arg0, $arg1, $arg2) = split " ", $rest;
        $_ = percdecode($_) for $arg0, $arg1, $arg2;
        if ("cmd" eq $cmd) {
            $packcmd = $rest;
        } elsif ("vz" eq $cmd) {
            $volsiz = fromhuman($arg0) or die "invalid volume size";
        } elsif ("vtba" eq $cmd) {
            $voltres_ba = fromhuman($arg0);
        } elsif ("vtbg" eq $cmd) {
            $voltres_bg = fromhuman($arg0);
        } elsif ("vtbr" eq $cmd) {
            $voltres_br = fromhuman($arg0);
        } elsif ("cd" eq $cmd) {
            $startcd = $arg0;
        } elsif ("i" eq $cmd) {
            push @includep, [$arg0, $arg1];
        } elsif ("e" eq $cmd) {
            push @excludep, $arg0;
        } elsif ("ena" eq $cmd) {
            $enabled = 1;
        } elsif ("dis" eq $cmd) {
            $enabled = 0;
        } elsif ("v" eq $cmd) {
            if ($enabled) {
                push @volumee, $arg0;
            }
            $volume{$arg0} = [ $arg1, $arg2, [] ];
        } elsif ("p" eq $cmd) {
            $volume{$arg0} or die "error: p command without corresponding v command";
            push @{${$volume{$arg0}}[2]}, $arg1;
            push @assignp, $arg1; # even if volume disabled
        } elsif ("endplan" eq $cmd) {
            $found_endplan++;
        } else {
            warn "unrecognized command ignored: $_";
        }
    }
}

sub prepconf {
    if ($execute && !$found_endplan) {
        die "error: input is not a backup plan file";
    } elsif (!$execute && ($found_endplan || %volume)) {
        die "error: input is not a backup config file";
    }
    chdir $startcd or die "error: cannot chdir to ($startcd)";
    for (@excludep) {
        my $e = !(my($dev, $ino) = lstat $_);
        if ($e) {
            warn "cannot stat excluded path ($_): $!";
            next;
        }
        $exclude{$dev . ":" . $ino} = $_;
    }
    my @includef = map { $$_[0] } @includep;
    if ($execute) {
        push @includef, @assignp;
    }
    for (@includef) {
        my $e = !(my($dev, $ino) = lstat $_);
        if ($e) {
            warn "cannot stat included/assigned path ($_): $!";
            next;
        }
        $include{$dev . ":" . $ino} = 1;
    }
    $voltres_ba = 0.40*$volsiz;
    $voltres_bg = 0.80*$volsiz;
    $voltres_br = 0.40*$volsiz;
}

sub traverse {
    my($p, $curprefix, $curdev, $top, $LISTPIPE, $volsz, $refsize, $vnam) = @_;
    my $e = !(my($dev, $ino, $mode, $_nlink, $_uid, $_gid, $_rdev, $size) = lstat $p);
    if ($e) {
        warn "warning: cannot stat file ($p), skipping: $!\n";
        return;
    }
    if (defined($curdev)) {
        if ($curdev != $dev) {
            warn "skipping xdev ($p)\n";
            return;
        }
    } else {
        $curdev = $dev;
    }
    my $devino = $dev . ":" . $ino;
    if (exists($include{$devino}) && !$top) {
        return;
    }
    my $met = $met{$devino}++;
    if (S_ISDIR($mode) && $met) {
        warn "skipping already met dir ($p)\n";
        return;
    }
    if (exists($exclude{$devino})) {
        return;
    }
    my $tsz = 128 + length($p);
    my(@s);
    if (S_ISREG($mode) || S_ISLNK($mode)) {
        $tsz += $size;
    } elsif (S_ISDIR($mode)) {
        if (!opendir my $D, $p) {
            warn "cannot opendir file ($p), skipping contents: $!\n";
        } else {
            while (my $n = readdir $D) {
                "." eq $n || ".." eq $n and next;
                my $i = traverse($p . "/" . $n, $curprefix, $curdev, 0, $LISTPIPE, $volsz, $refsize, $vnam);
                if (!$execute && defined($i)) {
                    my($n, $ssz) = @$i;
                    push @s, $i;
                    $tsz += $ssz;
                }
            }
            closedir $D or warn "error: cannot closedir file ($p)";
        }
    }
    if ($execute) {
        $DEBUG_PIPEPERCENT and printf STDERR "%-10s %3.0f%% %.60s \e[K\r", $vnam, 100*$$refsize/$volsz, percencode($p);
        print $LISTPIPE $p, "\0";
        $$refsize += $tsz;
    } else { # plan
        my $frag;
        if ($volsiz < $tsz) {
            $DEBUG_TRAVERSE and printf STDERR "#[B %12d %s]\n", $tsz, percencode($p);
            my @ssz = map { $$_[1] } @s;
            @s = @s[sort { $ssz[$b] <=> $ssz[$a] } 0 .. @s - 1];
            my $rsz = 0;
            my @r;
            for my $i (@s) {
                my($sp, $ssz) = @$i;
                if (
                    $voltres_bg <= $rsz + $ssz && @r ||
                    1 == @r && $voltres_ba <= $rsz
                ) {
                    $DEBUG_TRAVERSE and printf STDERR "#[BG %12d %d]\n", $rsz, 0+@r;
                    emit($curprefix, \@r);
                    $tsz -= $rsz;
                    $rsz = 0;
                    @r = ();
                    $frag = "part"
                }
                push @r, $i;
                $rsz += $ssz;
            }
            if ($voltres_br <= $tsz) {
                $DEBUG_TRAVERSE and printf STDERR "#[BR %12d %d %d]\n", $tsz, 0+@r, 0<@r;
                emit($curprefix, [[$p, $tsz, 0<@r ? "part" : "file"]]);
                return;
            }
        }
        2 <= $DEBUG_TRAVERSE and printf STDERR "#[U %12d %s]\n", $tsz, percencode($p);
        return[$p, $tsz, $frag];
    }
}

sub genname {
    my($prefix) = @_;
    $prefix =~ /\d\z/ and $prefix .= "_";
    $prefix . ($genname_count{$prefix}++);
}

sub emit {
    my($curprefix, $desc) = @_;
    my $vnam = genname($curprefix);
    my $tsz = 0;
    $tsz += $$_[1] for @$desc;
    printf PLANF "v %s %s %12d\n", $vnam, $vnam, $tsz;
    for (@$desc) {
        my($p, $_sz, $frag) = @$_;
        printf PLANF "p %s %s%s\n", $vnam, percencode($p), ($frag ? " #$frag" : "");
    }
}

sub planall {
    print PLANF "\nena\n";
    for (@includep) {
        my($p, $curprefix) = @$_;
        my $i = traverse($p, $curprefix, undef, 1);
        if (defined($i)) {
            emit($curprefix, [$i]);
        }
    }
    print PLANF "endplan\n";
}

sub execall {
    for my $vid (@volumee) {
        my($vnam, $vsize, $ps) = @{$volume{$vid}};
        my $cmd = $packcmd;
        $cmd =~ s/\$v/$vnam/g or warn "warning: cannot find '\$v' escape in pack command";
        $cmd =~ s/\$b/$vsize/g;
        $cmd =~ s/\$d/\$/g;
        open my $LISTPIPE, "|-", $cmd or die "error: cannot popen pack command ($cmd)";
        binmode $LISTPIPE or die "error: cannot binmode pipe to pack command"; # for good measure
        autoflush $LISTPIPE, 1;
        my $used = 0;
        printf STDERR "starting volume %s (%d B)\n", $vnam, $vsize;
        for my $p (@$ps) {
            my $i = traverse($p, $vnam, undef, 1, $LISTPIPE, $vsize, \$used, $vnam);
        }
        $DEBUG_PIPEPERCENT and printf STDERR "%-10s %3.0f%% \e[K\r", $vnam, 100*$used/$vsize;
        close $LISTPIPE or die "warning: command died ($cmd): $?";
        printf STDERR "finished volume %s (%d B)\e[K\n", $vnam, $vsize;
    }
    warn "all done\n";
}

sub finish {
    for (keys %exclude) {
        if (!$met{$_}) {
            warn "warning: never met excluded directory ($exclude{$_})";
        }
    }
}

sub main {
    init();
    openconf();
    readconf();
    prepconf();
    if ($execute) {
        execall();
    } else {
        planall();
    }
    finish();
}

main();

__END__
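Before running the second pass it can be useful to see what shell command a given cmd line turns into. The sketch below repeats the three substitutions that execall performs ($v, then $b, then $d) in a standalone function, so a template can be checked without creating any tarball. The name expand_cmd is hypothetical and not part of bkpprog.

```perl
use strict;
use warnings;

# Expand a cmd template the same way execall does:
# $v becomes the chunk name, $b the expected size in bytes,
# and $d a literal dollar sign, in that order.
sub expand_cmd {
    my ($template, $vnam, $vsize) = @_;
    my $cmd = $template;
    $cmd =~ s/\$v/$vnam/g;
    $cmd =~ s/\$b/$vsize/g;
    $cmd =~ s/\$d/\$/g;
    return $cmd;
}

# Prints: tar -cz --null --no-recursion -T - -f /home/bkp/bkpcur/roo3.tgz
print expand_cmd(
    'tar -cz --null --no-recursion -T - -f /home/bkp/bkpcur/$v.tgz',
    "roo3", 268435456), "\n";
```

Because the replacement is plain text substitution, any other shell variable in the template has to be written with $d, for example $dHOME for $HOME.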

Replies are listed 'Best First'.
Re: bkpprog - prepare full backups in reasonably sized tarballs
by ambrus (Abbot) on Aug 13, 2013 at 12:02 UTC
