This program helps you make full backups broken into smaller tarballs.
It is recommended for home computers only, not for production systems.
In particular, it does not give you a way to retrieve the backups.
In the first pass, the program recursively scans the part of the
filesystem you mark to be backed up, and tries to segment it into
chunks of a preset size that will get archived. For this, the program
needs a configuration file, usually called bkprule, which tells which
parts of the filesystem to include and exclude. It writes the chunks
found to a plan file, usually called bkpplan, which tells which parts
of the filesystem go to which chunk. The actual tarballs are then
created in the second pass. You have a chance to review the chunks and
change the plan between the two passes.
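To make the review step between the two passes more concrete, here is a minimal sketch that summarizes a plan file, largest chunk first. It assumes only the "v" (chunk name, chunk name, size) and "p" (chunk name, pathname) lines described in the plan file documentation; the helper name summarize_plan and the plan fragment are made up for illustration and are not part of bkpprog.

```perl
use strict; use warnings;

# Summarize a plan: return a list of [chunk name, expected size, path count],
# largest chunk first. Only "v" and "p" lines are looked at.
sub summarize_plan {
    my ($text) = @_;
    my (%size, %paths);
    for my $line (split /\n/, $text) {
        my ($cmd, @arg) = split " ", $line;
        next unless defined $cmd;
        if ($cmd eq "v") {
            $size{$arg[0]} = $arg[2];
            $paths{$arg[0]} = 0;
        } elsif ($cmd eq "p" && exists $paths{$arg[0]}) {
            $paths{$arg[0]}++;
        }
    }
    return map { [$_, $size{$_}, $paths{$_}] }
        sort { $size{$b} <=> $size{$a} || $a cmp $b } keys %size;
}

# Made-up plan fragment, for illustration only
my $plan = <<'ENDPLAN';
ena
v roo0 roo0    250000000
p roo0 ./usr
v roo1 roo1    120000000
p roo1 ./etc
p roo1 ./var #part
endplan
ENDPLAN

printf "%-8s %12d B %d path(s)\n", @$_ for summarize_plan($plan);
```

In practice you would read the text from the bkpplan file instead of the inline sample.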
You normally use this program to create the backup tarballs on a hard disk.
You will then usually burn these tarballs to optical disks, but this program
does not aid in that. You will have to group the tarballs into larger groups,
each of which fits on a disk, and then use some external program to burn them
to disks. This program also does not aid you in restoring your system from
the backups.
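Since grouping the tarballs into disk-sized groups is left to you, the following sketch shows one possible way to do it. It uses the generic first-fit decreasing heuristic, which is an assumption of this example and not something bkpprog does; the function name group_tarballs, the file names and the sizes are all hypothetical.

```perl
use strict; use warnings;

# First-fit decreasing: place each tarball (largest first) into the first
# group that still has room, opening a new group when none does.
sub group_tarballs {
    my ($capacity, %size) = @_;
    my @groups;    # each group: { free => bytes left, files => [names] }
    for my $name (sort { $size{$b} <=> $size{$a} || $a cmp $b } keys %size) {
        die "$name does not fit on an empty disk\n" if $size{$name} > $capacity;
        my ($g) = grep { $_->{free} >= $size{$name} } @groups;
        if (!$g) {
            $g = { free => $capacity, files => [] };
            push @groups, $g;
        }
        push @{ $g->{files} }, $name;
        $g->{free} -= $size{$name};
    }
    return @groups;
}

# Hypothetical sizes; in practice you could fill %size using -s on each tarball
my %size = ("roo0.tgz" => 300, "roo1.tgz" => 250, "loc0.tgz" => 400, "roo2.tgz" => 100);
my @groups = group_tarballs(700, %size);
for my $i (0 .. $#groups) {
    print "disk ", $i + 1, ": @{ $groups[$i]{files} }\n";
}
```

The heuristic does not guarantee the smallest possible number of disks, but it is simple and usually close enough for a handful of tarballs.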
The sizes of chunks are only approximately equal, so don't use this
program if you want to write one file per disk. However, the grouping
tries to keep directories in one chunk when practical, so each chunk
will usually have a set of related files. This can aid in selectively
retrieving files from the backup, but more importantly it lets you see
in advance which directories will take the most space in the backup.
Though the default is to create all tarballs at once, there is a way to
create only some of the tarballs at once, e.g. if you are low on disk space.
In this case, you will run the first pass only once, but run the second pass
multiple times, each time editing the plan file first (see later) to select
which tarballs to actually produce.
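Editing the plan by hand works fine, but the selection can also be scripted. The sketch below assumes the plan file semantics documented in the program (chunks declared by "v" lines are processed only while an "ena" switch is in effect, and "dis" turns processing off again). The helper select_chunks and the sample plan are hypothetical.

```perl
use strict; use warnings;

# Return a copy of the plan in which only the wanted chunks are enabled.
# Existing ena/dis lines are dropped and new ones are placed before "v" lines.
sub select_chunks {
    my ($plan, @wanted) = @_;
    my %wanted = map { $_ => 1 } @wanted;
    my $enabled = 0;    # chunks count as disabled until the first "ena"
    my @out;
    for my $line (split /^/, $plan) {
        my ($cmd, $name) = split " ", $line;
        $cmd = "" unless defined $cmd;
        next if $cmd eq "ena" || $cmd eq "dis";    # drop the old switches
        if ($cmd eq "v") {
            my $want = $wanted{$name} ? 1 : 0;
            if ($want != $enabled) {
                push @out, $want ? "ena\n" : "dis\n";
                $enabled = $want;
            }
        }
        push @out, $line;
    }
    return join "", @out;
}

# Made-up plan, for illustration only
my $plan = "cd /\nena\nv roo0 roo0 250\np roo0 ./usr\nv roo1 roo1 120\np roo1 ./etc\nendplan\n";
print select_chunks($plan, "roo1");
```

You would write the result to a new plan file and pass that file to the second pass, repeating with a different selection for each run.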
In the first pass, you call the program with the name of the configuration file
as its sole command-line argument. The plan file is written to standard
output. In the second pass, you call the program with the switch -e and
the name of the plan file.
This program is just a tool. Running it alone does not ensure that you have
backups. It is entirely your responsibility that you do not lose data.
#!perl
=head1 NAME
bkpprog - prepare full backups in reasonably sized tarballs
=head1 SYNOPSIS
    cd /home/bkp/bkpcur
    cat > bkprule <<'ENDRULES'
    cmd tar -cz --null --no-recursion -T - -f /home/bkp/bkpcur/$v.tgz
    cd /
    vz 256m
    i . roo
    i ./usr/local loc
    e ./home/bkp
    ENDRULES
    perl bkpprog.pl bkprule > bkpplan # scans files
    perl bkpprog.pl -e bkpplan # creates tarballs
    # then write the tarballs from /home/bkp/bkpcur to the backup media
=head1 DESCRIPTION
This program helps you make full backups broken into smaller tarballs.
It is recommended for home computers only, not for production systems.
In particular, it does not give you a way to retrieve the backups.
In the first pass, the program recursively scans the part of the
filesystem you mark to be backed up, and tries to segment it into
chunks of a preset size that will get archived. For this, the program
needs a configuration file, usually called B<bkprule>, which tells which
parts of the filesystem to include and exclude. It writes the chunks
found to a plan file, usually called B<bkpplan>, which tells which parts
of the filesystem go to which chunk. The actual tarballs are then
created in the second pass. You have a chance to review the chunks and
change the plan between the two passes.
You normally use this program to create the backup tarballs on a hard disk.
You will then usually burn these tarballs to optical disks, but this program
does not aid in that. You will have to group the tarballs into larger groups,
each of which fits on a disk, and then use some external program to burn them
to disks. This program also does not aid you in restoring your system from
the backups.
The sizes of chunks are only approximately equal, so don't use this
program if you want to write one file per disk. However, the grouping
tries to keep directories in one chunk when practical, so each chunk
will usually have a set of related files. This can aid in selectively
retrieving files from the backup, but more importantly it lets you see
in advance which directories will take the most space in the backup.
Though the default is to create all tarballs at once, there is a way to
create only some of the tarballs at once, e.g. if you are low on disk space.
In this case, you will run the first pass only once, but run the second pass
multiple times, each time editing the plan file first (see later) to select
which tarballs to actually produce.
In the first pass, you call the program with the name of the configuration file
as its sole command-line argument. The plan file is written to standard
output. In the second pass, you call the program with the switch B<-e> and
the name of the plan file.
=head1 CONFIGURATION FILE
The following commands can be used in the configuration file.
Each command must be in a separate line.
=over
=item B<i> I<pathname> I<prefix>
Gives the pathname of a file or directory to include in the backup,
with all its contents recursively. You should have at least one such
command in the configuration file to do anything useful.
The prefix should be an alphabetic string that will be used in the name
of the tarball (a number is appended to distinguish chunks). It is okay
to use the same prefix in multiple B<i> commands. The recursion does
not descend to subdirectories that are mount points, nor to directories
that are specifically forbidden with the B<e> command, but descends to
anywhere else. If we descend to a directory mentioned in another B<i>
command, it is included only once, and chunks made from that subdirectory
will use the prefix from the latter command.
=item B<e> I<pathname>
Excludes a pathname and anything under it recursively, except for subpaths
specifically included with another B<i> command.
You should probably exclude the directory where the backup tarballs are
created, if they would normally be included.
=item B<cmd> I<command>
Gives the shell command to use in the second phase to create a tarball.
In this line, C<$v> is replaced by the name of the chunk (a prefix and
a serial number), C<$b> is replaced by the expected uncompressed size
of this chunk in bytes, and C<$d> is replaced by a literal dollar sign.
The pathname of each individual file is written to the standard input of
this command, separated with nul characters. It is the responsibility
of this command to decide on the pathname of the tarball (probably using
the chunkname plus some extension and directory), and to actually write
to it. The command must not recurse to subdirectories.
=item B<cd> I<pathname>
Gives a directory to change to after reading the configuration file but before
doing anything else. All pathnames to be backed up are then relative to
this directory, and this is also the working directory when the external
command is invoked.
=item B<vz> I<size>
Gives the target size of volumes. The size is in bytes, but a suffix such as
B<k>, B<M> or B<G> (powers of 1024, case-insensitive) can be used.
=item B<#> I<anything>
A comment.
=item B<vtba>, B<vtbg>, B<vtbr>
These commands are undocumented, and are used to fine-tune the algorithm
that breaks the backup into chunks.
=back
=head1 THE PLAN FILE
The plan file is output by the first pass and is used by the second pass.
It can be read and edited by hand. It starts with a copy of the contents
of the configuration file, both for documentation, and because some commands
are reused. The commands are as follows.
=over
=item B<#> I<anything>
=item B<cmd> I<command>
=item B<cd> I<pathname>
See in the L</CONFIGURATION FILE> section above.
=item B<e> I<pathname>
Exclude this pathname and files under it, except for subpaths specifically
included by a B<p> command.
=item B<v> I<chunkname-int> I<chunkname-ext> I<size>
Create a chunk. The first two parameters give the name of the chunk and
are usually identical. If not, the first parameter tells the name used
in B<p> statements, but the second is passed to the external command.
The last argument is the expected uncompressed size of this chunk in
bytes, which can be passed to the command for information reasons,
but is otherwise for documentation only.
=item B<p> I<chunkname> I<pathname>
Include the pathname and everything under it to a given chunk. Files
mentioned in B<e> commands or other B<p> commands are excluded though.
=item B<ena>
=item B<dis>
Process only chunks between an B<ena> statement and a B<dis> statement.
The command to create a tarball is not run for other chunks. This can
be used to create the backup in multiple passes if you're low on disk space.
The B<v> statements for disabled chunks are still read to know that files
marked by them should not be included in other (enabled) chunks.
=item B<endplan>
Marks the end of this file.
=back
=head1 NOTES
The plan file contains only enough directory and file names to
separate the chunks from each other. The actual full list of files
is rediscovered in the second pass, thus any files created between the
first and the second pass will get into the backup.
Computing the sizes is approximate for the following reasons.
=over
=item *
You will probably create compressed tarballs, but the program counts the
uncompressed size.
=item *
The size of meta-data in the tarball is not counted precisely.
=item *
If a file has multiple names, the size
is counted once for each (they might not actually be stored multiple
times in the tarball, because if two names of the same file would get
into the same tarball, tar will only store them once).
=item *
The algorithm actively favors making boundaries of chunks simpler, especially
making chunks that consist of all files under a single directory but nothing
else, even if this means making smaller chunks.
=item *
Individual files are
never broken into multiple chunks, so backing up a very large file will imply
having a very large tarball (though as you will have few of these files, you
can just exclude them and back them up separately without this program).
=item *
Files may get created or changed between the two passes.
=back
The program tries to exclude any mount points so only one file system is
backed up. To override this, use an B<i> command for each mount point.
=head1 EXAMPLES & HINTS
See L</SYNOPSIS> for a full example of a (short) configuration file.
=head2 What cmd to use
Here are a few examples for the B<cmd> command.
A minimal example that uses the C<tar> program is the following.
    cmd tar -cz --null --no-recursion -T - -f /home/bkp/bkpcur/$v.tgz
Here, you should replace the pathname with whatever directory you want the
tarballs to be placed in. A more complicated example follows, which also prints an
approximate progress percentage while creating each tarball.
    cmd tar -cz --null --no-recursion -T - -f /home/bkp/bkp1107/all/$v.tgz -b 20 --checkpoint=100 --checkpoint-action=exec='printf " %s %3d%%\e[K\r" $v $[(TAR_CHECKPOINT*TAR_BLOCKING_FACTOR*512*100+$b/2)/$b] >&2' ; echo >&2
=head2 Su
If you want to back up a whole system, you will typically run bkpprog as root
(in both phases), so it can read all directories and stat all files you will
back up. You can then run the cd-writing program as a user, provided you
make sure it can read the tarballs you have created.
In principle, however, there is no need to run the program as super-user.
If all the files and directories you want to back up are readable to a user,
you can run this program as that user.
=head2 Bind mount / to read /dev
The following trick is not specific to bkpprog.
If you want to back up a whole Linux system in such a way that you can
restore it more easily, you will probably want to read the contents of
directories that are hidden by a mount. This is most important for
the directory C</dev>, because a filesystem is often mounted on it, yet the
files under it on the root partition (especially C</dev/console> and
C</dev/null>) might be necessary to boot your system.
If you want to access the contents of such directories, here's what you do.
You create a bind mount of the root filesystem that only root can read, and
which is not a recursive bind mount, so it won't copy other mounts under it.
For example, run the following commands as root.
    mkdir -m 700 /mnt/safe
    mkdir /mnt/safe/root
    mount --bind / /mnt/safe/root
Then, add the command
    cd /mnt/safe/root
to the bkpprog configuration file so it works from the bind mount instead of
the filesystem root.
Don't forget, however, that bkpprog does not descend to mount points by
default, so if you wish to back up multiple filesystems, you will need to
add explicit B<i> statements for them.
=head1 BUGS
The author and maintainer of this program is Zsban Ambrus L<mailto:ambrus@math.bme.hu>.
You may try to write to him about any further bugs you have found.
=over
=item *
The archive names must not contain strange characters: they are substituted
into the shell command unquoted, so it is a security bug if they do. (The names
of the files that are backed up may contain any character, though.)
=item *
May assign hardlinks to two different volumes.
The size of linked files counts multiple times.
=item *
May get confused by directories with identical dev-ino pair,
such as caused by bind mounts.
=item *
This program is tailored for my needs, instead of being a general solution.
=item *
The algorithm for breaking to chunks could be made yet more intelligent.
=item *
I have never tested restoring from backups made with this program. I have
never needed it, luckily. That's probably okay for me as a home user, but
if you are running a more important system, you should probably test
restoring, and also probably shouldn't use this program.
=back
=head1 WARNING
This program is just a tool. Running it alone does not ensure that you have
backups. It is entirely your responsibility that you do not lose data.
=head1 COPYING
Copyright (C) Zsban Ambrus 2010
This program is free software: you can redistribute it and/or modify
it under terms of either the GNU General Public License version 3,
as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
A copy of the GNU General Public License can be found at
"L<http://www.gnu.org/licenses/>".
=cut
use warnings; use strict;
use Carp qw"cluck";
use Fcntl ":mode";
use IO::Handle;
use Getopt::Long;
#use Data::Dump::Streamer;
sub percdecode {
    # should use the functions in coreutils/lib/xstrtol.{c,h}
    my($s) = @_;
    defined($s) or return undef;
    $s =~ s/\+/ /g;
    $s =~ s/\%([0-9a-fA-F][0-9a-fA-F])/chr(hex($1))/ge;
    $s;
}
sub percencode {
    my($s) = @_;
    $s =~ s/([^\$\&\,\-\.\/0-9\:\;\<\=\>\@A-Z\^\_\`a-z\~])/sprintf("%%%02x", ord($1))/ge;
    $s;
}
our %fromhuman_prefix = ("", 0, qw"k 1 m 2 g 3 t 4 p 5 e 6");
sub fromhuman {
    my($s) = @_;
    $s =~ /\A\s*(-?\d+)([kMGTPE]?)\s*\z/i or die "error: invalid number ($s)";
    defined(my $r = $fromhuman_prefix{lc $2}) or die "internal error: fromhuman_prefix not found";
    my $n = $1;
    $n * 1024**$r;
}
our(
$execute,
@includep, @excludep, $startcd,
$volsiz, $voltres_ba, $voltres_bg, $voltres_br,
$enabled, $found_endplan, @volumee, %volume, @assignp,
$packcmd,
%include, %exclude, %met, %genname_count, %metf,
$DEBUG_TRAVERSE, $DEBUG_PIPEPERCENT,
);
sub init {
    $volsiz = 6e8;
    $startcd = ".";
    $DEBUG_TRAVERSE = 0;
    $DEBUG_PIPEPERCENT = 0;
    Getopt::Long::Configure "bundling", "gnu_compat", "prefix_pattern=(--|-)";
    GetOptions(
        "e|execute!", \$execute,
    ) or exit(2);
}
sub openconf {
    1 == @ARGV or die "Usage: perl bkpprog.pl bkprule > bkpplan\n   or: perl bkpprog.pl -e bkpplan\n";
    open RULEF, "<", $ARGV[0] or die "error: cannot open input file ($ARGV[0]): $!\n";
    if (!$execute) {
        *PLANF = *STDOUT;
        autoflush PLANF, 1;
    }
}
sub readconf {
    while (<RULEF>) {
        if (!$execute) {
            print PLANF $_; # required for e and cmd statements
        }
        /^#/ and next; /\S/ or next;
        my($cmd, $rest) = split " ", $_, 2;
        # $rest can be undefined for an argument-less command on an unterminated last line
        my($arg0, $arg1, $arg2) = split " ", defined($rest) ? $rest : "";
        $_ = percdecode($_) for $arg0, $arg1, $arg2;
        if ("cmd" eq $cmd) {
            $packcmd = $rest;
        } elsif ("vz" eq $cmd) {
            $volsiz = fromhuman($arg0) or die "invalid volume size";
        } elsif ("vtba" eq $cmd) {
            $voltres_ba = fromhuman($arg0);
        } elsif ("vtbg" eq $cmd) {
            $voltres_bg = fromhuman($arg0);
        } elsif ("vtbr" eq $cmd) {
            $voltres_br = fromhuman($arg0);
        } elsif ("cd" eq $cmd) {
            $startcd = $arg0;
        } elsif ("i" eq $cmd) {
            push @includep, [$arg0, $arg1];
        } elsif ("e" eq $cmd) {
            push @excludep, $arg0;
        } elsif ("ena" eq $cmd) {
            $enabled = 1;
        } elsif ("dis" eq $cmd) {
            $enabled = 0;
        } elsif ("v" eq $cmd) {
            if ($enabled) {
                push @volumee, $arg0;
            }
            $volume{$arg0} = [
                $arg1, $arg2, []
            ];
        } elsif ("p" eq $cmd) {
            $volume{$arg0} or die "error: p command without corresponding v command";
            push @{${$volume{$arg0}}[2]}, $arg1;
            push @assignp, $arg1; # even if volume disabled
        } elsif ("endplan" eq $cmd) {
            $found_endplan++;
        } else {
            warn "unrecognized command ignored: $_";
        }
    }
}
sub prepconf {
    if ($execute && !$found_endplan) {
        die "error: input is not a backup plan file";
    } elsif (!$execute && ($found_endplan || %volume)) {
        die "error: input is not a backup config file";
    }
    chdir $startcd or die "error: cannot chdir to ($startcd)";
    for (@excludep) {
        my $e = !(my($dev, $ino) = lstat $_);
        if ($e) {
            warn "cannot stat excluded path ($_): $!";
            next;
        }
        $exclude{$dev . ":" . $ino} = $_;
    }
    my @includef = map { $$_[0] } @includep;
    if ($execute) {
        push @includef, @assignp;
    }
    for (@includef) {
        my $e = !(my($dev, $ino) = lstat $_);
        if ($e) {
            warn "cannot stat included/assigned path ($_): $!";
            next;
        }
        $include{$dev . ":" . $ino} = 1;
    }
    # defaults apply only when vtba, vtbg or vtbr did not set a threshold
    defined($voltres_ba) or $voltres_ba = 0.40*$volsiz;
    defined($voltres_bg) or $voltres_bg = 0.80*$volsiz;
    defined($voltres_br) or $voltres_br = 0.40*$volsiz;
}
sub traverse {
    my($p, $curprefix, $curdev, $top, $LISTPIPE, $volsz, $refsize, $vnam) = @_;
    my $e = !(my($dev, $ino, $mode, $_nlink, $_uid, $_gid, $_rdev, $size) =
        lstat $p);
    if ($e) {
        warn "warning: cannot stat file ($p), skipping: $!\n";
        return;
    }
    if (defined($curdev)) {
        if ($curdev != $dev) {
            warn "skipping xdev ($p)\n";
            return;
        }
    } else {
        $curdev = $dev;
    }
    my $devino = $dev . ":" . $ino;
    if (exists($include{$devino}) && !$top) {
        return;
    }
    my $met = $met{$devino}++;
    if (S_ISDIR($mode) && $met) {
        warn "skipping already met dir ($p)\n";
        return;
    }
    if (exists($exclude{$devino})) {
        return;
    }
    my $tsz = 128 + length($p);
    my(@s);
    if (S_ISREG($mode) || S_ISLNK($mode)) {
        $tsz += $size;
    } elsif (S_ISDIR($mode)) {
        if (!opendir my $D, $p) {
            warn "cannot opendir file ($p), skipping contents: $!\n";
        } else {
            # explicit defined() so that an entry named "0" does not end the loop
            while (defined(my $n = readdir $D)) {
                "." eq $n || ".." eq $n and next;
                my $i = traverse($p . "/" . $n, $curprefix, $curdev, 0, $LISTPIPE, $volsz, $refsize, $vnam);
                if (!$execute && defined($i)) {
                    my($n, $ssz) = @$i;
                    push @s, $i;
                    $tsz += $ssz;
                }
            }
            closedir $D or warn "error: cannot closedir file ($p)";
        }
    }
    if ($execute) {
        $DEBUG_PIPEPERCENT and printf STDERR "%-10s %3.0f%% %.60s \e[K\r", $vnam, 100*$$refsize/$volsz, percencode($p);
        print $LISTPIPE $p, "\0";
        $$refsize += $tsz;
    } else { # plan
        my $frag;
        if ($volsiz < $tsz) {
            $DEBUG_TRAVERSE and printf STDERR "#[B %12d %s]\n", $tsz, percencode($p);
            my @ssz = map { $$_[1] } @s;
            @s = @s[sort { $ssz[$b] <=> $ssz[$a] } 0 .. @s - 1];
            my $rsz = 0;
            my @r;
            for my $i (@s) {
                my($sp, $ssz) = @$i;
                if (
                    $voltres_bg <= $rsz + $ssz && @r ||
                    1 == @r && $voltres_ba <= $rsz
                ) {
                    $DEBUG_TRAVERSE and printf STDERR "#[BG %12d %d]\n", $rsz, 0+@r;
                    emit($curprefix, \@r);
                    $tsz -= $rsz;
                    $rsz = 0;
                    @r = ();
                    $frag = "part";
                }
                push @r, $i;
                $rsz += $ssz;
            }
            if ($voltres_br <= $tsz) {
                $DEBUG_TRAVERSE and printf STDERR "#[BR %12d %d %d]\n", $tsz, 0+@r, 0<@r;
                emit($curprefix, [[$p, $tsz, 0<@r ? "part" : "file"]]);
                return;
            }
        }
        2 <= $DEBUG_TRAVERSE and printf STDERR "#[U %12d %s]\n", $tsz, percencode($p);
        return[$p, $tsz, $frag];
    }
}
sub genname {
    my($prefix) = @_;
    $prefix =~ /\d\z/ and $prefix .= "_";
    $prefix . ($genname_count{$prefix}++);
}
sub emit {
    my($curprefix, $desc) = @_;
    my $vnam = genname($curprefix);
    my $tsz = 0;
    $tsz += $$_[1] for @$desc;
    printf PLANF "v %s %s %12d\n", $vnam, $vnam, $tsz;
    for (@$desc) {
        my($p, $_sz, $frag) = @$_;
        printf PLANF "p %s %s%s\n", $vnam, percencode($p), ($frag ? " #$frag" : "");
    }
}
sub planall {
    print PLANF "\nena\n";
    for (@includep) {
        my($p, $curprefix) = @$_;
        my $i = traverse($p, $curprefix, undef, 1);
        if (defined($i)) {
            emit($curprefix, [$i]);
        }
    }
    print PLANF "endplan\n";
}
sub execall {
    for my $vid (@volumee) {
        my($vnam, $vsize, $ps) = @{$volume{$vid}};
        my $cmd = $packcmd;
        $cmd =~ s/\$v/$vnam/g or warn "warning: cannot find '\$v' escape in pack command";
        $cmd =~ s/\$b/$vsize/g;
        $cmd =~ s/\$d/\$/g;
        open my $LISTPIPE, "|-", $cmd or
            die "error: cannot popen pack command ($cmd)";
        binmode $LISTPIPE or die "error: cannot binmode pipe to pack command"; # for good measure
        autoflush $LISTPIPE, 1;
        my $used = 0;
        printf STDERR "starting volume %s (%d B)\n", $vnam, $vsize;
        for my $p (@$ps) {
            my $i = traverse($p, $vnam, undef, 1, $LISTPIPE, $vsize, \$used, $vnam);
        }
        $DEBUG_PIPEPERCENT and printf STDERR "%-10s %3.0f%% \e[K\r", $vnam, 100*$used/$vsize;
        close $LISTPIPE or
            die "error: pack command failed ($cmd): $?";
        printf STDERR "finished volume %s (%d B)\e[K\n", $vnam, $vsize;
    }
    warn "all done\n";
}
sub finish {
    for (keys %exclude) {
        if (!$met{$_}) {
            warn "warning: never met excluded directory ($exclude{$_})";
        }
    }
}
sub main {
    init();
    openconf();
    readconf();
    prepconf();
    if ($execute) {
        execall();
    } else {
        planall();
    }
    finish();
}
main();
__END__