use strict;

package File::DiffTree;

# File::DiffTree -- Compare two directory hierarchies
# By Ned Konz, perl@bike-nomad.com
# $Revision$

use vars '$VERSION';
use Algorithm::Diff 1.01 ();
use File::Find ();

BEGIN { $File::DiffTree::VERSION = "0.1"; }

# Walk $topDir and return a reference to an array of records, one per
# non-directory file: [ name relative to $topDir, selected stat() fields ].
sub _findFiles {
    my $topDir       = shift;
    my $statFields   = shift;
    my $reject       = shift;
    my $topDirLength = length($topDir);
    my @files;

    File::Find::find(
        sub {
            my @stat = stat($_);
            if (!@stat) {
                warn "can't stat $File::Find::name : $!\n";
                return;
            }
            return if -d _;    # directories are never reported
            my $fileName = substr($File::Find::name, $topDirLength);
            return if defined($reject) && &$reject($File::Find::name, @stat);
            push(@files, [ $fileName, @stat[ @$statFields ] ]);
        },
        $topDir
    );

    return \@files;
}

sub diffTree {
    my $dirA        = shift;
    my $dirB        = shift;
    my $userOptions = shift || {};
    my $numberOfFields;
    my $numberOfSignificantFields;
    my $foldCase = sub { $_[0] };    # default no fold

    my %options = (
        onlya      => sub { },
        onlyb      => sub { },
        match      => sub { },
        statfields => [ 7, 9 ],      # size, mtime
        hash       => sub {          # default=stringize them
            my $arr = shift;
            join($;, $foldCase->($arr->[0]),
                @{$arr}[ 1 .. $numberOfSignificantFields ]);
        },
        reject            => undef,
        significantfields => undef,
        foldcase          => 0,
        sort => sub { $foldCase->($a->[0]) cmp $foldCase->($b->[0]) },

        # normalize the user's keys (DWIM)
        map {
            my $key = $_;
            $key =~ tr/A-Z_/a-z/d;
            $key, $userOptions->{$_}
        } keys(%$userOptions)
    );

    $numberOfFields = scalar(@{ $options{statfields} });
    $numberOfSignificantFields =
      defined($options{significantfields})
      ? $options{significantfields}
      : $numberOfFields;
    $numberOfSignificantFields = $numberOfFields
      if $numberOfSignificantFields > $numberOfFields;
    $foldCase = sub { lc($_[0]) } if $options{foldcase};

    my $filesA = _findFiles($dirA, $options{statfields}, $options{reject});
    my $filesB = _findFiles($dirB, $options{statfields}, $options{reject});

    # sort by name
    @$filesA = sort { $options{sort}->() } @$filesA;
    @$filesB = sort { $options{sort}->() } @$filesB;

    # Walk both sorted lists; records whose keys (from the hash option)
    # agree are matches, the rest are reported as present on one side only.
    Algorithm::Diff::traverse_sequences(
        $filesA, $filesB,
        {
            MATCH     => sub { $options{match}->($filesA->[$_[0]], $filesB->[$_[1]]) },
            DISCARD_A => sub { $options{onlya}->($filesA->[$_[0]]) },
            DISCARD_B => sub { $options{onlyb}->($filesB->[$_[1]]) },
        },
        $options{hash}
    );
}

1;
__END__

=head1 NAME

File::DiffTree - Compare two directory hierarchies

=head1 SYNOPSIS

    use File::DiffTree;

    File::DiffTree::diffTree($dirA, $dirB, {
        Match => sub { print $_[0]->[0], " matches\n" },
        # Fold_Case => 1,  # on case-insensitive filesystems such as Windows
    });

=head1 DESCRIPTION

C<File::DiffTree::diffTree()> compares the files in two directory hierarchies, calling optional user-supplied callbacks for files that exist in only one of the two directories, as well as for files that match.

Two files match when their names match (with optional case folding) and when zero or more of the fields returned by C<stat()> are equal. You can specify how many fields from C<stat()> are examined when looking for a match. Separately, you can specify which fields from C<stat()> are provided to your callback routines.

See OPTIONS below for the options to C<diffTree()>.

=head1 OPTIONS

The third argument to C<File::DiffTree::diffTree()> is a hash reference that can contain the following options. Option names may contain underscores or capital letters as desired (that is, OnlyA, O_n_L_ya, Only_A, onlya, and only_a are equivalent).

Nothing is done by default for B<only_a>, B<only_b>, or B<match>. You must therefore provide at least one of these to get any useful behavior.
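
The key normalization described above can be reproduced on its own. The following standalone sketch applies the same C<tr/A-Z_/a-z/d> transliteration that C<diffTree()> uses internally. The helper name C<normalize_key> is hypothetical and is not part of File::DiffTree.

    use strict;
    use warnings;

    # Hypothetical helper mirroring diffTree()'s option-key normalization:
    # A-Z are mapped to a-z, and underscores (which have no counterpart
    # in a-z) are deleted because of the /d modifier.
    sub normalize_key {
        my $key = shift;
        $key =~ tr/A-Z_/a-z/d;
        return $key;
    }

    my @spellings = qw(OnlyA O_n_L_ya Only_A onlya only_a);
    print join(' ', map { normalize_key($_) } @spellings), "\n";
    # prints "onlya onlya onlya onlya onlya"

Because every spelling collapses to the same key, supplying two spellings of one option in the same call leaves only one of them in effect. Which one survives depends on hash ordering.
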

=over 4

=item B<only_a>

=item B<only_b>

The B<only_a> and B<only_b> options supply CODE references to user callback routines. These are called when a file appears in only one of the two directory trees (B<only_a> for the first, B<only_b> for the second). They are also called when a file exists in both trees but differs in one or more significant stat fields, and in that case both callbacks are called for it. By default, nothing is done for these files.

The argument to these routines is a reference to an array that contains the filename relative to the starting directory, followed by whatever fields from C<stat()> were defaulted or specified with the B<stat_fields> option.

    File::DiffTree::diffTree( $dir1, $dir2, {
        only_a => sub { print "only in $dir1: ", $_[0]->[0], "\n" },
        only_b => sub { print "only in $dir2: ", $_[0]->[0], "\n" },
    });

Of course, you can also specify a reference to a separate subroutine that you've written:

    File::DiffTree::diffTree( $dir1, $dir2, {
        only_a => \&onlyA,
        only_b => \&onlyB,
    });

=item B<match>

The B<match> option supplies a CODE reference to a user callback routine. It is called when a file appears to match, based on its name and the significant fields from C<stat()>. By default, nothing is done for these files.

The arguments to the B<match> routine are two array references, one for each directory. Each contains the filename relative to the starting directory, followed by whatever fields from C<stat()> were defaulted or specified with the B<stat_fields> option.

    File::DiffTree::diffTree( $dir1, $dir2, {
        match => sub { print "in both $dir1 and $dir2: ", $_[0]->[0], "\n" },
    });

=item B<stat_fields>

The B<stat_fields> option specifies which fields from C<stat()> will be passed to the B<only_a>, B<only_b>, or B<match> user callbacks. This is an ARRAY reference consisting of numbers from 0 through 12. By default, it is:

    stat_fields => [ 7, 9 ],

That is, the size and mtime (last modified time) of the files are passed.
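
To see what a callback receives with the default B<stat_fields>, the following standalone sketch builds one record by hand. It uses the same shape as the module: the file name followed by a slice of C<stat()>. It is an illustration only. It stats a temporary file created with C<File::Temp> instead of walking a directory tree, and it uses that file's full path rather than a path relative to a starting directory.

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # A small temporary file stands in for a file found in a tree.
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "hello\n";               # 6 bytes
    close($fh) or die "can't close $path: $!\n";

    my $stat_fields = [ 7, 9 ];          # the default: size, mtime
    my @stat = stat($path) or die "can't stat $path: $!\n";

    # Same shape as a record passed to only_a, only_b, or match:
    # [ name, @stat[@$stat_fields] ]
    my $record = [ $path, @stat[ @$stat_fields ] ];

    my ($name, $size, $mtime) = @$record;
    printf "%s: %d bytes, modified %s\n", $name, $size, scalar localtime($mtime);
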

The possible field numbers are:

     0 dev      device number of filesystem
     1 ino      inode number
     2 mode     file mode (type and permissions)
     3 nlink    number of (hard) links to the file
     4 uid      numeric user ID of file's owner
     5 gid      numeric group ID of file's owner
     6 rdev     the device identifier (special files only)
     7 size     total size of file, in bytes
     8 atime    last access time in seconds since the epoch
     9 mtime    last modify time in seconds since the epoch
    10 ctime    inode change time (NOT creation time!) in seconds since the epoch
    11 blksize  preferred block size for file system I/O
    12 blocks   actual number of blocks allocated

If you want to compare only the name and size, but still have access to the modification time and inode, you can specify this using:

    File::DiffTree::diffTree( $dir1, $dir2, {
        match => sub { print "in both $dir1 and $dir2: ", $_[0]->[0], "\n" },
        stat_fields        => [ 7, 9, 1 ],  # size, mtime, inode
        significant_fields => 1,            # just size
    });

Unless the B<significant_fields> option below is specified, all of the B<stat_fields> are considered when looking for a match. So by default, file comparisons compare name, size, and modification time.

=item B<significant_fields>

The B<significant_fields> option is a number that specifies how many of the B<stat_fields>, counted from the start of that list, are considered when comparing files. This is why the example above lists size first. By default, all of the fields are compared, and a number larger than the length of the list is treated as the length of the list. If you supply a 0 for B<significant_fields>, only the name is compared.

This option gives you separate control over which fields from C<stat()> you are passed and how many of those fields are compared by C<diffTree()>.

=item B<reject>

The B<reject> option is a CODE reference that can be provided to filter out unwanted files. It is called from inside C<File::Find::find()> with the full filename followed by all 13 fields from C<stat()>, for 14 arguments in all. During the call, the C<$_> variable is set to the last component of the filename and the current directory is the directory of the file. The C<_> pseudo-filehandle can also be tested. If the routine returns true, the file is not considered. Directories are skipped before B<reject> is called, so it cannot be used to prune whole subtrees.
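
The B<reject> calling convention can be tried outside the module. The following standalone sketch uses C<File::Find> directly. It calls a B<reject>-style predicate the same way C<diffTree()> does, passing the full name followed by the 13 C<stat()> fields. The 1 MiB limit and the starting directory (C<.>, or the first command-line argument) are arbitrary choices for this illustration.

    use strict;
    use warnings;
    use File::Find ();

    # Hypothetical reject predicate: skip files larger than 1 MiB.
    # $stat[7] is the size, because @stat holds all 13 stat() fields.
    my $reject = sub {
        my ($name, @stat) = @_;
        return $stat[7] > 1024 * 1024;
    };

    my $top = @ARGV ? $ARGV[0] : '.';
    my @kept;
    File::Find::find(
        sub {
            my @stat = stat($_) or return;    # can't stat: skip the entry
            return if -d _;                   # directories never reach reject
            return if $reject->($File::Find::name, @stat);
            push @kept, $File::Find::name;
        },
        $top
    );
    print "$_\n" for @kept;
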

For instance, to ignore files that are unreadable or end in C<.bak>, you can do this:

    File::DiffTree::diffTree( $dir1, $dir2, {
        match  => sub { print "in both $dir1 and $dir2: ", $_[0]->[0], "\n" },
        reject => sub { /\.bak$/ || ! -r _ }
    });

=item B<fold_case>

If the B<fold_case> option is provided and is true, filenames are compared without regard to case. The filenames passed to the user callbacks keep their actual case. This is probably what you want under Windows. For portability, you can write:

    Fold_Case => ($^O eq 'MSWin32'),

=item B<sort>

The B<sort> option is a CODE reference to an optional subroutine that is called when sorting the lists of files. The two records to be compared are passed in the package variables C<$File::DiffTree::a> and C<$File::DiffTree::b>. As with Perl's C<sort>, the subroutine should return a negative number, zero, or a positive number. By default, sorting is by filename, with case folding if the B<fold_case> option is set.

You probably won't need this option. If you do, you may have to supply the B<hash> option as well.

=item B<hash>

The B<hash> option is a CODE reference to an optional subroutine that is called to generate a key, which determines whether two files are considered the same. By default, this key consists of the file name (case-folded if B<fold_case> is set) followed by the significant stat fields, which are the first B<significant_fields> entries of B<stat_fields>. These values are turned into strings and separated by the C<$;> character (by default C<\034>). Specify the B<hash> option if you need to do something different.

The argument to this subroutine is an array reference like those passed to the B<only_a> and B<only_b> subroutines.

You shouldn't need this option. If you do, you'll probably have to supply the B<sort> option as well.

=back

=head2 EXPORT

File::DiffTree doesn't export anything. Typing is good for you. Call diffTree as C<File::DiffTree::diffTree()>.

=head1 AUTHOR

By Ned Konz, perl@bike-nomad.com.

=head1 LICENSE

This module is licensed under the same license as Perl itself.

=head1 SEE ALSO

perl(1), L<Algorithm::Diff>

=cut

# vim: ts=4 sw=4