Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

diff-servers.pl - Diff directory trees across machines

by Corion (Patriarch)
on Aug 20, 2015 at 08:30 UTC ( [id://1139276]=CUFP: print w/replies, xml ) Need Help??

This program creates a recursive diff between two directory trees. The directory trees can be local or reachable by ssh. The diff lists added and removed files and for changed files, it lists the differences in diff style. This makes it convenient to review the differences between two machines that should be identical, or to find the steps that are needed to bring one directory tree to another.

After a hint by salva that Net::SSH2 comes included with Strawberry Perl, I was motivated to rip out some system-specific ties to plink.exe and post this. Unfortunately, Net::SSH::Any doesn't seem to have a way to talk to the ssh agent for quick authentification, so this relies on Net::SSH2 instead of Net::SSH::Any.

#!perl -w use strict; use Algorithm::Diff; use Getopt::Long; use File::Find; use File::stat; use Net::SSH2; =head1 NAME diff-servers.pl =head1 ABSTRACT Generate a diff between two directory trees. The directory trees must +be local or reachable via ssh. Only the ssh2 protocol is supported. =head1 SYNOPSIS perl -w diff-servers.pl corion@production:/opt/mychat corion@staging +:/tmp/deploy-0.025 --ignore .bak --ignore .gpg --ignore .pgp --ignore + random_seed Quick diff, showing only added/missing files without comparing their c +ontent perl -w diff-servers.pl c:\mychat\old-versions\0.025 corion@staging: +/tmp/deploy-0.025 -q =head1 OPTIONS =over 4 =item B<--ignore> Regexp of directory entries to ignore =item B<--no-mode> Don't compare the file mode. =item B<--no-owner> Don't compare the file owner. =item B<--quick> Don't compare file contents =item B<--verbose> Be somewhat more verbose =back =head1 PREREQUISITES Currently, the script expects C<find> and C<perl> to be available on t +he remote side. The dependency on C<find> could be eliminated by implementing th +e functionality in Perl. The dependency on C<perl> on the remote side could be elimina +ted by using the SFTP protocol for retrieving the directory tree, at an added compl +exity. =cut GetOptions( 'verbose|v' => \my $verbose, 'ignore|i:s' => \my @ignore, 'no-owner|o' => \my $ignore_owner, 'no-mode|m' => \my $ignore_mode, 'quick|q' => \my $skip_contents, ); use vars qw(%connections); sub run_remote { my( $server, $command )= @_; my $user; if( $server =~ /(.*)\@(.*)/ ) { $user = $1; $server = $2; }; if( ! $connections{ $server }) { my $ssh2 = Net::SSH2->new(); $ssh2->connect($server) or die "Couldn't connect to '$server': + $!"; if ($ssh2->auth( username => $user, interact => 1 )) { $connections{ $server } = $ssh2; } else { die "No auth to $server."; }; }; my $fh = $connections{ $server }->channel; warn "[$command]" if $verbose; $fh->exec($command) or die; my @lines = map {s/\s+$//; $_ } <$fh>; #warn "$server:[$_]" for @lines; return @lines } sub get_local { my( $file )= @_; open my $fh, '<', $file or warn "Couldn't read '$file': $!"; binmode $fh; my @lines = map {s/\s+$//; $_ } <$fh>; return @lines } sub split_serverpath { my( $serverpath ) = @_; if( $serverpath =~ /((?:\w+\@)[\w.]+):(.*)/ ) { return ("$1","$2"); } else { # Must be local return (undef, $serverpath); } }; use Data::Dumper; sub filelist { my( $serverpath ) = @_; my( $host, $dir ) = split_serverpath( $serverpath ); if( $host ) { # Outputs a line per file # mode user group type filename my $uid_gid_file = q!perl -Mstrict -MFile::stat -nle 'next if +/^\s*$/;my $s=stat($_);my($p,$u,$g,$t)=(0,q(-),q(-),q(f)); if($s and +not -l) {$p=$s->mode;$u=(getpwuid($s->uid))[0];$g=(getgrgid($s->gid)) +[0] } else { $t=q(l)}; print sprintf qq(%08o %s %s %s %s), $p, $u,$g, +$t,$_'!; # Read all directory entries my @remote_entries = map { my( $mode,$u,$g,$t,$name ) = split +/ +/, $_, 6 ; $name =~ s!^\Q$dir!!; { user => $u, group => $g, type => +$t, name => $name, mode => $mode }; } run_remote( $host, qq{find '$dir' - +type f -o -type l| $uid_gid_file } ); return @remote_entries; } else{ my @files; find({ wanted => sub { return if -d $_; my $s = stat($_) or warn "Couldn't stat [$_]: $!", return; my $name = $_; my $u='-'; my $g='-'; my $t='f'; my $mode = $s->mode; $name =~ s!^\Q$dir!!; push @files, { user => $u, group => $g, type => $t, name = +> $name, mode => $mode }; }, no_chdir => 1 }, $dir ); #warn "local: $_" for @files; return @files; }; } sub wanted_file { my( $fileinfo )= @_; my $file = $fileinfo->{name}; if( my @why = grep { $file =~ /\Q$_/ } @ignore ) { #warn "Ignoring $file (@why)"; } else { #warn "Allowing [$file] ..."; } ! grep { $file =~ /\Q$_/ } @ignore; } sub diff { my( $name, $server1, $server2 )= @_; my($host1, $path1) = split_serverpath( $server1 ); my($host2, $path2) = split_serverpath( $server2 ); my @left = $host1 ? run_remote( $host1, qq{cat '$path1$name'} ) : + get_local( "$server1$name" ); my @right = $host2 ? run_remote( $host2, qq{cat '$path2$name'} ) : + get_local( "$server2$name" ); my $diff = Algorithm::Diff->new( \@left, \@right ); $diff->Base( 1 ); # Return line numbers, not indices my $has_diff; while( $diff->Next() ) { next if $diff->Same(); if( ! $has_diff ) { $has_diff = 1; print "$name\n"; }; my $sep = ''; if( ! $diff->Items(2) ) { printf "%d,%dd%d\n", $diff->Get(qw( Min1 Max1 Max2 )); } elsif( ! $diff->Items(1) ) { printf "%da%d,%d\n", $diff->Get(qw( Max1 Min2 Max2 )); } else { $sep = "---\n"; printf "%d,%dc%d,%d\n", $diff->Get(qw( Min1 Max1 Min2 Max2 )); } print "< $_\n" for $diff->Items(1); print $sep; print "> $_\n" for $diff->Items(2); } $has_diff }; my( $server1, $server2 )= @ARGV; #warn "Old: $server1"; #warn "New: $server2"; my %left_info = map { $_->{name} => $_ } grep { wanted_file($_) } file +list( $server1 ); my %right_info = map { $_->{name} => $_ } grep { wanted_file($_) } fil +elist( $server2 ); my @left_names = sort keys %left_info; my @right_names = sort keys %right_info; my $filediff = Algorithm::Diff->new( \@left_names, \@right_names ); my @samelist; $filediff->Base( 1 ); # Return line numbers, not indices while( $filediff->Next() ) { if( $filediff->Same() ) { # entry exists in both trees push @samelist, $filediff->Items(1); } else { # Entries only on tree 2, but no symlink my @new_items = grep { ! $right_info{ $_ }->{type} ne 'l' } $f +ilediff->Items(2); print "new: $_\n" for @new_items; # Entries only on tree 1, but no symlink my @old_items = grep { ! $left_info{ $_ }->{type} ne 'l' } $fi +lediff->Items(1); print "del: $_\n" for @old_items; }; } for my $same (@samelist) { my $linfo = $left_info{ $same }; my $rinfo = $right_info{ $same }; #warn "File: $same"; #warn Dumper $linfo; #warn Dumper $rinfo; if( $linfo->{type} ne $rinfo->{type} ) { print "$same: Link vs. file: $linfo->{type} => $rinfo->{type}\ +n"; }; next if $linfo->{type} eq 'l' or $rinfo->{type} eq 'l'; if( ! $ignore_owner ) { if( $left_info{ $same }->{user} ne $right_info{ $same }->{ +user} or $left_info{ $same }->{group} ne $right_info{ $same }->{ +group} ) { print "$same: Ownership different: $left_info{ $same }->{use +r}:$left_info{$same}->{group} ne $right_info{ $same }->{user}:$right_ +info{$same}->{group}\n"; }; }; if( ! $ignore_mode ) { if( $left_info{ $same }->{mode} ne $right_info{ $same }->{m +ode} ) { print "$same: Mode different: $left_info{ $same }->{mode} $r +ight_info{$same}->{mode}\n"; }; }; if( ! $skip_contents ) { diff( $same, $server1, $server2 ); }; };

Replies are listed 'Best First'.
Re: diff-servers.pl - Diff directory trees across machines
by VinsWorldcom (Prior) on Aug 20, 2015 at 13:07 UTC

    Because my $work laptop is Windows 7 x64, I've been using Strawberry for years now. Getting SSH to work has been tedious to say the least until I too discovered Net::SSH2 was bundled with Strawberry.

    I'm normally interfacing with switches / routers, but when I next dealing with servers, I'll give this script a run.

Re: diff-servers.pl - Diff directory trees across machines
by stevieb (Canon) on Aug 24, 2015 at 14:02 UTC

    After a quick glance, I'd move the use Data::Dumper; from the middle of the script to the top of it. That keeps all of your use statements together so it's easy to see exactly what you're using at a glance, and also if you ever lose the requirement of the Dumper statements, you may forget to remove the use statement (because it is buried) which will load the module unnecessarily.

    One other thing is that I'd do a check to ensure the correct number of args are passed in. Change this:

    my( $server1, $server2 )= @ARGV;

    ...to something like this:

    die "Usage: script.pl server1 server2" if @ARGV != 2; my ($server1, $server2) = @ARGV;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://1139276]
Approved by marto
Front-paged by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (4)
As of 2024-04-19 03:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found