PerlMonks  

SFTP Sync

by neosamuri (Friar)
on Mar 26, 2006 at 04:47 UTC ( #539261=sourcecode )
Category: FTP Stuff
Author/Contact Info: neosamuri (i.e. me)
Description: perl sync.pl local_path remote_host remote_path user pass

This compares the local and remote folders and copies the newer version of each file to the other side, so that both machines end up with the same, most recent set of files.

#!/usr/bin/perl -w
use strict;
use warnings;
use Net::SFTP;
use Net::SFTP::Attributes;
unless( @ARGV == 5 ) {
   print <<END;
Proper Usage:
    perl sync.pl local_path remote_host remote_path user pass
END
   exit(1);
}
my($local_path, $host, $remote_path, $user, $pass) = @ARGV;

my(%args) = ( 'user'     => $user,
              'password' => $pass
            );
my $sftp = Net::SFTP->new($host, %args) or die "Could not create SFTP session for $host\n";
die "SFTP Connection Failed\n" if $sftp->status;
#read_local_dir("$local_path/");
#print "----\n";
#$sftp->ls("/");
#read_remote_dir($sftp,"$remote_path/");
sync($sftp, $local_path, $remote_path, "");
sub sync {
   my $sftp = shift;
   my $local = shift;
   my $remote = shift;
   my $path = shift;
   my( $loc_file, $loc_dir) = read_local_dir("$local/$path");
   my( $rem_file, $rem_dir) = read_remote_dir($sftp, "$remote/$path");
   while( my($lk,$lv) = each %$loc_file ) {
      my $flag = 0;
      while( my($rk,$rv) = each %$rem_file ) {
     if($lk eq $rk) {
        if($lv > $rv) {
           # put $lk
           $sftp->put("$local/$path/$lk","$remote/$path/$lk");
        } elsif($lv < $rv) {
           # get $rk
           $sftp->get("$remote/$path/$lk", "$local/$path/$lk");
        }
            $flag = 1;
     }
      }
      if( $flag == 0 ) {
         # put $lk
     $sftp->put("$local/$path/$lk","$remote/$path/$lk");
      }
   }
   while( my($rk,$rv) = each %$rem_file ) {
      my $flag = 0;
      while( my($lk,$lv) = each %$loc_file ) {
     if( $lk eq $rk ) {
        $flag = 1;
     }
      }
      if( $flag == 0 ) {
         # get $rk
     $sftp->get("$remote/$path/$rk", "$local/$path/$rk");
      }
   }
   while( my($lk,$lv) = each %$loc_dir ) {
      my $flag = 0;
      next if $lk =~ /^\./;
      while( my($rk,$rv) = each %$rem_dir ) {
     if( $lk eq $rk ) {
        sync( $sftp, $local, $remote, "$path/$lk");
        $flag = 1;
     }
      }
      if( $flag == 0 ) {
         # create dir
     my $att = Net::SFTP::Attributes->new(Stat => [stat "$local/$path/$lk"]);
     $sftp->do_mkdir( "$remote/$path/$lk" , $att);
         # sync it
     sync( $sftp, $local, $remote, "$path/$lk" );
      }
   }
   while( my($rk,$rv) = each %$rem_dir ) {
      my $flag = 0;
      next if $rk =~ /^\./;
      while( my($lk,$lv) = each %$loc_dir ) {
     if( $lk eq $rk ) {
        $flag = 1;
     }
      }
      if( $flag == 0 ) {
         # create dir
     mkdir("$local/$path/$rk");
         # sync it
     sync( $sftp, $local, $remote, "$path/$rk" );
      }
   }
}

sub read_local_dir {
   my $path = shift;
   my $file = {};
   my $dir = {};
   my $f;
   opendir DIR, $path or
      die "Failed to open $path: $!\n";
   while($f = readdir DIR) {
      my(@stats) = stat "$path/$f";
      if(-d "$path/$f"){
     $dir->{$f} = $stats[9];
      } else {
     $file->{$f} = $stats[9];
      }
   }
   closedir DIR;
   return ($file, $dir);
}
sub read_remote_dir {
   my $sftp = shift;
   my $path = shift;
   my $file = {};
   my $dir = {};
   $sftp->ls( $path , sub {
      my $ref = shift;
      if( $ref->{'longname'} =~ /^d/ ) {
     $dir->{$ref->{'filename'}} =$ref->{'a'}->mtime(); 
      } else {
     $file->{$ref->{'filename'}} =$ref->{'a'}->mtime(); 
      }
      
   });
   return ($file, $dir);
}

Replies are listed 'Best First'.
Re: SFTP Sync
by graff (Chancellor) on Mar 26, 2006 at 06:42 UTC
    I think it's nice to have this sort of bi-directional sync; I'm sure it's handy, and I'm not familiar enough with the standard tools like "rsync" to know whether they support bi-directional action easily.

    You might get into trouble if you're not careful about the dates associated with files that you "get" and "put". I just tried a command-line sftp transfer from a freebsd server to my macosx laptop, and I found that the modification date on my local copy reflected the time of the transfer -- not the original modification date of the file that was on the remote server.

    This means that if you repeat your bi-directional sync on the same paths of the same local and remote hosts, every file on the local box that originated from a "get" will be newer than the one on the remote box that it was copied from, even though the content did not change. So on the second run, that same file will have to be copied back from the local to the remote, because the remote copy is "older"; and likewise, in reciprocal fashion, for any files that were "put" onto the remote box during the first run. In other words, on each successive run over the same pair of hosts and paths, every file that has ever been copied on a previous run will be copied again in the alternate direction.

    So, if your typical usage involves running this script at intervals on the same directories between the same hosts, you'll probably want to tabulate checksums for the data files, and base your decision about what to "get" and "put" on whether files with the same name have different content, as well as different dates. Either that, or else take the trouble on both sides to actually set the dates of the files that get transferred, so that the dates will match the next time you run the sync job (unless of course someone changes the file content on one of the hosts between runs). My command-line sftp man page doesn't seem to have any option for controlling how the mod.date is set for files being transferred; I see that Net::SFTP has a "setstat" function, but it would take more probing to figure out how that could be used to set the mod.date.
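    Both remedies can be tried out locally before touching the sync script. The following is only a sketch under stated assumptions: file_md5 and stamp_mtime are hypothetical helper names (not part of the posted script or of Net::SFTP), two temporary files stand in for a file and its transferred copy, and the timestamp is made up. It uses only core modules; setting the date on the remote side is not shown.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical helper: hex MD5 digest of a file's content.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: give a local copy the mtime of the file it was
# copied from, so that the next run sees equal dates on both sides.
sub stamp_mtime {
    my ($path, $mtime) = @_;
    utime($mtime, $mtime, $path) == 1
        or die "Cannot set mtime on $path: $!\n";
    return (stat $path)[9];
}

# Two temporary files stand in for a file and its transferred copy.
my ($fh_a, $file_a) = tempfile(UNLINK => 1);
my ($fh_b, $file_b) = tempfile(UNLINK => 1);
print {$fh_a} "same content\n";
print {$fh_b} "same content\n";
close $fh_a;
close $fh_b;

my $same_content = file_md5($file_a) eq file_md5($file_b);
my $source_mtime = 1143348420;    # made-up timestamp for the "original"
my $copy_mtime   = stamp_mtime($file_b, $source_mtime);

print "content identical: ", ($same_content ? "yes" : "no"), "\n";
print "copy mtime matches source: ",
      ($copy_mtime == $source_mtime ? "yes" : "no"), "\n";
```

    With either check in place, a file whose content did not change between runs would no longer be copied back and forth.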

    (Depending on your situation, you might also want to check file sizes; if a remote file is a month old and 30 MB, would you want to replace that with a local file that was "updated" around the time of your reboot last week, and is 0 bytes? For cases where you are replacing a file, it might be prudent to preserve the file that is being replaced by renaming it to something else, e.g. "some.file" becomes "some.file.replaced-on-20060331".)
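    A minimal sketch of those two safeguards, assuming hypothetical helper names (looks_suspicious and backup_name are not in the posted script) and a made-up timestamp. It only decides and names; the actual rename on either host is left out.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical check: flag a replacement that would overwrite a
# non-empty file with an empty one.
sub looks_suspicious {
    my ($old_size, $new_size) = @_;
    return ($old_size > 0 && $new_size == 0) ? 1 : 0;
}

# Hypothetical helper: name under which the replaced file is preserved.
sub backup_name {
    my ($name, $epoch) = @_;
    return $name . '.replaced-on-' . strftime('%Y%m%d', gmtime($epoch));
}

my $epoch  = 1143763200;    # 2006-03-31 00:00:00 UTC
my $backup = backup_name('some.file', $epoch);
print "$backup\n";          # prints some.file.replaced-on-20060331
print "suspicious replacement\n" if looks_suspicious(30 * 1024 * 1024, 0);
```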

    Apart from that, I wondered about your use of this logic:

    while( my($lk,$lv) = each %$loc_file ) {
        my $flag = 0;
        while( my($rk,$rv) = each %$rem_file ) {
            if($lk eq $rk) {
                # .. do a "get" or "put" if necessary
                $flag = 1;
            }
        }
        if ( $flag == 0 ) {
            .. #
        }
    }
    while( my($rk,$rv) = each %$rem_file ) {
        my $flag = 0;
        while( my($lk,$lv) = each %$loc_file ) {
            # same as above, but in other direction...
        }
    }

    I hope no one will complain about "premature optimization" if I suggest that it would be more efficient like this:

    while ( my ($lk, $lv) = each %$loc_file ) {
        if ( !exists( $$rem_file{$lk} ) or $$rem_file{$lk} < $lv ) {
            # do a put...
        }
        elsif ( $lv < $$rem_file{$lk} ) {
            # do a get...
        }
        delete $$rem_file{$lk};  # harmless if it's non-existent
    }
    # at this point, the only elements remaining in %$rem_file
    # are files that do not exist on local host, so copy them all:
    for my $rk ( keys %$rem_file ) {
        # do a get...
    }

    That covers both directions, just like your version. Two points about this approach:

    (a) When you read it, it's easier to grasp what is being done -- the logic is a direct transliteration of "if no remote file matches local file, or remote file is older, put local to remote; otherwise, if local file is older, do a get; then get any remote files not present on the local host."

    (b) When it executes, it does a lot fewer iterations over the hash elements (N+(M-N) instead of N*M*2).
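    To check the single-pass logic without a server, it can be run against plain hashes. This is a sketch, not the poster's code: plan_sync is a hypothetical name, the hashes map file names to made-up mtimes, and instead of calling put/get it only collects the names that would be transferred. It works on a copy of the remote hash so the caller's data is left intact.

```perl
use strict;
use warnings;

# Hypothetical planner: returns the names to "put" and the names to "get".
sub plan_sync {
    my ($loc_file, $rem_file) = @_;
    my %remaining = %$rem_file;    # copy, so the caller's hash is untouched
    my (@put, @get);
    for my $name (sort keys %$loc_file) {
        my $lv = $loc_file->{$name};
        if ( !exists $remaining{$name} or $remaining{$name} < $lv ) {
            push @put, $name;      # missing remotely, or remote copy is older
        }
        elsif ( $lv < $remaining{$name} ) {
            push @get, $name;      # local copy is older
        }
        delete $remaining{$name};  # harmless if it's non-existent
    }
    # whatever is left exists only on the remote host
    push @get, sort keys %remaining;
    return (\@put, \@get);
}

my %local  = ( a => 10, b => 20, c => 30 );
my %remote = ( b => 25, c => 30, d => 5 );
my ($put, $get) = plan_sync(\%local, \%remote);
print "put: @$put\n";    # put: a
print "get: @$get\n";    # get: b d
```

    Each local name is visited once and each remote-only name once, which is where the saving over the nested loops comes from.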

    And similarly for the handling of directories. Last little nit-pick: I'd love it if you fixed your indentation.
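    For directories there is no date comparison, only a three-way split: create remotely, create locally, or already present on both sides. A sketch under the same kind of assumptions (plan_dirs is a hypothetical name, the directory names and values are made up, and names starting with "." are skipped as in the posted script):

```perl
use strict;
use warnings;

# Hypothetical helper: partition directory names into those missing on
# the remote side, those missing locally, and those present on both.
sub plan_dirs {
    my ($loc_dir, $rem_dir) = @_;
    my @remote_names = grep { !/^\./ } keys %$rem_dir;
    my @local_names  = grep { !/^\./ } keys %$loc_dir;
    my %remaining    = map { $_ => 1 } @remote_names;
    my (@mkdir_remote, @common);
    for my $name (sort @local_names) {
        if ( delete $remaining{$name} ) { push @common, $name }
        else                            { push @mkdir_remote, $name }
    }
    my @mkdir_local = sort keys %remaining;
    return (\@mkdir_remote, \@mkdir_local, \@common);
}

my %loc_dir = ( '.' => 1, '..' => 1, src => 100, docs => 200 );
my %rem_dir = ( '.' => 1, '..' => 1, docs => 150, logs => 50 );
my ($mk_remote, $mk_local, $common) = plan_dirs(\%loc_dir, \%rem_dir);
print "create on remote: @$mk_remote\n";
print "create locally:   @$mk_local\n";
print "on both sides:    @$common\n";
```

    Every name in all three lists would then be handed to the recursive sync call.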

    update: Well, there is one more thing, which might be pretty important: if the local host happens to be a unix-like box with multiple users, you almost surely do not want users to put their login name and password for the remote host on the command line of a local shell. The command line args are visible to other users via the process table (see the unix "ps" command).
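    One way around that is to read the password from standard input instead of @ARGV. The sketch below assumes a hypothetical read_secret helper and uses an in-memory filehandle in place of STDIN so that it runs unattended. It does not switch off terminal echo; that would need something extra (a terminal-handling module or stty) which is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line (the password) from a filehandle
# instead of taking it from the command line.
sub read_secret {
    my ($in) = @_;
    my $line = <$in>;
    die "No password supplied\n" unless defined $line;
    chomp $line;
    return $line;
}

# An in-memory filehandle stands in for STDIN here; a real script would
# print a prompt and then call read_secret(\*STDIN).
my $typed = "s3cret\n";
open my $fake_stdin, '<', \$typed
    or die "Cannot open in-memory handle: $!\n";
my $pass = read_secret($fake_stdin);
close $fake_stdin;
print "read a password of length ", length($pass), "\n";
```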

Re: SFTP Sync
by perrin (Chancellor) on Mar 26, 2006 at 20:44 UTC
    Not a criticism of your code, but it would be more efficient to use rsync for this. It doesn't need to move the entire file if only a small portion has changed.
Re: SFTP Sync
by Anonymous Monk on Sep 11, 2008 at 06:08 UTC
    Can you explain the logic of the code below in detail?

    $sftp->ls( $path , sub {
        my $ref = shift;
        if( $ref->{'longname'} =~ /^d/ ) {
            $dir->{$ref->{'filename'}} = $ref->{'a'}->mtime();
        } else {
            $file->{$ref->{'filename'}} = $ref->{'a'}->mtime();
        }
    });
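
    In short: ls calls the anonymous sub once for every entry in the remote directory, passing a hash reference with the keys filename, longname (an "ls -l" style line) and a (an attributes object). If the long listing starts with "d" the entry is a directory, otherwise it is treated as a file; in both cases the name is stored together with its modification time. The sketch below exercises the same callback without a server. MockAttrs and the sample entries are made up for illustration and only imitate the mtime() accessor.

```perl
use strict;
use warnings;

# Minimal stand-in for the attributes object stored under the 'a' key;
# only the mtime() accessor used by the callback is provided.
package MockAttrs;
sub new   { my ($class, $mtime) = @_; return bless { mtime => $mtime }, $class }
sub mtime { return $_[0]{mtime} }

package main;

my $file = {};
my $dir  = {};

# The same callback as in read_remote_dir.
my $callback = sub {
    my $ref = shift;
    if ( $ref->{'longname'} =~ /^d/ ) {    # listing line starts with "d"
        $dir->{ $ref->{'filename'} } = $ref->{'a'}->mtime();
    } else {
        $file->{ $ref->{'filename'} } = $ref->{'a'}->mtime();
    }
};

# Made-up entries standing in for what a server would return.
my @entries = (
    { filename => 'docs',
      longname => 'drwxr-xr-x  2 user group 4096 Mar 26 04:47 docs',
      a        => MockAttrs->new(1143348420) },
    { filename => 'notes.txt',
      longname => '-rw-r--r--  1 user group  120 Mar 25 10:00 notes.txt',
      a        => MockAttrs->new(1143280800) },
);
$callback->($_) for @entries;

print "dirs:  @{[ sort keys %$dir ]}\n";
print "files: @{[ sort keys %$file ]}\n";
```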