Rsync Script

by Anonymous Monk
on Aug 18, 2008 at 17:24 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I put together the specs below for a script that I need. I chose Perl because I have some familiarity with it and because I think it is well suited for the task. Here is what I need:

* I have a set of folders in a remote location that I need to sync locally. The folders in the remote location have files added to them routinely throughout the day.

* I only need files synced that have been changed or that are new.

* I need to be able to set limits on what is synced. Currently I only need to sync .zip files under 250 MB. I would like these options to be set via the command line.

* It needs to have an efficient copy engine as there is a great distance between the sites. Something similar to rsync.

* I need it to keep a log of what has been synchronized.

* -Very Important- I either need the application to be multi-threaded so that it can sync more than one file at a time, or I need to be able to run more than one copy of the application at the same time. Here is why. Say I have the script set to run every 15 minutes. If a user adds a 240 MB file, it may take longer than 15 minutes to sync that file. When the next run starts, I need the script to notice that the first file is already being updated and move on to the second file (by the way, some files are relatively small). Thus I need to have two copies or threads (sometimes a few more) running at the same time.

My first thought was rsync, but I ran into a problem with it because it doesn't do file locking. When rsync runs, it reaches a large file that needs to be synchronized and starts copying it. The next time rsync runs, it comes to the same file, which is not yet finished, and tries to sync it again. There is also the problem that the first run then continues on to the next file that needed updating when it last checked, even though other runs may have already updated that same file.

So, I was thinking that a script could call rsync via a system call with rsync's -i option (itemize changes), writing the output to a text file. The script would then look at the text file and create a single rsync instance per file that needs updating. The script would have to keep track of which files are still being updated so as not to attempt to update them again unless the first update failed. I would also need to be able to set the number of threads the script uses, again ideally on the command line. By the way, this script currently needs to run with rsync and cygwin on Windows (which is working), but it would probably need to run on Linux in the future, as it would be deployed at different locations.
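
Roughly, what I have in mind is something like the sketch below (untested; the paths, lock directory, filter rules, and the limit of three concurrent transfers are just placeholders I made up):

    use strict;
    use warnings;
    use Fcntl qw(:flock);

    my $src      = 'remote::share/';    # placeholder rsync source
    my $dst      = '/data/mirror/';     # placeholder local destination
    my $lockdir  = '/tmp/zipsync';      # one lock file per in-flight file
    my $max_kids = 3;                   # how many rsyncs may run at once

    mkdir $lockdir unless -d $lockdir;

    # Dry run (-n) with itemized changes (-i): every line starting with ">f"
    # names a file that rsync would transfer.
    open my $list, '-|', 'rsync', '-ain',
        '--include=*/', '--include=*.zip', '--exclude=*', '--max-size=250m',
        $src, $dst
        or die "cannot run rsync: $!";
    my @pending;
    while (<$list>) {
        chomp;
        push @pending, $1 if /^>f\S*\s+(.+)$/;
    }
    close $list;

    my $running = 0;
    for my $file (@pending) {
        (my $tag = $file) =~ s{[/\\]}{_}g;            # flatten the path for a lock name
        open my $lock, '>', "$lockdir/$tag.lock" or next;
        next unless flock $lock, LOCK_EX | LOCK_NB;   # another instance has this file

        if ($running >= $max_kids) {                  # throttle concurrent children
            wait;
            $running--;
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: sync just this one file.  It inherits $lock, so the lock
            # is held until the child exits.  Destination directories are
            # assumed to exist already.
            system 'rsync', '-a', "$src$file", "$dst$file";
            exit;
        }
        $running++;
    }
    while ($running-- > 0) { wait }                   # harvest the remaining children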

The problem is that actually making this work is a bit over my head, as I have never done anything quite this elaborate. So I have three questions:

1.) Does this sound doable, or is there a better way?
2.) Has anyone seen a similar script that I might use to get started?
3.) Are there any Perl hackers out there who might wish to take on a small project like this for a few bucks?

Thank you all.

Re: Rsync Script
by broomduster (Priest) on Aug 18, 2008 at 17:30 UTC
    File::Rsync will do what you want. My experience with it is for relatively light and infrequent use, but it should do what you are asking without much fuss.

    Update: The script that uses File::Rsync can handle the other bookkeeping chores you need (e.g., avoiding an attempt to sync something that is already being sync'd).
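
    The basic usage is small. Something like this (written from memory of the module's docs and untested here; the paths are made up):

    use File::Rsync;

    # Build the wrapper once; option names mirror rsync's long options.
    my $rsync = File::Rsync->new( { archive => 1, compress => 1 } );

    # Each exec() is one rsync run; err() returns rsync's error output.
    $rsync->exec( { src => 'remote::share/', dest => '/data/mirror/' } )
        or warn "rsync failed: ", $rsync->err;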

      File::Rsync will do what you want.

      How? From the link you provide, I get the impression File::Rsync just gives you an interface to rsync. The OP already knows rsync does most of what he wants, and the little extra he wants, File::Rsync doesn't seem to provide.

      Thank you, but I'm already having a few issues just getting started with File::Rsync. First, it won't install on Windows, either manually or using the CPAN shell. Second, there don't appear to be many examples of its use.
Re: Rsync Script
by JavaFan (Canon) on Aug 18, 2008 at 18:30 UTC
    It seems that all you need is a small program that first checks if there's already a copy of itself running, and exits if this is true. And if no copy is running, you call rsync. Something like:
    use Fcntl qw(:DEFAULT :flock);   # O_WRONLY/O_CREAT plus the LOCK_* constants

    my $lock = "/var/lock/whatever";
    sysopen my $f, $lock, O_WRONLY | O_CREAT, 0644 or die;
    flock $f, LOCK_EX | LOCK_NB or exit;   # another copy already holds the lock
    system 'rsync', ....;
    __END__
      Thanks, but how will that allow for multiple copies to run at the same time or for multiple threads?
Re: Rsync Script
by dHarry (Abbot) on Aug 18, 2008 at 19:11 UTC

    If you're happy with rsync, then File::Rsync, as suggested above, is definitely worth a try. You could also check the rsync resources page; there are a few Perl scripts available there, including the Perl wrapper mentioned above.

    Maybe this is also a direction to look into: ftpsync. It acts as a primitive rsync clone. There are many solutions like this around.

    Hope this helps

Re: Rsync Script
by NiJo (Friar) on Aug 18, 2008 at 19:20 UTC
    Don't take requirements too seriously. In many cases they are open for discussion and can be softened. In your case, maybe "15 minutes if there is no big file" is all the author of the requirement needs.

    I'd not worry too much about running two instances at the same time. Even if you find a proper way of locking, the line will still be tied up by the big file; effectively, that is just another form of serialization. Rumor has it that cron does not start a new instance of a job if the old one is still running.

    Contrary to most other Unix tools, rsync can do most things on its own. You should have found the --max-size option of rsync. I see little use for Perl in this project. If you really want Perl in the mix, rsnapshot might be your ultimate backup script. It uses rsync and keeps hard links between unchanged files. Basically, you get a full backup every time, with the disk space and network traffic of incrementals.
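
    For the OP's ".zip under 250 MB" rule, the whole job can be one rsync call. Wrapped in a Perl system call only because that is what the OP asked for (paths and the log file name are made up):

    # Example only: sync nothing but *.zip files below 250 MB, and log it.
    system 'rsync', '-a', '--max-size=250m',
           '--include=*/', '--include=*.zip', '--exclude=*',
           '--log-file=/var/log/zipsync.log',
           'remote::share/', '/data/mirror/';

    (--log-file needs a reasonably recent rsync; with older versions you can redirect -v output to a file instead.)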

      Yeah, this was sort of my thought, too.

      If this was on *nix, I'd say look at bash to make this all work. I'm a little rusty on my Windows command-line stuff, but I'm pretty sure there's a way to see if there's an instance running. Or, of course, you could install cygwin and do it all from bash that way.
      Really, though, if you hit Sourceforge, I'm pretty sure there are a bunch of front-ends for rsync that will run on Windows.
      Might be worth a look!
Re: Rsync Script
by Illuminatus (Curate) on Aug 18, 2008 at 19:54 UTC
    Have you looked at the list method of File::Rsync? You could use it to generate a list of files that need to be copied. I have not personally used this package, but it says that it returns a list with sizes. You could then split this list into n lists containing file totals of roughly equal size, use scripts with rcp/scp to run through those lists, and have the main script harvest the children when they are finished. It is true that cron will not start a new copy of a job while the original is running; however, if you set the interval to a small value, it will still run within one interval of when the last one finished. Alternatively, you could use the 'at' queue and have the job reschedule itself after completing. The code details are, of course, left as an exercise for the user :)
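    A rough sketch of the listing-and-splitting part, assuming File::Rsync's list method returns rsync's usual listing lines (permissions, size, date, time, name). Everything here is illustrative and untested:

    use File::Rsync;

    my $rsync = File::Rsync->new( { archive => 1 } );

    # list() runs rsync with no destination and returns the listing lines.
    my @lines = $rsync->list( { src => 'remote::share/' } );

    # Pull size and name out of each line, skipping directories.
    my @files;
    for (@lines) {
        chomp;
        next if /^d/;                                  # skip directory entries
        my ( $size, $name ) = ( split ' ', $_, 5 )[ 1, 4 ];
        next unless defined $name;
        $size =~ s/,//g;                               # newer rsyncs add digit grouping
        push @files, [ $name, $size ];
    }

    # Greedy split into n buckets of roughly equal total size:
    # biggest file first, always into the currently lightest bucket.
    my $n = 3;
    my @bucket = map { { size => 0, files => [] } } 1 .. $n;
    for my $f ( sort { $b->[1] <=> $a->[1] } @files ) {
        my ($lightest) = sort { $a->{size} <=> $b->{size} } @bucket;
        push @{ $lightest->{files} }, $f->[0];
        $lightest->{size} += $f->[1];
    }

    # Each bucket would then get its own child (one rsync/scp run per bucket),
    # and the parent harvests the children with wait/waitpid.
    printf "bucket %d: %d files, %d bytes\n",
        $_ + 1, scalar @{ $bucket[$_]{files} }, $bucket[$_]{size}
        for 0 .. $#bucket;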
      Hi Monk, did you find a solution/script that provides this functionality? If yes, will you please share the script with me?
