Rsync Script
by Anonymous Monk on Aug 18, 2008 at 17:24 UTC
Anonymous Monk has asked for the
wisdom of the Perl Monks concerning the following question:
I put together the specs below for a script that I need. I chose Perl because I have some familiarity with it and because I think it is well suited for the task. Here is what I need:
* I have a set of folders in a remote location that I need to sync locally. The folders in the remote location have files added to them routinely throughout the day.
* I only need files synced that have been changed or that are new.
* I need to be able to set limits on what is synced. Currently I only need to sync .zip files under 250 MB. I would like these limits to be set on the command line.
* It needs to have an efficient copy engine as there is a great distance between the sites. Something similar to rsync.
* I need it to keep a log of what has been synchronized.
* -Very Important- Either the application needs to be multi-threaded so that it can sync more than one file at a time, or I need to be able to run more than one copy of it at the same time. Here is why: say the script is set to run every 15 minutes. If a user adds a 240 MB file, syncing it may take longer than 15 minutes. When the next run starts, the script needs to notice that the first file is already being updated and move on to the second file (some files are relatively small). So I need two copies or threads (sometimes a few more) running at the same time.
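To make the last two requirements concrete, here is a minimal sketch of the behaviour I am after, not a finished design. It assumes one lock file per remote path, claimed with a non-blocking flock, so that a second copy of the script skips any file that an earlier run is still transferring. The option names (--ext, --max-mb, --jobs), the lock directory, and the sub names are all hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;
use Digest::MD5 qw(md5_hex);
use Getopt::Long qw(GetOptionsFromArray);

# Read the sync limits from the command line.
# The option names and defaults here are placeholders, not a fixed interface.
sub parse_options {
    my @args = @_;
    my %opt = (ext => 'zip', 'max-mb' => 250, jobs => 2);
    GetOptionsFromArray(\@args, \%opt, 'ext=s', 'max-mb=i', 'jobs=i')
        or die "usage: $0 [--ext zip] [--max-mb 250] [--jobs 2]\n";
    return \%opt;
}

# Try to claim one file for this run.
# Returns an open lock handle on success, or nothing if another run
# already holds the lock. The lock is released when the handle is
# closed or the process exits, so a crashed run does not block forever.
sub try_claim {
    my ($lock_dir, $relative_path) = @_;
    my $lock_file = File::Spec->catfile($lock_dir, md5_hex($relative_path) . '.lock');
    open(my $fh, '>>', $lock_file) or die "Cannot open $lock_file: $!";
    return $fh if flock($fh, LOCK_EX | LOCK_NB);
    close $fh;
    return;
}
```

Each run would call try_claim() before starting a transfer and simply move on to the next candidate when it returns nothing. Whether flock behaves the same under cygwin as on Linux is something I have not verified.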
My first thought was rsync, but I ran into a problem because it doesn't do file locking. When rsync comes to a large file that needs to be synchronized, it starts working on it. The next time rsync runs, it comes to the same file, which is not yet done, and tries to sync it again. There is a second problem: the first run of rsync then continues on to the next file that needed updating when it last checked, but other runs may have already updated that file.
So, I was thinking that a script could call rsync using a system call and rsync's -i option (itemize changes) that would be written out to a text file. The script would then look at the text file and create a single rsync instance per file to update. The script would have to keep track of which files are still being updated so as not to attempt to update them again unless the first update failed. It would need to be able to set the number of threads that the script uses, again hopefully on the command line. By the way, this script would currently need to run using rsync and cygwin on Windows (currently working) but it would probably need to run on Linux in the future as it would need to run at different locations.
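A rough sketch of the listing step, assuming the script runs rsync in dry-run mode and reads the itemized output directly instead of going through a text file. The sub names and the source and destination arguments are hypothetical; the rsync flags used (-r, -t, -i, -n, --include, --exclude, --max-size) are standard ones. Lines for regular files that would be transferred to the local side begin with ">f".

```perl
use strict;
use warnings;

# Build the dry-run listing command as an argument list.
# -r recurse, -t compare/preserve times, -i itemize changes, -n dry run.
# The include/exclude rules keep directories and *.$ext files only.
sub listing_command {
    my ($src, $dest, $ext, $max_mb) = @_;
    return ('rsync', '-rtin',
            '--include=*/', "--include=*.$ext", '--exclude=*',
            "--max-size=${max_mb}m", $src, $dest);
}

# Pick out the regular files that rsync would transfer.
# Such lines look like ">f+++++++++ some/path.zip": the change flags,
# one space, then the path (which may itself contain spaces).
sub files_to_update {
    my @lines = @_;
    my @paths;
    for my $line (@lines) {
        chomp(my $copy = $line);
        push @paths, $1 if $copy =~ /^>f\S+ (.+)$/;
    }
    return @paths;
}
```

The script could then read the listing with something like open(my $out, '-|', listing_command(...)), pass the lines to files_to_update(), and start one rsync per returned path, up to the configured number of jobs.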
The problem is that this is all a bit over my head as I have never done anything quite that elaborate. So I have three questions:
1.) Does this sound doable, or is there a better way?
2.) Has anyone seen a script that is similar to this that I might use to get started?
3.) Are there any Perl hackers out there that may wish to take on a small project like this for a few bucks?
Thank you all.