Copying a large file (6Gigs) across the network and deleting it at source location

by skyler (Beadle)
on Feb 04, 2004 at 22:54 UTC ( #326629=perlquestion )

skyler has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have a script that copies a large file across the network to a storage device. Now I'm about to add a delete step to erase the file from its source location once it has finished copying to the destination. The copy takes about an hour and a half across the network. I wish there were another way to accomplish this task. Should I include a timer, or is there a way to detect that the copy has reached EOF so the source file can be deleted once the copy is finished? I appreciate your suggestions.
#!/usr/bin/perl
use warnings;
use File::Copy;
use File::Find;

find(\&wanted, 'M:\\(Directory)');

sub wanted {
    if ($File::Find::name =~ /\.BAK$/) {
        my $copyname = "I:\\(Directory0)\\(Directory1)\\(Directory2)/$_";
        print "Copying $_ from $File::Find::dir to $copyname\n";
        copy($File::Find::name, $copyname);
    }
}

$dir = "M:\\(Directory)";
print "Starting Delete Process\n\n";
opendir(DIR, $dir) || die "No $dir: $!";
@files = sort grep { !/^\./ } readdir(DIR);
closedir(DIR);

foreach (@files) {
    print $_, "\n";
    if (/\.BAK$/i) {
        if (-f "$dir/$_") {
            unlink("$dir/$_") or print "Unable to delete $_: $!";
        }
    }
}

Replies are listed 'Best First'.
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by Abigail-II (Bishop) on Feb 04, 2004 at 23:18 UTC
    Copying a file of 6Gb means you have to write 6Gb of data. That's going to take a long time. Instead of copying and removing, people tend to 'move' a file instead. That's fast when it's on the same filesystem, and on most modern OSes it falls back to copy-and-delete when the data has to be moved to a different filesystem.
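
    For what it's worth, File::Copy's move() implements exactly that behaviour: it tries a rename first, and falls back to copy-then-delete when the destination is on a different filesystem. A minimal sketch (the paths here are hypothetical):

        use File::Copy;

        my $src = 'M:\\(Directory)\\backup.BAK';    # hypothetical paths
        my $dst = 'I:\\(Directory0)\\(Directory1)\\(Directory2)\\backup.BAK';

        # move() renames when it can; across filesystems it copies,
        # then removes the source only if the copy succeeded.
        move($src, $dst) or die "Move failed: $!";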

    But I don't really understand your question. You can't really speed up the process - at least not by using different statements in your program (though you might be able to tune your OS so that copying huge files goes faster). I don't know why you are considering a timer, and I've no idea what you mean by "copying until EOF to delete the file once it finished copying".

    I would do the thing you're doing from the command line, and skip the Perl part:

    find M:/Directory -name '*.BAK' \
        -exec mv {} 'I:/(Directory0)/(Directory1)/(Directory2)/{}' \;

    Abigail

Re: Copying a large file (6Gigs) across the network and deleting it at source location
by allolex (Curate) on Feb 04, 2004 at 23:15 UTC

    Sorry for stating this so flatly, but you should really take a checksum of the original and compare it against your copy before unlink()ing the original. Have a look at Digest::MD5, which seems to be very popular. You could also call the *NIX command 'cksum' (which has probably been ported to Windows, or at least has an equivalent) and get very similar results.

    Also, when handling errors, try 'die' instead of 'print', so Perl will exit with the right error level:

        unlink("$_") or die "Couldn't delete file: $_\n";

    If you add an 'or die' to the copy operation as well, the script will only attempt to delete the original once the copy has succeeded.
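
    Putting the two suggestions together, a minimal sketch (hypothetical paths; assumes Digest::MD5 is available):

        use File::Copy;
        use Digest::MD5;

        # Hex MD5 of a file, read in binary mode.
        sub md5_of {
            my ($path) = @_;
            open my $fh, '<', $path or die "Can't read $path: $!";
            binmode $fh;
            my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            return $sum;
        }

        my $src = 'M:\\(Directory)\\backup.BAK';    # hypothetical paths
        my $dst = 'I:\\(Directory0)\\backup.BAK';

        copy($src, $dst) or die "Copy failed: $!";
        md5_of($src) eq md5_of($dst)
            or die "Checksum mismatch - not deleting $src\n";
        unlink $src or die "Couldn't delete file: $src ($!)\n";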

    --
    Allolex

      Doing a checksum will effectively double the transfer time, because the files need to be read back from the remote location. And a network filesystem copy is fairly reliable anyway, since the OS does some error recovery.

        Well, yes. That's a good point. But no one said the checksum has to be done from the remote system. ;)

        --
        Allolex

Re: Copying a large file (6Gigs) across the network and deleting it at source location
by Roger (Parson) on Feb 05, 2004 at 00:40 UTC
    Just an idea, how about using secure copy to copy files across the network recursively instead?
    scp -CBvrp srcpath user@host:destpath
         |||||
         ||||+-- Preserve time stamps
         |||+--- Recursively copy entire directories
         ||+---- Verbose mode, useful for logging
         |+----- Batch mode
         +------ Enable compression across the network

    and then...

    del /S *.BAK    # recursively delete the .BAK files
      This assumes that the extra CPU time needed to encrypt/decrypt and compress/decompress 6 GB will let the transfer happen at the same rate or faster. It may or may not, but if his network can't transfer a raw 6 GB any faster than he has stated, the CPUs (and memory) plugged into such a network may be an even tighter bottleneck.


      -Waswas
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by dws (Chancellor) on Feb 05, 2004 at 05:10 UTC
    It takes about one hour and half to copy across the network.

    You don't mention how far apart the source and destination are, or whether they're attended by people (as opposed to running in a dark colo somewhere). If the servers are close, there's an option we often forget: use removable drives. It takes considerably less than an hour and a half to copy 6Gb of data at IDE (or SCSI) speeds, remove the drive, and walk it across the room to the backup box.

    Or, if the source and destination are far apart and "latency" isn't critical, shipping a removable drive via FedEx can still yield reasonable bandwidth.

    It might be an option to consider.

Re: Copying a large file (6Gigs) across the network and deleting it at source location
by ctilmes (Vicar) on Feb 05, 2004 at 11:59 UTC
    rsync has a "--delete-after" option that postpones deletions until the copy has finished (strictly speaking, rsync's --delete options remove extraneous files on the receiving side, rather than the source copy).

    Depending on the nature of your file (does every single byte change every time you need to copy it?), rsync can also improve efficiency, since it only transfers the parts that have changed. It can also use ssh compression, much like the scp already mentioned, if that helps your transfer.

    (Or it could be much less efficient...in which case don't use it.)
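
    A typical invocation might look like this sketch (placeholder paths, same as the scp example above; -a is archive mode, -v verbose, -z compression, and -e ssh runs the transfer over ssh):

        rsync -avz -e ssh srcpath user@host:destpath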

Re: Copying a large file (6Gigs) across the network and deleting it at source location
by zentara (Archbishop) on Feb 05, 2004 at 15:32 UTC
    On a file that big, I would be tempted to split the file on the remote machine into, say, 60 pieces of 100 megs each (or even 600 10-meg files). Take md5sums of all the pieces, and send the list to your local machine. Then download them one at a time (or even a few in parallel, if your bandwidth permits), and as each one arrives, if its md5sum matches, delete that piece from the remote machine. After all the pieces have arrived and been verified, cat them back together.

    Do a lot of testing of this method first. :-) But it would give you some protection against the network connection hanging and costing you a partial file. It may also speed up your transfer, if you run file transfers in parallel.
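
    As a rough sketch of that split-and-verify scheme, assuming GNU coreutils on both machines (file names are hypothetical):

        # on the source machine: cut the file into 100 MB pieces
        split -b 100M bigfile.BAK bigfile.BAK.part.
        md5sum bigfile.BAK.part.* > bigfile.BAK.md5

        # on the destination, after fetching the pieces and the manifest:
        # verify every piece, then stitch them back together
        md5sum -c bigfile.BAK.md5 && cat bigfile.BAK.part.* > bigfile.BAK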

Re: Copying a large file (6Gigs) across the network and deleting it at source location
by rchiav (Deacon) on Feb 05, 2004 at 15:51 UTC
    robocopy is well suited for this. I've used it for a lot of large (read: 20-50 gig) data migrations and it's worked fairly well. It has a switch to retry on errors, and it can recover from network glitches. You can also copy security info. It's in the NT resource kit, and I believe it comes with XP.
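
    As a sketch, something along these lines does the copy-and-delete in one step (paths hypothetical; /MOV removes each source file after it has been copied successfully, and /R:n and /W:n control the number of retries and the wait between them):

        robocopy M:\(Directory) I:\(Directory0)\(Directory1)\(Directory2) *.BAK /MOV /R:5 /W:30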
