http://www.perlmonks.org?node_id=1014426


in reply to Re^4: Finding and sorting files in massive file directory
in thread Finding and sorting files in massive file directory

If there is a process adding new files to the directory while your "tar up" script is running, then you need to face the twin issues of deleting files which haven't been put in the tar file, and putting empty or half-written files into the tarball. If possible, you need to be able to stop the process from adding any new files while the script is running; but if you can't, then the following should be safe.

Use the script I gave you above to, for example, move all files starting with 'b' into a b/ subdirectory. Then wait a few minutes, or however long it could reasonably take for the process to finish writing the current file, then from the command line, simply:

$ tar -czf .../some-path/b.tar.gz b/
$ tar -tzf .../some-path/b.tar.gz > /tmp/foo
View /tmp/foo in a text editor to see if it looks reasonable, then
$ rm -rf b/
If the rm fails due to too many files, then write another perl script similar to the one above, but using 'unlink' to remove each file one by one.
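A minimal sketch of such a fallback script is shown here. It is not the script referred to above, which does not appear in this post; the sub name unlink_files_in and the warn-and-continue behaviour are assumptions made for illustration. It walks the directory with readdir and removes plain files one at a time, so no long argument list is ever built.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove every plain file in $dir, one unlink call per file.
# Subdirectories and other non-file entries are left alone.
# Returns the number of files actually removed.
# (unlink_files_in is a hypothetical name, not from the thread.)
sub unlink_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $count = 0;
    while (defined(my $name = readdir $dh)) {
        next if $name eq '.' or $name eq '..';
        my $path = "$dir/$name";
        next unless -f $path;
        if (unlink $path) {
            $count++;
        }
        else {
            warn "Could not remove $path: $!\n";
        }
    }
    closedir $dh;
    return $count;
}

# Only act when a directory is named on the command line.
if (@ARGV) {
    my $dir = shift @ARGV;
    my $n   = unlink_files_in($dir);
    print "Removed $n files from $dir\n";
    # rmdir succeeds only if the directory is now empty.
    rmdir $dir or warn "Could not remove directory $dir: $!\n";
}
```

Saved under a name of your choosing, it would be run with the directory as its only argument, for example b. Run it only after checking the tarball listing as described above.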

Dave.

Re^6: Finding and sorting files in massive file directory
by salva (Canon) on Jan 21, 2013 at 11:52 UTC
    Also, on Linux the inotify interface provides a way to discover when new files are created/open and later closed.
Re^6: Finding and sorting files in massive file directory
by CColin (Scribe) on Jan 21, 2013 at 12:58 UTC
    Thanks, I'll try it.

    I am intrigued by your earlier reference to Archive::Tar. I did look at the module briefly but it seemed rather complicated. What would it add over and above using readdir with while and basic unix commands?

      "What would it add over and above using readdir with while and basic unix commands?"
      It allows you to select and add files programmatically, without having to fork out to an external program. For example, if you didn't need to delete the files afterwards, you could have just used my original script, with $tar->add_files($file) for each matching file. No need to fork out, no command line length limitations etc.
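As a rough illustration of that approach, the sketch below combines a readdir loop with Archive::Tar. The original script is not reproduced in this post, so the sub name tar_matching, its arguments and the prefix test are assumptions. The method Archive::Tar provides for adding files from disk is add_files, and write takes COMPRESS_GZIP to produce a .tar.gz.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Tar;    # exports COMPRESS_GZIP by default

# Add every plain file in $dir whose name starts with $prefix to a
# gzip-compressed tarball written to $out.
# Returns the number of files added.
# (tar_matching is a hypothetical name, not from the thread.)
sub tar_matching {
    my ($dir, $prefix, $out) = @_;
    my $tar = Archive::Tar->new;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $count = 0;
    while (defined(my $name = readdir $dh)) {
        # Keep only names that begin with the prefix.
        next unless index($name, $prefix) == 0;
        my $path = "$dir/$name";
        next unless -f $path;
        $tar->add_files($path)
            or die "Cannot add $path: " . $tar->error;
        $count++;
    }
    closedir $dh;
    $tar->write($out, COMPRESS_GZIP)
        or die "Cannot write $out: " . $tar->error;
    return $count;
}

# Only act when all three arguments are given on the command line.
if (@ARGV == 3) {
    my ($dir, $prefix, $out) = @ARGV;
    my $n = tar_matching($dir, $prefix, $out);
    print "Added $n files to $out\n";
}
```

Because each file is added by its own method call, no shell command line is ever built, which is the point made above about length limits.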

      Dave.