PerlMonks  

Re: Continuously polling multiple directories for file transfer?

by osunderdog (Deacon)
on Feb 10, 2009 at 13:11 UTC (#742758)


in reply to Continuously polling multiple directories for file transfer?

Just sharing an issue I've run across in this domain.

There is a difference between a new file appearing on disk and that file being completely written. For example, if a file is FTP'd to a directory, there is a period of time during which the new file exists but has a zero-byte or partial size.

There are various ways to get around this depending on your circumstances, but it's dangerous to assume that the file system performs atomic operations on disk.
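One of those ways, sketched here as a heuristic only: treat a file as whole once its size and modification time stop changing between two looks. The name looks_complete and the $wait parameter are made up for illustration, and a stalled transfer would still pass this test.

```perl
use strict;
use warnings;

# Heuristic: a file "looks complete" if its size and mtime are unchanged
# across a short wait. $wait (seconds) is a hypothetical tuning knob.
# A stalled upload can still look stable, so this is not a guarantee.
sub looks_complete {
    my ($path, $wait) = @_;
    $wait = 2 unless defined $wait;

    my @before = stat($path) or return 0;   # missing or unreadable
    return 0 if $before[7] == 0;            # zero bytes: barely started
    sleep $wait;
    my @after = stat($path) or return 0;    # vanished while we waited

    # stat fields: 7 is the size in bytes, 9 is the modification time
    return ($before[7] == $after[7] && $before[9] == $after[9]) ? 1 : 0;
}
```

A poller could call looks_complete("$dir/$file") and simply skip any file that fails, picking it up on the next pass.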

Still looking. Still searching.

Re^2: Continuously polling multiple directories for file transfer?
by MidLifeXis (Monsignor) on Feb 10, 2009 at 15:40 UTC

    I am using a possibly outdated FTP RFC. See the commands STOU, RNFR, and RNTO.

    Along these lines, it might be better to try to STOU the file into a unique file name, and then rename the file once it has been completely uploaded. This is similar to the techniques used under *nix to try to assure atomicity. If you create the file under the same name as what it needs to be, and your ftpd does not do stuff behind the scenes to ensure that when a file is added to the file system it is complete, you will have this race condition.
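The *nix technique mentioned above, as a minimal local sketch: write under a temporary name in the destination directory, then rename() into place. This does not talk FTP at all; publish_file and the ".incoming-" prefix are names invented for the example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Write $data under a temporary name next to the target, then rename() it
# into place. On POSIX systems a rename() within one file system replaces
# the target atomically, so a reader never sees a half-written file under
# the final name.
sub publish_file {
    my ($final_path, $data) = @_;

    # The temp file must be on the same file system as the target,
    # so create it in the target's own directory.
    my ($fh, $tmp_path) =
        tempfile('.incoming-XXXXXX', DIR => dirname($final_path));
    binmode $fh;
    print {$fh} $data or die "write $tmp_path: $!";
    close $fh         or die "close $tmp_path: $!";

    rename $tmp_path, $final_path
        or die "rename $tmp_path -> $final_path: $!";
    return $final_path;
}
```

The poller has to ignore the temporary names (here, anything starting with ".incoming-"). File::Temp creates the file with mode 0600, so a chmod before the rename may be needed if a different user reads it. Over FTP, Net::FTP's put_unique and rename methods correspond to STOU and RNFR/RNTO, though whether the server's rename is atomic is up to the ftpd.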

    "it's dangerous to assume that the file system performs atomic operations on disk."

    Under POSIX, I believe that the atomic semantics are "required", but under Windows, this may not be the case. That is, however, not saying that RNFR and RNTO require the use of the rename() POSIX semantics.

    --MidLifeXis

Re^2: Continuously polling multiple directories for file transfer?
by foobie (Initiate) on Feb 11, 2009 at 11:17 UTC
    We found that for xml files you can check if the file parses OK - if it's mid-download it will have unclosed tags and parsing will fail. This is assuming you have a single root node per file.
      Or you can upload a second file once the first one has completed, e.g. filename.complete. When the second one appears you can assume the first one is complete. Dunno how truly atomic it is, but we had no problems on a live system handling thousands of uploads/day over several years.
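On the receiving side, the marker-file idea might look roughly like this. It is a sketch that assumes the sender uploads "<name>.complete" only after "<name>" has finished; ready_files is a made-up helper name.

```perl
use strict;
use warnings;

# Return the data files in $dir that have a companion "<name>.complete"
# marker. Files without a marker are assumed to still be in flight and
# are left for a later poll.
sub ready_files {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = readdir $dh;
    closedir $dh;

    my %seen  = map { $_ => 1 } @names;
    my @ready = grep {
        !/\.complete\z/            # markers themselves are not data files
            && -f "$dir/$_"        # plain files only (skips ".", "..", subdirs)
            && $seen{"$_.complete"}
    } @names;

    return sort @ready;
}
```

After processing a file, the poller would remove or archive both the data file and its marker so the pair is not picked up again.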
        Thank you for the reply, foobie. Are you suggesting something like the following, where I list each directory, copy each file to a new name with .complete appended, process the copy, and then move it to an archive directory?

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use diagnostics;
        use File::Copy;

        my $path       = "/mnt/ldmdata/";
        my @site_array = ("karx", "kdlh", "kfsd", "kmpx", "kmvx", "kwbc");
        my $poll_time  = 20;    # sec between polls of all specified directories

        for (;;) {
            foreach my $site (@site_array) {
                my $file_dir    = $path . $site;
                my $archive_dir = $file_dir . "/archive";
                mkdir $archive_dir, 0755 unless -d $archive_dir;

                opendir(my $dh, $file_dir) || die "Cannot open $file_dir: $!";
                # Keep plain files only; this skips ".", ".." and archive/
                my @files = grep { -f "$file_dir/$_" } readdir($dh);
                closedir($dh);

                foreach my $file (@files) {
                    my $src  = "$file_dir/$file";
                    my $done = "$src.complete";
                    copy($src, $done) or do { warn "copy $src failed: $!"; next };
                    # pqinsert is assumed to be the external LDM command,
                    # so it is run through system() rather than called as Perl
                    system("pqinsert", $done) == 0
                        or warn "pqinsert $done failed: $?";
                    move($done, "$archive_dir/$file.complete")
                        or warn "move $done failed: $!";
                    unlink($src);
                }
            }
            sleep $poll_time;
        }
