PerlMonks  

Re: knowing when a file is done writing to the server?

by marinersk (Priest)
on Apr 07, 2017 at 08:35 UTC


in reply to knowing when a file is done writing to the server?

It sounds like the upload process is a human action, such as SFTP or a browser upload. If the latter, you might be able to mask the underlying enginery.

For example, if you use the temp upload approach and the upload happens through a CGI script or similar process over which you have complete control, you can simply do the move step (see below) inside the CGI script once the upload is complete.

This becomes a human manual step if they're uploading using something like SFTP, which may or may not be culturally palatable, and is therefore a design decision. Enjoy.

So -- the usual solutions:

  • Flag file (noted above)
  • Temp upload location and rename/move (noted above)
  • Track file size over time (complex, prone to failure, assumptive, last resort)

Flag File

The flag file technique is more often used in automated systems. It goes something like this:

  1. Sender uploads file (e.g., abc.dat)
  2. Sender uploads a tiny flag file (e.g., abc.dat.ok)
  3. Mover loop scans for flag files *.ok and finds this new one
  4. Mover renames flag file abc.dat.ok to abc.dat.wip to show it's being worked on
  5. Mover moves file abc.dat
  6. Mover removes file abc.dat
  7. Mover removes flag file abc.dat.wip

Renaming the flag file to show current status has the added advantage of permitting multiple worker threads to work on the same repository of files needing processing, since the rename operation is likely atomic (eliminates single-step race condition), and if not atomic, is at least very fast (low risk of race condition).

Robust craftsmanship can fine-tune this process for efficiency and collision reduction, such as globbing *.ok but rechecking each flag file's existence before attempting the processing of it.
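The mover side of the steps above might look something like the following. This is only a minimal Python sketch: the function name claim_and_move and the repo_dir and dest_dir parameters are made up for illustration, and it assumes the flag file and its .wip name sit on the same filesystem, so the rename is a single operation.

```python
import glob
import os
import shutil


def claim_and_move(repo_dir, dest_dir):
    """Claim each *.ok flag by renaming it to *.wip, then move the data file.

    Hypothetical helper: returns the base names of the files this worker moved.
    """
    moved = []
    for ok_flag in sorted(glob.glob(os.path.join(repo_dir, "*.ok"))):
        data_file = ok_flag[: -len(".ok")]
        wip_flag = data_file + ".wip"
        try:
            # The rename is the claim. If another worker already renamed
            # this flag, the source no longer exists and we skip the file.
            os.rename(ok_flag, wip_flag)
        except FileNotFoundError:
            continue
        # Move the data file out of the repository, then drop the flag.
        shutil.move(data_file, os.path.join(dest_dir, os.path.basename(data_file)))
        os.remove(wip_flag)
        moved.append(os.path.basename(data_file))
    return moved
```

A file without a matching .ok flag is never touched, so a half-uploaded file just sits there until the sender finishes and drops the flag.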

Temp Upload

The temp upload technique is more often used in automated systems. It goes something like this:

  1. Sender uploads file to temporary location (e.g., tempul/abc.dat)
  2. Sender moves file to repository location (e.g., rename tempul/abc.dat ./abc.dat)
  3. Mover loop scans for files in the repository and finds this new one
  4. Mover moves file abc.dat
  5. Mover removes file abc.dat

As described, this assumes a single worker process for the mover; the mover process can use flag files or some other technique to achieve the same multi-worker-safe environment as the Flag File technique.
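When the sender is a script you control (the CGI case mentioned earlier), steps 1 and 2 can be sketched like this. Again a minimal Python sketch under assumptions: publish and its parameters are hypothetical names, and the temporary directory is kept inside the repository directory so that both paths are on the same filesystem and the rename does not turn into a copy.

```python
import os


def publish(data, repo_dir, name, temp_subdir="tempul"):
    """Write data to a temp location, then rename it into the repository.

    Hypothetical helper: returns the final path of the published file.
    """
    temp_dir = os.path.join(repo_dir, temp_subdir)
    os.makedirs(temp_dir, exist_ok=True)
    temp_path = os.path.join(temp_dir, name)

    # Step 1: the file is written completely in the temp location.
    with open(temp_path, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())

    # Step 2: one rename makes the finished file visible to the mover.
    final_path = os.path.join(repo_dir, name)
    os.replace(temp_path, final_path)
    return final_path
```

The mover then only needs to look at plain files in the repository directory and ignore the temp subdirectory.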

Track File Size

The file size tracking technique is fraught with assumptions and difficulties, but if you cannot gracefully implement another solution, it's a possible improvement over leaving everything to chance. It goes something like this:

  1. Mover process is configured to give each file a certain amount of time before it is presumed complete (e.g., 5 minutes with no change in file size)
  2. Sender uploads file (e.g., abc.dat)
  3. Mover loop scans for files in the repository and records file sizes; if the size has changed, record the current timestamp
  4. Mover eventually notices the file size hasn't changed for the previously noted configuration time (e.g., 5 minutes), and thus presumes the file upload is complete, and moves the file

This process is dependent upon a dynamic file size being properly reported from the OS; I've seen some environments where the reported size of the file is the full size as soon as the file is opened, which obviously renders this technique impotent.

As described, this assumes a single worker process for the mover; however, so long as a failure in the move operation does not cause the script to die, it is likely a fairly safe construct for the multiple worker process environment.
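For completeness, the bookkeeping behind the size tracking loop could be sketched as below. This is a minimal Python sketch, not a recommendation: SizeWatcher, stable_files and quiet_seconds are hypothetical names, and it inherits every caveat above, including the assumption that the OS reports a growing size while the upload is in progress.

```python
import os
import time


class SizeWatcher:
    """Remember each file's last size and when that size was first seen."""

    def __init__(self, quiet_seconds=300):
        self.quiet_seconds = quiet_seconds
        self._seen = {}  # path -> (size, timestamp when that size was first seen)

    def stable_files(self, repo_dir, now=None):
        """Return paths whose size has not changed for quiet_seconds."""
        now = time.time() if now is None else now
        stable = []
        for entry in os.scandir(repo_dir):
            if not entry.is_file():
                continue
            size = os.stat(entry.path).st_size
            last = self._seen.get(entry.path)
            if last is None or last[0] != size:
                # New file or size changed: restart the clock for this file.
                self._seen[entry.path] = (size, now)
            elif now - last[1] >= self.quiet_seconds:
                stable.append(entry.path)
        return stable
```

The mover would call stable_files on each pass of its loop and move whatever comes back, wrapping the move so that a failure (for instance, another worker got there first) does not kill the script.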
