It sounds like the upload process is a human action, such as SFTP or a browser upload. If the latter, you might be able to mask the underlying machinery.
For example, if you use the temp upload approach and the upload goes through a CGI script or similar process over which you have complete control, the script itself can perform the move step (see below) once the upload is complete.
This becomes a human manual step if they're uploading using something like SFTP, which may or may not be culturally palatable, and is therefore a design decision. Enjoy.
So -- the usual solutions:
- Flag file (noted above)
- Temp upload location and rename/move (noted above)
- Track file size over time (complex, prone to failure, assumptive, last resort)
Flag File
The flag file technique is more often used in automated systems. It goes something like this:
- Sender uploads file (e.g., abc.dat)
- Sender uploads a tiny flag file (e.g., abc.dat.ok)
- Mover loop scans for flag files *.ok and finds this new one
- Mover renames flag file abc.dat.ok to abc.dat.wip to show it's being worked on
- Mover moves file abc.dat
- Mover removes file abc.dat
- Mover removes flag file abc.dat.wip
Renaming the flag file to show current status has the added advantage of permitting multiple worker threads to work on the same repository of files needing processing, since the rename operation is likely atomic (eliminates single-step race condition), and if not atomic, is at least very fast (low risk of race condition).
Robust craftsmanship can fine-tune this process for efficiency and collision reduction, such as globbing *.ok but rechecking each flag file's existence before attempting to process it.
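As a rough illustration of the mover side, here is a minimal Python sketch. The function name claim_and_move and the incoming/dest directories are hypothetical, not part of any existing tool. It assumes the flag and data files sit in the same directory, and it folds the "move" and "remove" steps into a single shutil.move call. The rename from .ok to .wip is the claim: a worker that loses the race gets FileNotFoundError and skips the file.

```python
import os
import shutil
from pathlib import Path


def claim_and_move(incoming: Path, dest: Path) -> list[str]:
    """Move every data file in `incoming` whose .ok flag this worker can claim.

    Returns the names of the data files that were moved.
    """
    moved = []
    for flag in sorted(incoming.glob("*.ok")):
        wip = flag.with_suffix(".wip")  # abc.dat.ok -> abc.dat.wip
        try:
            # Claim the job. If another worker renamed the flag first,
            # the source no longer exists and the rename fails.
            os.rename(flag, wip)
        except FileNotFoundError:
            continue
        data = flag.with_suffix("")  # abc.dat.ok -> abc.dat
        shutil.move(str(data), str(dest / data.name))
        wip.unlink()  # job finished, drop the work-in-progress flag
        moved.append(data.name)
    return moved
```

Files without a flag are left alone, so a half-uploaded xyz.dat with no xyz.dat.ok is never touched. A real mover would likely also want to handle a stale .wip file left behind by a crashed worker, which this sketch does not do.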
Temp Upload
The temp upload technique is more often used in automated systems. It goes something like this:
- Sender uploads file to temporary location (e.g., tempul/abc.dat)
- Sender moves file to repository location (e.g., rename tempul/abc.dat ./abc.dat)
- Mover loop scans for files in the repository and finds this new one
- Mover moves file abc.dat
- Mover removes file abc.dat
As described, this assumes a single worker process for the mover; the mover process can use flag files or some other technique to achieve the same multi-worker-safe environment as the Flag File technique.
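The sender side of this technique might look like the following Python sketch, for cases where the sender is a script you control (such as the CGI case mentioned earlier). The function name publish is hypothetical. It assumes the temporary directory lives on the same filesystem as the repository, since a rename is only atomic within one filesystem, and it assumes the mover scans only regular files directly in the repository, so the tempul subdirectory is ignored.

```python
import os
from pathlib import Path


def publish(data: bytes, name: str, repo: Path, temp_dir_name: str = "tempul") -> Path:
    """Write `data` to a temp location, then rename it into the repository."""
    temp_dir = repo / temp_dir_name
    temp_dir.mkdir(parents=True, exist_ok=True)
    temp_path = temp_dir / name

    # The slow part: the mover never sees this file while it is being written.
    with open(temp_path, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())

    # The fast part: the complete file appears in the repository in one step.
    final_path = repo / name
    os.replace(temp_path, final_path)
    return final_path
```

With SFTP or another manual upload, the same two steps apply, but the human (or their client) has to issue the rename.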
Track File Size
The file size tracking technique is fraught with assumptions and difficulties, but if you cannot gracefully implement another solution, it's a possible improvement over leaving everything to chance. It goes something like this:
- Mover process is configured to give each file a certain amount of time before it is presumed complete (e.g., 5 minutes with no change in file size)
- Sender uploads file (e.g., abc.dat)
- Mover loop scans for files in the repository and records file sizes; if the size has changed, record the current timestamp
- Mover eventually notices the file size hasn't changed for the previously noted configuration time (e.g., 5 minutes), and thus presumes the file upload is complete, and moves the file
This process is dependent upon a dynamic file size being properly reported from the OS; I've seen some environments where the reported size of the file is the full size as soon as the file is opened, which obviously renders this technique impotent.
As described, this assumes a single worker process for the mover; however, so long as a failure in the move operation does not cause the script to die, it is likely a fairly safe construct for the multiple worker process environment.
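For completeness, the bookkeeping for this technique can be sketched in Python as below. The SizeWatcher class and its quiet_seconds parameter are hypothetical names, and the sketch only reports which files look stable; moving them is left to the caller. It inherits every caveat above: it relies on the OS reporting a growing size during upload, and a stalled upload is indistinguishable from a finished one.

```python
import time
from pathlib import Path
from typing import Optional


class SizeWatcher:
    """Presume a file complete once its size has not changed for `quiet_seconds`."""

    def __init__(self, repo: Path, quiet_seconds: float = 300.0):
        self.repo = repo
        self.quiet_seconds = quiet_seconds
        self._seen = {}  # file name -> (last size, time that size was first seen)

    def poll(self, now: Optional[float] = None) -> list[Path]:
        """Scan the repository once and return the files presumed complete."""
        now = time.monotonic() if now is None else now
        current = {}
        stable = []
        for path in sorted(self.repo.iterdir()):
            if not path.is_file():
                continue
            size = path.stat().st_size
            prev = self._seen.get(path.name)
            if prev is None or prev[0] != size:
                # New file or size changed: restart the quiet period.
                current[path.name] = (size, now)
            else:
                current[path.name] = prev
                if now - prev[1] >= self.quiet_seconds:
                    stable.append(path)
        self._seen = current
        return stable
```

The mover loop would call poll() on a timer and move whatever it returns, treating a failed move as non-fatal so that several workers can coexist.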