knowing when a file is done writing to the server?

flieckster has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: knowing when a file is done writing to the server? by afoken (Chancellor) on Apr 05, 2017 at 04:39 UTC
> i have a few scripts that move large batches of files on a local server, i have no problem moving them quickly, the issue i have is the user who is uploading has a much slower connection than me. how do i know when to move a file? if they are uploading 200 images, what's a good practice to know when any of the files is done writing to the server?

Start by repairing your shift key. It seems to have severe contact problems. Consider replacing your keyboard.

Then, don't let your users upload to the final destination with the final filename. Upload to a temp file in a temp directory on the same disk, then rename to the final location and the final name. This way, the upload can take ages without affecting the scripts that move completed files.

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
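A minimal sketch of the temp-then-rename idea, assuming the upload is received by a Perl script you control. The sub name `store_upload`, the `.part` suffix, and the directory arguments are hypothetical, and both directories are assumed to live on the same filesystem so that `rename` does not have to copy anything.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: write an upload into $tmp_dir, then rename it into
# $final_dir once every byte is on disk. Both directories are assumed to be
# on the same filesystem; across filesystems rename() simply fails.
sub store_upload {
    my ($in_fh, $name, $tmp_dir, $final_dir) = @_;

    my $safe  = basename($name);    # drop any path the client sent
    my $tmp   = File::Spec->catfile($tmp_dir,   "$safe.part");
    my $final = File::Spec->catfile($final_dir, $safe);

    open my $out, '>:raw', $tmp or die "Cannot write $tmp: $!";
    binmode $in_fh;
    while (read($in_fh, my $buf, 64 * 1024)) {
        print {$out} $buf or die "Write to $tmp failed: $!";
    }
    close $out or die "Cannot close $tmp: $!";

    # Only now does the file become visible to the scripts that move
    # completed files.
    rename $tmp, $final or die "Cannot rename $tmp to $final: $!";
    return $final;
}
```

The mover scripts then only ever look at the final directory, so they never see a half-written file no matter how slow the upload is.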
Re: knowing when a file is done writing to the server? by Your Mother (Archbishop) on Apr 05, 2017 at 00:42 UTC
Maybe Linux::Inotify2 (Linux), IO::KQueue (BSD and macOS), or similar. They are platform dependent.
Re: knowing when a file is done writing to the server? by marinersk (Priest) on Apr 07, 2017 at 08:35 UTC
It sounds like the upload process is a human action, such as SFTP or a browser upload. If the latter, you might be able to mask the underlying enginery. For example, if you use the temp upload approach and the upload happens through a CGI script or similar process over which you have complete control, you can simply do the move step (see below) inside the CGI script after the upload is complete. If they're uploading using something like SFTP, the move becomes a manual step for the human, which may or may not be culturally palatable, and is therefore a design decision.

Enjoy.

So -- the usual solutions:

- Flag file (noted above)
- Temp upload location and rename/move (noted above)
- Track file size over time (complex, prone to failure, assumptive, last resort)

Flag File

The flag file technique is more often used in automated systems. It goes something like this:

1. Sender uploads file (e.g., `abc.dat`)
2. Sender uploads a tiny flag file (e.g., `abc.dat.ok`)
3. Mover loop scans for flag files (`*.ok`) and finds this new one
4. Mover renames flag file `abc.dat.ok` to `abc.dat.wip` to show it's being worked on
5. Mover moves file `abc.dat`
6. Mover removes file `abc.dat`
7. Mover removes flag file `abc.dat.wip`

Renaming the flag file to show current status has the added advantage of permitting multiple worker threads to work on the same repository of files needing processing, since the rename operation is likely atomic (eliminates single-step race condition), and if not atomic, is at least very fast (low risk of race condition).

Robust craftsmanship can fine-tune this process for efficiency and collision reduction, such as globbing `*.ok` but rechecking each flag file's existence before attempting to process it.

Temp Upload

The temp upload technique is more often used in automated systems. It goes something like this:

1. Sender uploads file to temporary location (e.g., `tempul/abc.dat`)
2. Sender moves file to repository location (e.g., `rename tempul/abc.dat ./abc.dat`)
3. Mover loop scans for files in the repository and finds this new one
4. Mover moves file `abc.dat`
5. Mover removes file `abc.dat`

As described, this assumes a single worker process for the mover; the mover process can use flag files or some other technique to achieve the same multi-worker-safe environment as the Flag File technique.

Track File Size

The file size tracking technique is fraught with assumptions and difficulties, but if you cannot gracefully implement another solution, it's a possible improvement over leaving everything to chance. It goes something like this:

1. Mover process is configured to give each file a certain amount of time before it is presumed complete (e.g., 5 minutes with no change in file size)
2. Sender uploads file (e.g., `abc.dat`)
3. Mover loop scans for files in the repository and records file sizes; if the size has changed, it records the current timestamp
4. Mover eventually notices the file size hasn't changed for the configured time (e.g., 5 minutes), presumes the file upload is complete, and moves the file

This process depends on the OS reporting the file size as it grows; I've seen some environments where the reported size of the file is the full size as soon as the file is opened, which obviously renders this technique impotent.

As described, this assumes a single worker process for the mover; however, so long as a failure in the move operation does not cause the script to die, it is likely a fairly safe construct for the multiple worker process environment.
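The file size tracking steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sub name `stable_files`, the `%seen` bookkeeping hash, and the optional `$now` argument (there so the logic can be exercised without waiting) are all hypothetical, and the OS is assumed to report the size of a file as it grows.

```perl
use strict;
use warnings;

# Hypothetical helper: call once per scan. %$seen maps path => [size, time
# at which that size was first observed]. Returns the paths whose size has
# stayed the same for at least $quiet_secs seconds.
sub stable_files {
    my ($seen, $dir, $quiet_secs, $now) = @_;
    $now = time unless defined $now;

    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @paths = grep { -f $_ } map { "$dir/$_" } readdir $dh;
    closedir $dh;

    my @done;
    for my $path (@paths) {
        my $size = (stat $path)[7];
        next unless defined $size;    # file vanished since readdir
        my $rec = $seen->{$path};
        if (!$rec || $rec->[0] != $size) {
            $seen->{$path} = [$size, $now];    # new file, or size changed
        }
        elsif ($now - $rec->[1] >= $quiet_secs) {
            push @done, $path;                 # quiet for long enough
        }
    }

    # Forget files that are no longer in the directory.
    my %present = map { $_ => 1 } @paths;
    delete @{$seen}{ grep { !$present{$_} } keys %$seen };

    @done = sort @done;
    return @done;
}
```

A mover loop would keep one `%seen` hash alive between scans, call something like `stable_files(\%seen, $dir, 300)` every so often, and move whatever comes back. A stalled upload looks exactly like a finished one to this code, which is why it stays the last resort.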
Re: knowing when a file is done writing to the server? by Anonymous Monk on Apr 05, 2017 at 01:19 UTC
The file goes in a temp directory until it is fully uploaded, and then the user who is uploading moves it to the final directory.
Re^2: knowing when a file is done writing to the server? by Anonymous Monk on Apr 05, 2017 at 01:22 UTC
Maybe I shouldn't have said "final". The uploader puts it in a "completed" directory when it is done. When your server process sees that something is in the "completed" directory, it can move it wherever it needs to go after that.
Re: knowing when a file is done writing to the server? by ablanke (Monsignor) on Apr 07, 2017 at 07:53 UTC
Hello flieckster, since it is not in your hands, I don't know if it is possible, but the uploading user could provide a checksum file for you, or at least an ok-file (image-name.png.ok), after the uploading process is complete.
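A minimal sketch of a mover pass that honors such an ok-file, borrowing the rename-to-`.wip` claim step from marinersk's reply. The sub name `process_flagged` and the `$inbox`/`$dest` arguments are hypothetical, and it assumes the uploader writes `NAME.ok` only after `NAME` is complete.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical mover pass: for every "NAME.ok" flag in $inbox, claim it by
# renaming it to "NAME.wip", move NAME to $dest, then drop the flag.
# Returns the names that this worker moved.
sub process_flagged {
    my ($inbox, $dest) = @_;

    opendir my $dh, $inbox or die "Cannot open $inbox: $!";
    my @flags = sort grep { /\.ok\z/ } readdir $dh;
    closedir $dh;

    my @moved;
    for my $flag (@flags) {
        (my $name = $flag) =~ s/\.ok\z//;
        my $wip = "$inbox/$name.wip";

        # If another worker already claimed this flag, the .ok file is
        # gone and rename() fails, so only one worker gets past here.
        rename "$inbox/$flag", $wip or next;

        if (move("$inbox/$name", "$dest/$name")) {
            unlink $wip or warn "Cannot remove $wip: $!";
            push @moved, $name;
        }
        else {
            # Leave the .wip flag in place so the failure can be inspected.
            warn "Cannot move $name: $!";
        }
    }
    return @moved;
}
```

Files without a matching `.ok` flag are never touched, so an upload that is still in progress is simply skipped until its flag shows up.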

