file handling question

by smackdab (Pilgrim)
on Dec 17, 2003 at 04:03 UTC
smackdab has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have an app that writes out HTML files, and I am using IO::AtomicFile to write them. I believe this will make sure users get the whole HTML file if they refresh the browser (not easy to test, though ;-).

Is there another technique I should be using to read my config-type files? I assume there can be cases where the user is doing something (other than locking the file with an editor) that could cause the read to fail... (I have seen what I think is this, but again, it is difficult to test.) Should I try to read the file, say, 10 times with 0.1 sec in between, or is there a better approach?

thanks for any best practice guidelines

Re: file handling question
by Zaxo (Archbishop) on Dec 17, 2003 at 04:25 UTC

    IO::AtomicFile may be what you want, but it has nothing to do with file locking. If you look at the source, you'll see that a temporary file is written and renamed on close. The atomic nature of rename in the OS is what accomplishes that. A weakness of this module is that the temporary file name is not safe if two instances open the same file: both temporaries will be given the same name, with bad results.

    Voluntary or mandatory flock is another animal. It is able to handle the retry strategy you mention, or else to block, waiting for the lock to become available. I'm not sure whether Apache honors voluntary locks when reading HTML files; perhaps another monk knows.
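    A minimal sketch of the non-blocking retry flavour of that, using flock (the file name and retry counts are just for illustration):

        use Fcntl qw(:flock);

        open my $fh, '<', 'config.txt' or die "open config.txt: $!";

        my $locked = 0;
        for (1 .. 10) {
            # LOCK_NB makes flock return immediately instead of blocking.
            $locked = flock($fh, LOCK_SH | LOCK_NB) and last;
            select(undef, undef, undef, 0.1);    # pause 0.1 s, then retry
        }
        die "could not get a shared lock on config.txt" unless $locked;

        # ... read the file here ...

        close $fh;    # closing the handle releases the lock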

    Some file systems and OSes are weak at handling concurrency, but most Unices are OK.

    After Compline,
    Zaxo

Re: file handling question
by graff (Chancellor) on Dec 17, 2003 at 05:20 UTC
    If the process that writes a given HTML file takes some noticeable amount of time between start and finish, and you want to make sure that web visitors will only see the complete form of the file, something like the following ought to be all you need (a code sketch follows the list):
    • open a new file for output with a name that won't be visible to web visitors, or create it with the intended public name but in a different path (not publicly exposed) on the same disk;
    • once the output to the file is complete and the file is closed (and you're sure there weren't any errors), rename the file to the intended path/name in the public web directory -- it now becomes instantly visible to the next person who (re)loads the url for the file, and from the visitor's perspective, it is never partial or incomplete.
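    For example, a bare-bones version of that recipe might look like this (the paths and content are made up, and error handling is kept minimal):

        my $final = '/var/www/html/report.html';      # the public name
        my $tmp   = "$final.tmp.$$";                   # not linked to by any page
        my $html  = "<html><body>done</body></html>\n";

        # Step 1: write the whole file under the temporary name.
        open my $out, '>', $tmp or die "open $tmp: $!";
        print $out $html;
        close $out or die "close $tmp: $!";

        # Step 2: publish it in one atomic step (same filesystem).
        rename $tmp, $final or die "rename $tmp -> $final: $!";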
    If the web server is a unix box, "renaming" a file from one disk to another (e.g. from /tmp to /public_html) really means copying it, which will not be instantaneous. It may not be as slow as the process that writes the file in the first place, but it is still not as fast as renaming a file so that it stays on the same volume, which just updates directory entries rather than moving any data. (This is likely to be true on any OS, even those that don't have things called "inodes".)
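    If you can't guarantee the temporary file lands on the same volume, one common hedge is to try rename() first and fall back to File::Copy's move(), accepting that the fallback copies the data and is therefore no longer atomic (paths below are hypothetical):

        use File::Copy qw(move);

        my $tmp   = '/tmp/report.html.new';
        my $final = '/var/www/html/report.html';

        unless (rename $tmp, $final) {
            # Different filesystem: move() copies and then unlinks,
            # so a visitor could briefly see a partial file.
            move($tmp, $final) or die "move $tmp -> $final: $!";
        }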

    As for reading config files (it took me a while to get the connection between the first and second paragraphs)... If you're worried that a process reading a config file might get an incomplete or "transient" version of the data -- and if this is a persistent, pernicious concern -- you might consider making up a little table (database or flat file) that stores file names with data checksums. Read the file once, compute its checksum, and if that doesn't match the checksum in the table, treat it as an error condition. (You could try reading it again after a delay to see if the problem persists, but if it fails twice, you might as well quit.)

    This would require a little more infrastructure for managing your config files, to make sure that the checksum table is updated every time a file is intentionally added, deleted or altered.
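    A rough sketch of that check, assuming the expected checksum has already been looked up from such a table:

        use Digest::MD5 qw(md5_hex);

        sub read_verified {
            my ($path, $expected_md5) = @_;
            for my $attempt (1, 2) {                  # one retry, as described
                open my $fh, '<', $path or die "open $path: $!";
                binmode $fh;                          # checksum the raw bytes
                my $data = do { local $/; <$fh> };    # slurp the whole file
                close $fh;
                return $data if md5_hex($data) eq $expected_md5;
                select(undef, undef, undef, 0.1) if $attempt == 1;
            }
            die "$path failed its checksum twice; treating that as an error";
        }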

      If the process that writes a given HTML file takes some noticeable amount of time... something like the following ought to be all you need: open a new file for output ... once the output to the file is complete and the file is closed (and you're sure there weren't any errors), rename the file to the intended path/name

      Which, not coincidentally, is what the IO::AtomicFile module does.
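      For reference, basic use of the module looks roughly like this (file name and content are invented); the temp-file-plus-rename dance happens behind the scenes when the handle is closed:

          use IO::AtomicFile;

          # Writes to a temporary name alongside the target, then
          # renames it over index.html on close.
          my $fh = IO::AtomicFile->open('index.html', 'w')
              or die "can't open index.html: $!";
          print $fh "<html><body>hello</body></html>\n";
          $fh->close or die "close (and rename) failed: $!";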

        Thanks!!!

        I think that confirms that my choice for IO::AtomicFile is a good one ;-)

        The second part, which I can see was a little confusing, is as follows: (I am on win32, but want to be portable)

        I have a server process, and it often needs to read config files for instructions. I expect the file to always be readable, but one time it wasn't (I think I was in the debugger and maybe viewing the file as well; no locking that I am aware of...). Instead of just throwing out the job, I figured I could retry opening the file a few times, maybe sleeping 0.1 sec in between? Is this just overkill? Curious: if you are manually editing a crontab file in vi and cron runs, what happens? (I don't have Unix or cron... I just use Windows, and we don't expect users to edit config files behind our backs ;-)
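        Roughly what I have in mind (the file name and limits are invented):

            sub open_with_retry {
                my ($path, $tries, $delay) = @_;
                for (1 .. $tries) {
                    open my $fh, '<', $path and return $fh;
                    select(undef, undef, undef, $delay);    # sub-second sleep
                }
                return;    # still failing; let the caller decide what to do
            }

            my $fh = open_with_retry('jobs.conf', 10, 0.1)
                or warn "giving up on jobs.conf: $!";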

        thanks for any defensive programming ideas !
