open for append
by cmeyer (Pilgrim) on Aug 25, 2005 at 09:29 UTC
perldoc -f flock recommends (indirectly, in its code example) that files that are opened-for-append should be flock()ed before being written to. A few years ago, I was horrified to see a peer writing to a log file (which was opened for append), without bothering to lock it first. It was especially troubling, knowing that the log file in question was to be later used to audit the effectiveness of the program.
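For reference, the lock-then-append pattern looks roughly like the sketch below. It is modeled on the general shape of the perldoc -f flock example rather than copied from it, and the sub name append_locked is made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Append one line to a file while holding an exclusive lock.
# append_locked is a hypothetical helper name, not from any library.
sub append_locked {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    # Another process may have appended while we waited for the lock.
    seek($fh, 0, SEEK_END)    or die "Cannot seek in $path: $!";
    print {$fh} $line         or die "Cannot write to $path: $!";
    # close flushes the buffer and releases the lock.
    close($fh)                or die "Cannot close $path: $!";
    return 1;
}
```

Something like append_locked('example.log', "pid $$ was here\n") would then add one line per call. Note that flock is advisory: it only helps if every writer takes the lock.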
Strangely, my fellow programmer was not moved. I tried writing some test code to demonstrate the need for concern. I wasn't able to simulate a problem. I wrote to comp.lang.perl.misc, to seek advice. I learned that the POSIX C library guarantees that, when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten.
Recently I saw Stas Bekman give his mod_perl 2.0 by example talk at SPUG. I saw a similar code example in one of his slides, flock()ing a filehandle before writing to it, when it had been opened for append.
This got me thinking about it again, and about how Perl interacts with libc. I read in the SuSE Linux man page for open(2) that trying this trick on an NFS mounted file system may lead to corruption. Fair enough.
I wrote a test program that takes three arguments, the total number of children to run, the number of lines each child should write to the logfile, and the number of children to run concurrently. It forks a bunch of children, which open a log file for appending, and write a bunch of lines to it. Then the parent reaps the kids, and counts what it finds in the log file. There are three simple tests: that the number of lines is what's expected, that the number of bytes is what's expected, and that the lines each have the expected number of bytes on them.
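A test program along those lines might look like the sketch below. It is a reconstruction from the description above, not the original script: the names run_test and make_line, the scratch file name, the line format, and the optional fourth argument that switches autoflush on are all assumptions.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Every line has the same length, as long as both numbers stay below 10**6.
sub make_line { sprintf("%06d %06d %s\n", $_[0], $_[1], 'x' x 60) }

# Fork $total children, at most $concurrent at a time. Each child opens the
# log for append and writes $lines lines. Returns what the parent finds.
sub run_test {
    my ($total, $lines, $concurrent, $autoflush) = @_;
    my $log = "append_test.$$.log";        # hypothetical scratch file
    my $len = length(make_line(0, 0));
    unlink $log;

    my $running = 0;
    for my $child (1 .. $total) {
        if ($running >= $concurrent) { wait(); $running--; }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            open(my $fh, '>>', $log) or die "open $log: $!";
            $fh->autoflush(1) if $autoflush;
            print {$fh} make_line($child, $_) for 1 .. $lines;
            close($fh) or die "close $log: $!";
            POSIX::_exit(0);               # leave without running END blocks
        }
        $running++;
    }
    1 while wait() != -1;                  # reap the remaining children

    # Count lines, and lines whose length differs from the expected one.
    my ($count, $bad) = (0, 0);
    open(my $in, '<', $log) or die "open $log: $!";
    while (my $l = <$in>) { $count++; $bad++ if length($l) != $len; }
    close($in);
    my $bytes = -s $log;
    unlink $log;
    return { lines => $count, bytes => $bytes,
             bad_lines => $bad, line_len => $len };
}

# Usage: perl append_test.pl TOTAL LINES_PER_CHILD CONCURRENT [AUTOFLUSH]
if (@ARGV >= 3) {
    my ($total, $lines) = @ARGV;
    my $r    = run_test(@ARGV);
    my $want = $total * $lines;
    printf "lines: %d, expected %d\n", $r->{lines}, $want;
    printf "bytes: %d, expected %d\n", $r->{bytes}, $want * $r->{line_len};
    printf "lines of the wrong length: %d\n", $r->{bad_lines};
}
```

The children open the log themselves after the fork, so no buffered data is shared with the parent. Because the child and line numbers are embedded in each line, a mangled line can be traced back to its writers by inspecting the log before it is unlinked.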
Under SuSE Linux, on one CPU (but a multithreaded Pentium 4, for whatever that's worth), I could make the last test fail if I left Perl to its usual IO buffering tricks. But if I turned on autoflush, I could not make that test fail. I wonder whether that test will fail on multi-CPU systems. I'll give it a try tomorrow.
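One plausible explanation, which I have not verified against the Perl source: with buffering on, Perl hands data to the OS whenever its buffer fills, and that point can fall in the middle of a line. Each flush is a separate write(2), and the append guarantee only covers a single write, so another process can slip its data in between the two halves of a line. With autoflush, each print of a short line becomes its own write. A related way to get one write per line is to skip the buffer entirely with sysopen and syswrite, as in this sketch (append_line_unbuffered is a made-up name):

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_APPEND O_CREAT);

# Append one line with a single write(2) call, bypassing Perl's buffering.
# append_line_unbuffered is a hypothetical helper name.
sub append_line_unbuffered {
    my ($path, $line) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_APPEND | O_CREAT, 0644)
        or die "sysopen $path: $!";
    my $written = syswrite($fh, $line);
    die "syswrite $path: $!" unless defined $written;
    # A short write would leave a partial line in the file, so report it.
    die "short write to $path: $written of " . length($line) . " bytes"
        unless $written == length($line);
    close($fh) or die "close $path: $!";
    return $written;
}
```

This does not lock anything. It only ensures that each line reaches the kernel in one call, which is the same condition the autoflush runs were testing.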
I'm curious to see results from other systems. I wonder what other sorts of things might cause corruption, even if it's just messing up the lines. I wonder how this works on other OSs, like Windows or Cygwin.
It's also quite interesting to compare the execution time for different numbers of concurrent children, and the difference in speed behavior when the children have autoflush turned on or not.
-Colin.
Update: changed "data shall not be lost" to "data shall not be overwritten".