http://www.perlmonks.org?node_id=486488

perldoc -f flock recommends (indirectly, in its code example) that files opened for append should be flock()ed before being written to. A few years ago, I was horrified to see a peer writing to a log file (which was opened for append) without bothering to lock it first. It was especially troubling knowing that the log file in question was to be used later to audit the effectiveness of the program.
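For reference, the locking pattern that perldoc -f flock demonstrates looks roughly like this (a minimal sketch; the filename and message are placeholders):

use Fcntl qw(:flock SEEK_END);

open my $log, '>>', '/tmp/mylog' or die "Cannot open log: $!";
flock( $log, LOCK_EX )           or die "Cannot lock log: $!";
# in case someone appended while we were waiting for the lock...
seek( $log, 0, SEEK_END )        or die "Cannot seek: $!";
print $log "a log message\n"     or die "Cannot write log: $!";
close $log                       or die "Cannot close log: $!";  # releases the lock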

Strangely, my fellow programmer was not moved. I tried writing some test code to demonstrate the need for concern. I wasn't able to simulate a problem. I wrote to comp.lang.perl.misc to seek advice. I learned that the POSIX C library guarantees that, when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten.
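In Perl terms, opening a file with '>>' requests exactly this append mode. As a rough sketch (the path is a placeholder, and the flags shown are the usual correspondence):

use Fcntl qw(O_WRONLY O_APPEND O_CREAT);

# open my $fh, '>>', $path is approximately:
sysopen( my $fh, '/tmp/mylog', O_WRONLY | O_APPEND | O_CREAT )
    or die "Cannot open: $!";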

Recently I saw Stas Bekman give his mod_perl 2.0 by example talk at SPUG. One of his slides had a similar code example: flock()ing a filehandle before writing to it, when the filehandle had been opened for append.

This got me thinking about it again, and about how Perl interacts with libc. I read in the SuSE Linux man page for open(2) that trying this trick on an NFS-mounted file system may lead to corruption. Fair enough.

I wrote a test program that takes three arguments: the total number of children to run, the number of lines each child should write to the logfile, and the number of children to run concurrently. It forks a bunch of children, which open a log file for appending and write a bunch of lines to it. Then the parent reaps the kids and counts what it finds in the log file. There are three simple tests: that the number of lines is what's expected, that the number of bytes is what's expected, and that the lines each have the expected number of bytes on them.

Under SuSE Linux with one CPU (but a multithreaded Pentium 4, for whatever that's worth), I could make the last test fail if I left Perl up to its usual IO buffering tricks. But if I turned on autoflush, then I could not make that test fail. I wonder if that test will fail on multi-CPU systems. I'll give it a try tomorrow.
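For concreteness, autoflush on a log handle can be enabled either with the select() idiom the test script below uses, or more readably through IO::Handle; the two are equivalent ($log is a placeholder handle):

# the select() idiom used in the test script:
select( ( select($log), $|++ )[0] );

# the same thing via IO::Handle:
use IO::Handle;
$log->autoflush(1);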

I'm curious to see results from other systems. I wonder what other sorts of things might cause corruption, even if it's just messing up the lines. I wonder how this works on other OSs, like Windows or Cygwin.

It's also quite interesting to compare the execution time for different numbers of concurrent children, and the difference in speed when the children have autoflush turned on or off.

-Colin.

Update: changed "data shall not be lost" to "data shall not be overwritten".
#!/usr/local/bin/perl
use strict;
use warnings;

use POSIX;
use Test::More tests => 3;

#---------------------------
# behavioral parameters and globals
#---------------------------

use constant DEBUG              => 0;
use constant FILE_NAME          => '/tmp/perl_append_test';
use constant CHILD_REPORT_POINT => 100;
use constant LOG_FORMAT         => "[%6d] %6d %6d\n";

# try changing this to zero to see line size discrepancies
use constant FLUSH_BUFFER => 1;

my $NUM_CHILDREN = shift || 801;    # total children to run
my $NUM_LINES    = shift || 511;    # lines each child writes
my $MAX_CHILDREN = shift || 100;    # children running concurrently

my %kids;
my $child_count = 0;

#---------------------------

$|++;

unlink FILE_NAME;

print "# children: $NUM_CHILDREN\n" if DEBUG;
print "# lines:    $NUM_LINES\n\n"  if DEBUG;
print "# spawning ... \n"           if DEBUG;
print "# parent pid: $$\n"          if DEBUG;

spawn_kids();
clean_up_after_kids();

#---------------------------
# make a report
#---------------------------

my $expected_line_length = length( sprintf( LOG_FORMAT, 0, 0, 0 ) );

print "\n# counting money ...\n\n" if DEBUG;

my ( $lines, $bytes ) = ( 0, 0 );
my $discrepancies = 0;

open my $log, '<', FILE_NAME or die "$$ couldn't open log: $!";
while (<$log>) {
    $lines++;
    $bytes += my $l = length;
    $discrepancies++ if $l != $expected_line_length;
}
close $log;

is( $lines, $NUM_CHILDREN * $NUM_LINES,
    'number of lines in log matches expectation' );
is( $bytes, $NUM_CHILDREN * $NUM_LINES * $expected_line_length,
    'bytes in logfile matches expectation' );
is( $discrepancies, 0, 'no line size discrepancies' );

if (DEBUG) {
    print "\n";
    print "# lines:         $lines\n";
    print "# bytes:         $bytes\n";
    print "# discrepancies: $discrepancies\n";
    print "\n";
}

exit;

#-----------------------------------------------------------

sub spawn_kids {
    for my $child_num ( 1 .. $NUM_CHILDREN ) {
        ++$child_count;
        print "\r# concurrent children: $child_count " if DEBUG;
        print "\n# child count: $child_num\n"
            if !( $child_num % CHILD_REPORT_POINT ) && DEBUG;

        reap_some();
        while ( $child_count >= $MAX_CHILDREN ) { reap_some(); sleep 1 }

        my $p = fork();
        die "couldn't fork: $!" unless defined $p;
        if ($p) {
            $kids{$p}++;
        }
        else {
            # I'm the child
            append_like_hell($child_num);
            exit;
        }
    }
    print "\n" if DEBUG;
}

sub clean_up_after_kids {
    print "# reaping ... \n" if DEBUG;
    while ( keys %kids ) {
        reap_some();
        print "\r# concurrent children: $child_count " if DEBUG;
        sleep 1;
    }
    print "\n" if DEBUG;
}

#---------------------------
# find dead kids
#---------------------------

sub reap_some {
    while ( ( my $pid = waitpid( -1, POSIX::WNOHANG ) ) > 0 ) {
        --$child_count;
        delete $kids{$pid};
    }
}

#---------------------------
# this is what the kids do
#---------------------------

sub append_like_hell {
    my $id = shift;

    open my $log, '>>', FILE_NAME or die "[$id] $$ couldn't open log: $!";
    select( ( select($log), $|++ )[0] ) if FLUSH_BUFFER;    # autoflush

    for ( 1 .. $NUM_LINES ) {
        print $log sprintf( LOG_FORMAT, $id, $$, $_ )
            or die "[$id] $$ couldn't print line $_: $!";
    }
    close $log or die "[$id] $$ couldn't close log: $!";
}
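To run it, pass the three arguments described above (the script name here is just a placeholder):

perl append_test.pl 801 511 100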


Re: open for append
by perrin (Chancellor) on Aug 25, 2005 at 11:12 UTC
    As I understand it, the main thing to watch out for is exceeding a single block in one write; i.e., if the block size on your system is 4K, writing more than 4K at once may break this.
      You're right. Modifying the test script to write lines containing 5k of data causes errors.

      -sam

        Yes, increasing the line size to just over 4.1k causes errors in the line-length discrepancy test, indicating data interleaving between different processes. I didn't get interleaving problems with line sizes any smaller than that. I'm running on reiserfs, which I understand has some sort of variable block size. I never see discrepancies in the total byte count, which seems to match the open(2) documentation on my system.
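        As a sketch, one way to reproduce this with the test script is to pad each record well past 4K by changing LOG_FORMAT (the 5000-byte pad is just an illustrative value, not the original setting):

        use constant LOG_FORMAT => "[%6d] %6d %6d " . ( 'x' x 5000 ) . "\n";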

        Anyway, the take-home is that you may as well be safe and perform exclusive locking when you have many processes writing to the same file (which is opened for append).

        I haven't done any benchmarking to see if there are significant savings from not locking. It is conceivable that someone might wish to consider this possibility, if they could be assured that each "record" written to the file were smaller than their block size.
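        A minimal sketch of such a benchmark, assuming hypothetical locked_append and plain_append helpers that write one small record with and without flock():

        use Benchmark qw(timethese);

        timethese( 10_000, {
            locked   => sub { locked_append("one small record\n") },   # hypothetical helper
            unlocked => sub { plain_append("one small record\n") },    # hypothetical helper
        } );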

        -Colin.


Re: open for append
by derby (Abbot) on Aug 25, 2005 at 12:25 UTC

    I learned that the POSIX C library guarantees that, when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be lost.

    I don't think it guarantees that data shall not be lost ... There's still too much going on to guarantee no lost data (you can still get errors such as EFBIG, EINTR, ENOSPC, and EIO).

    From the spec, a write can be any size:
    If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

    The only time to worry is if you're writing to a PIPE or FIFO:
    Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.

    -derby

      Good point. I'll edit my post to say that the C library guarantees that data shall not be overwritten.

      -Colin.


        the C library guarantees that data shall not be overwritten
        But it's not the C library either. It's the kernel. The C library doesn't care, and doesn't know that O_APPEND was used. It simply does the write(2) call, and because the open(2) call had O_APPEND, the kernel arranges for write-at-end in an atomic fashion.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

Re: open for append (family atomics)
by tye (Sage) on Aug 25, 2005 at 16:26 UTC

    Note that Win32 doesn't do the seek-then-write atomically, so you should lock if your code might be/is run on Win32. Also note that if you didn't use binmode (which is likely true, and usually for good reason), then the order of things is seek, then newline conversion, then write, which makes the race even racier.

    As mentioned elsewhere, another problem is if the write is too large, which will cause it to be broken up into chunks and the chunks from different processes might be interleaved.

    Another problem is if you do more than one write operation. If the total data to be written is small enough and you are on a POSIX system, then you can build up the entire string to be written and use a single write operation and (usually) not worry about locking. But sometimes it is more convenient to write the code using multiple write operations, in which case you'd want to lock so the data doesn't get interleaved.
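    For illustration, here's a sketch of the build-then-write-once approach (the $log handle and the @messages list are placeholders):

    # accumulate everything first, then emit it in a single write
    my $record = '';
    $record .= sprintf( "[%6d] %s\n", $$, $_ ) for @messages;

    my $wrote = syswrite( $log, $record );
    defined $wrote && $wrote == length $record
        or die "short or failed write: $!";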

    - tye        

Re: open for append
by ysth (Canon) on Aug 25, 2005 at 10:08 UTC
    I learned that the POSIX C library guarantees that, when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be lost.
    But I don't think that guarantees that data won't be interleaved, with output from process A in the middle of a write from process B. Does it?
      It does. It seeks to the end before every write.
        Make sure to note merlyn's comment below. It's not sufficient to seek then write in userspace, because it's not atomic. It needs to happen in the kernel, not libc.
Re: open for append
by perl5ever (Pilgrim) on May 16, 2011 at 17:26 UTC
    You should use syswrite instead of print.

    print goes through libc's stdio (or perl's replacement) whereas syswrite is a direct system call. The corruption you are seeing when LOG_FORMAT has length greater than 4K is likely due to buffering by stdio.

    The code for append_like_hell using syswrite would be:

    sub append_like_hell {
        my $id = shift;

        open my $log, '>>', FILE_NAME
            or die "[$id] $$ couldn't open log: $!";

        for ( 1 .. $NUM_LINES ) {
            syswrite( $log, sprintf( LOG_FORMAT, $id, $$, $_ ) )
                or die "[$id] $$ couldn't write line $_: $!";
        }
        close $log;
    }

    Two notes: 1) flushing has no effect here; 2) even though syswrite returns the number of bytes written, because $log is opened to a disk file it will always be either undef or the length of the passed string. In general, to fully write a string using syswrite (e.g. to a socket or a pipe), one has to loop until the entire string has been written (cf. the following from AnyEvent::Worker):

    for ( my $ofs = 0; $ofs < length $wbuf; ) {
        $ofs += ( my $wr = syswrite $fh, substr $wbuf, $ofs
                      or die "unable to write results" );
    }