Safe to open+close for every write, without locking?

by sedusedan (Monk)
on Dec 21, 2012 at 03:46 UTC ( #1009847=perlquestion )
sedusedan has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

Below is a script (write.pl) that writes 100k lines to a log file, with deliberate random delays around each print():

#!/usr/bin/perl
use autodie;
use Time::HiRes qw(nanosleep);

unless (@ARGV == 2) { die "Usage: $0 <path> <str>\n"; }
$|++;
for (1..100_000) {
    open my($fh), ">>", $ARGV[0];
    nanosleep(1_000_000*rand());
    print $fh "$ARGV[1]: $_\n";
    nanosleep(1_000_000*rand());
    close $fh;
}

I run 100 instances of this script:

$ for i in `seq 1 100`;do ( ./write.pl log $i & );done

As you can see, I'm trying to produce a race condition in which one process's write clobbers another's. However, I cannot seem to do so: all lines from all processes end up in the log file. This is tested using:

$ wc -l log
$ grep ^1: log | wc -l
$ grep ^2: log | wc -l
...

What am I doing "right"? Is it really safe, then, to open in append mode, write once, and close again, without any lock (I'm thinking: no way)? Or is it only safe on certain platforms? FYI, I'm writing a File::Write::Rotate module (something like Log::Dispatch::FileRotate, but more lightweight and not tied to Log::Dispatch).

BTW, strace shows that perl delays the write (it should have happened before the second nanosleep()):

open("log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
lseek(3, 0, SEEK_END)                   = 4750343
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffc9877930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 4750343
fstat(3, {st_mode=S_IFREG|0644, st_size=4750370, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
nanosleep({0, 992359}, 0x7fffc9877cf0)  = 0
nanosleep({0, 347917}, 0x7fffc9877cf0)  = 0
write(3, "95: 3281\n", 9)               = 9
close(3)                                = 0

Debian Linux amd64, kernel 3.2.0, Perl 5.14.2
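The delayed write in the strace above is just PerlIO block buffering: $|++ only autoflushes the currently selected handle (normally STDOUT), not $fh. A minimal sketch (the file name is mine, not from the thread) that bypasses the buffer with sysopen/syswrite, so the write(2) happens exactly where it appears in the code:

```perl
#!/usr/bin/perl
# Sketch: $|++ does not affect $fh, so print $fh ... is still
# block-buffered.  syswrite bypasses PerlIO buffering entirely,
# issuing one write(2) syscall at this exact point in the program.
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_APPEND);

my $path = "/tmp/unbuffered-demo.log";   # hypothetical file name
unlink $path;

sysopen my $fh, $path, O_WRONLY | O_CREAT | O_APPEND, 0644
    or die "sysopen: $!";
my $line = "1: 1\n";
my $n = syswrite $fh, $line;             # exactly one write(2) call
die "syswrite: $!" unless defined $n && $n == length $line;
close $fh;
```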


Re: Safe to open+close for every write, without locking?
by sedusedan (Monk) on Dec 21, 2012 at 03:58 UTC
    BTW, the reason I'm investigating open+write+close instead of flock is that I might have to reopen a new file at any time anyway, due to automatic rotation, and flock makes performance suffer badly when there are lots of concurrent writers. For example, with open+write+close, a single process manages 40k lines/sec, and 100 concurrent processes reach 154k lines/sec overall. With flock, a single process manages 68k lines/sec, but with 100 concurrent processes the overall rate drops to 23k lines/sec.

      If performance is your primary criterion, you'd be much better off creating a daemon that opens a FIFO, having your processes write their lines to the FIFO, and having the daemon write them to disk in 64k chunks.

      (Isn't that called a syslog under *nix?)



        Like Log::Dispatch::FileRotate, my module will/might be used by CGI scripts (multiple writers, potentially lots of concurrent writers, one-off processes, no daemon).
Re: Safe to open+close for every write, without locking?
by Neutron Jack (Pilgrim) on Dec 21, 2012 at 05:16 UTC

    Opening with >> corresponds to an fopen(3) mode of "a". My fopen(3) man page says:

    Opening a file in append mode (a as the first character of mode) causes all subsequent write operations to this stream to occur at end-of-file, as if preceded by the call:

        fseek(stream,0,SEEK_END);

    I suspect that fopen(..., "a") accomplishes this by using the O_APPEND flag of open(2) on modern Linux systems. On such systems, the implied seek & write would be a single atomic operation (ignoring NFS).
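This is easy to check from Perl. Below is a small sketch (mine, under the assumptions stated above: Linux, local non-NFS filesystem) in which two forked children append without any locking through O_APPEND descriptors; if the kernel's seek+write really is one atomic step, no bytes are lost and every line stays intact:

```perl
#!/usr/bin/perl
# Sketch: two children append 1000 short lines each via O_APPEND,
# with no locking.  On a local Linux filesystem the kernel performs
# the seek-to-end and the write atomically, so no line is clobbered.
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_APPEND);

my $path = "/tmp/o_append_demo.log";   # hypothetical file name
unlink $path;

for my $id (1, 2) {
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    next if $pid;                      # parent keeps spawning
    # child: append without locking
    sysopen my $fh, $path, O_WRONLY | O_CREAT | O_APPEND, 0644
        or die "sysopen: $!";
    for (1 .. 1000) {
        my $rec = "$id: $_\n";
        syswrite($fh, $rec) == length($rec) or die "short write: $!";
    }
    close $fh;
    exit 0;
}
wait() for 1 .. 2;                     # reap both children

open my $in, "<", $path or die "open: $!";
my $lines = () = <$in>;
print "lines: $lines\n";               # 2000 if nothing was lost
```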

      Interesting. However, without closing+reopening after each write (i.e. each process opens in "a" mode + writes 100k times + closes) there is clobbering.

Re: Safe to open+close for every write, without locking?
by graff (Chancellor) on Dec 21, 2012 at 06:20 UTC
    If I understand the OP code correctly, it seems like the lines being written to the one output file are all quite short - just two numerics that yield a maximum of 10 characters per line (counting the final line-feed - 11 characters if you're on a CRLF system).

    The problem I would want to check for is whether multiple competing processes, writing to the same output file, might interrupt each other if one or more of them were trying to write relatively long lines. I've seen this happen, and it makes the resulting file incomprehensible and unparsable.

    I'd be inclined to go with something like BrowserUK's suggestion, but if you want to pursue the OP strategy, you should test again by writing, say, 130 characters or more per line; follow the same tactic of starting each line with a token that is different for each process, and see whether you get the expected number of lines starting with those tokens, as opposed to things like this:

    PROC.1 This line is being written by process #1.
    PROC.2 This line is being writtPROC.1 This line is being written by process #1.
    en by process #2.
    PROC.3 This line is beinPROC.4 This line is being written by process #4.
    g written by pPROC.1 This line is being written by process #1.
    rocess #3.
    (... and so on) I'm using fewer than 130 characters per line there, but I hope you get what I'm talking about.

      I don't see how twelve 10-11-byte writes would clobber any less than a single 130-byte write, but I did the test anyway.

      A new script, write2.pl, to include several mechanisms:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use autodie;
      use Bench;
      use Fcntl qw(:flock);

      unless (@ARGV == 3) { die "Usage: $0 <method> <path> <str>\n"; }
      my ($method, $file, $str) = @ARGV;

      bench sub {
          if ($method eq 'print') {
              open my($fh), ">>", $file;
              for (1..100_000) { print $fh "$str: $_\n"; }
              close $fh;
          } elsif ($method eq 'open+print+close') {
              for (1..100_000) {
                  open my($fh), ">>", $file;
                  print $fh "$str: $_\n";
                  close $fh;
              }
          } elsif ($method eq 'seek+print') {
              open my($fh), ">>", $file;
              for (1..100_000) {
                  seek $fh, 0, 2;
                  print $fh "$str: $_\n";
              }
              close $fh;
          } elsif ($method eq 'flock+print') {
              open my($fh), ">>", $file;
              for (1..100_000) {
                  flock $fh, LOCK_EX;
                  print $fh "$str: $_\n";
                  flock $fh, LOCK_UN;
              }
              close $fh;
          }
      };

      Here's what the lines that get written should look like for process number 1:

      001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001: 1
      001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001001: 2
      ...

      For process number 2:

      002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002: 1
      002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002002: 2
      ...

      $ for i in `seq -f "%03g" 1 100`;do ( ./write2.pl print log $i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$$i$i$i$i & );done

      Clobbering. Each process finishes in about 7-9 secs.

      $ for i in `seq -f "%03g" 1 100`;do ( ./write2.pl open+print+close log $i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$$i$i$i$i & );done

      No clobbering. Each process finishes in about 75 secs.

      $ for i in `seq -f "%03g" 1 100`;do ( ./write2.pl seek+print log $i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$$i$i$i$i & );done

      No clobbering. Each process finishes in about 30 secs.

      $ for i in `seq -f "%03g" 1 100`;do ( ./write2.pl flock+print log $i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$$i$i$i$i & );done

      No clobbering. Each process finishes in about 450 secs.

      I'm guessing that flock+print is the safest and most portable method, but its performance gets worse as the number of concurrent writers grows. For now I'm leaning towards the safest method, and will look into other possibilities later. What I would like to know is how safe the other methods (open+print+close and seek+print) are against clobbering.

Re: Safe to open+close for every write, without locking?
by flexvault (Parson) on Dec 21, 2012 at 10:20 UTC

    sedusedan,

    Take a look at "File Locking Tricks and Traps" at http://perl.plover.com/yak/flock/ for interesting things you can do with 'flock'.

    But in your case, I'd just 'use Sys::Syslog' !

    Good Luck!

    "Well done is better than well said." - Benjamin Franklin

      Thanks, informative slides.

      Regarding slide #4 (Trap #1: LOCK_UN): surely that is no longer the case, since "perldoc -f flock" says: "To avoid the possibility of miscoordination, Perl now flushes FILEHANDLE before locking or unlocking it."?

      Also, thanks for the suggestion but I really do not want syslog for this.

        sedusedan,

        I didn't previously think about the consequences of slide #4, but it is probably why I don't 'flock' the actual file. Instead I use a dummy file called ".../LOCK", which I 'flock' to lock/unlock. This file is always of length 0, which allows the actual file I'm writing/reading to remain buffered. You have to use it the same way everywhere ('LOCK_EX' for write, 'LOCK_SH' for read).

        As long as you're preventing the race condition by always 'flock'ing the dummy file, you will not get into trouble. Early on I found problems with 'flock'ing the actual file, which may be why 'flock' was fixed to eliminate the problem described in slide #4. But buffering is good!
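A sketch of that scheme (the file names and helper are mine, not flexvault's actual code): every writer takes LOCK_EX on an empty companion file, so the handle for the real log can stay buffered and the lock file itself never grows:

```perl
#!/usr/bin/perl
# Sketch of the dummy-lock-file scheme: flock an always-empty
# companion file rather than the log itself, so the log handle's
# buffering is unaffected by the lock/unlock cycle.
use strict;
use warnings;
use Fcntl qw(:flock);

my $log_path  = "/tmp/flock-demo.log";    # hypothetical names
my $lock_path = "/tmp/flock-demo.LOCK";   # stays 0 bytes forever

sub locked_append {
    my ($line) = @_;
    open my $lock, ">>", $lock_path or die "open lock: $!";
    flock $lock, LOCK_EX or die "flock: $!";  # readers: LOCK_SH
    open my $fh, ">>", $log_path or die "open log: $!";
    print $fh $line;
    close $fh;      # flush the buffered log handle while still locked
    close $lock;    # closing the handle releases LOCK_EX
}

unlink $log_path;
locked_append("$$: hello\n");
```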

        I looked at your benchmark and noticed that you don't delete the file each time your script is run, before the benchmark starts. Even though you're 'open'ing the file for append, an ever-larger file means more disk-head activity with each additional run, which may distort the results. I adjusted your script and compared against 'syslog', and found 'syslog' to be about twice as slow; I'm guessing that's the socket activity between the processes.

        Anyway, looks like you have a plan.

        Good Luck...Ed

        "Well done is better than well said." - Benjamin Franklin

Re: Safe to open+close for every write, without locking?
by dave_the_m (Parson) on Dec 21, 2012 at 15:32 UTC
    Because you've opened the file for appending, you won't see any data clobbering (i.e. data lost or overwritten), but you will see data overlapping and lines split: each process buffers its output into (likely 8K) blocks and appends each block to the file, and blocks from different processes may be interleaved. So if a line from one process straddles two blocks, the first half of the line is written, then possibly 8K of lines and half-lines from another process, followed by the remainder of that line.

    Dave.

      Can you provide a script to demonstrate that, perhaps?
        Here ya go:

        #!/usr/bin/perl
        open my $fh, '>>', '/tmp/out' or die "open: $!\n";

        sub output {
            my ($id) = @_;
            my $line = sprintf "[%02d: my id is: %02d]\n", $id, $id; # 19 chars
            print $fh $line x 500;  # bigger than 8K, and $line straddles boundary
            warn "$id: sleeping...\n";
            sleep 1;
            warn "$id: exiting\n";
            exit;
        }

        for (1..3) {
            sleep 1;
            warn "$_: spawning...\n";
            my $pid = fork();
            die "fork failed: $!\n" unless defined $pid;
            next if $pid;
            # child
            output($_);
        }
        sleep 5;
        __END__
        And here's the output:
        $ rm -f /tmp/out
        $ perl5160 /tmp/p1
        1: spawning...
        1: sleeping...
        2: spawning...
        1: exiting
        2: sleeping...
        3: spawning...
        3: sleeping...
        2: exiting
        3: exiting
        $ grep -v '^\[..: my id is: ..]$' /tmp/out
        [02[03: my id is: 03]
        : my id is: 03]
        $

        Dave.

Node Type: perlquestion [id://1009847]
Approved by davido
Front-paged by Lotus1