PerlMonks
[SOLVED] Strange IO + concurrency issue

by vsespb (Hermit)
on Sep 28, 2013 at 15:26 UTC ( #1056143=perlquestion )
vsespb has asked for the wisdom of the Perl Monks concerning the following question:

use strict;
use warnings;
use Carp;
use File::Copy;
use Fcntl qw/LOCK_SH LOCK_EX LOCK_NB LOCK_UN/;

sub x_copy # NOT USED
{
    my ($src, $dst) = @_;
    open (my $f, "<", $src) or confess;
    binmode $f;
    read $f, my $buf, -s $f or confess;
    close $f;
    open (my $o, ">", $dst) or confess;
    binmode $o;
    print $o $buf;
    close $o;
}

sub getlock
{
    open my $f, ">", "lock.tmp" or confess;
    flock $f, LOCK_EX or confess;
    shift->();
    flock $f, LOCK_UN or confess;
    close $f;
}

for (1..5) {
    unless (fork()) {
        for (1..5) {
            my $filename = "somefile$_.tmp";
            my $f = undef;
            getlock sub {
                unless (-e $filename) {
                    open ($f, ">", $filename) or confess;
                    binmode $f;
                }
            };
            if ($f) {
                print ($f "x") for (1..40_000_000);
                close $f;
            }
            my $newfilename = "c_${$}_$filename";
            unless (-e $newfilename) {
                copy($filename, $newfilename) or confess;
                die if -s $filename != -s $newfilename;
            }
        }
        exit;
    }
}
while (wait != -1) { };
outputs the following (tested on Linux, perl 5.10 and 5.18):
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24171_somefile1.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24171_somefile2.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24171_somefile3.tmp
-rw-r--r-- 1 vse vse 32288768 2013-09-28 19:10 c_24171_somefile4.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24171_somefile5.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24172_somefile1.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24172_somefile2.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24172_somefile3.tmp
-rw-r--r-- 1 vse vse 28540928 2013-09-28 19:10 c_24172_somefile4.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24172_somefile5.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24173_somefile1.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24173_somefile2.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24173_somefile3.tmp
-rw-r--r-- 1 vse vse 21893120 2013-09-28 19:10 c_24173_somefile4.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24173_somefile5.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24174_somefile1.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24174_somefile2.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24174_somefile3.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24174_somefile4.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24174_somefile5.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24175_somefile1.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24175_somefile2.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24175_somefile3.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 c_24175_somefile4.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 c_24175_somefile5.tmp
-rw-r--r-- 1 vse vse        0 2013-09-28 19:10 lock.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 somefile1.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 somefile2.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 somefile3.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 somefile4.tmp
-rw-r--r-- 1 vse vse 40000000 2013-09-28 19:10 somefile5.tmp
So:
1. The file size of some c_* files is less than 40,000,000, which should not happen because of this line:
die if -s $filename != -s $newfilename;

2. If I replace the copy() call with x_copy(), the following line reports an error (or EOF):
read $f, my $buf, -s $f or confess;


3. There is locking via getlock() where it should be.
4. All files are closed, so buffering should not be an issue?

5. This code works with one child process but fails with several concurrent processes.

Question: I assume that once a file is closed, it has been written to the filesystem. Am I right? If so, then I have to suspect a bug in perl; please help me find where I am wrong.
Initially I hit this issue in a bigger program, but I have simplified it down to this proof-of-concept code, which reproduces the issue.

UPD: Solved: Re^2: Strange IO + concurrency issue

Re: Strange IO + concurrency issue
by davido (Archbishop) on Sep 28, 2013 at 16:39 UTC

    Look more closely at how you're obtaining a lock:

    1. You are opening a lock to a semaphore file (lock.tmp).
    2. Once that lock is obtained you are opening an output file (somefile$_.tmp).
    3. You unlock/close your semaphore file.
    4. You write to somefile$_.tmp
    5. You close somefile$_.tmp.
    6. You copy from your tmp file to a new file.

    Steps 4, 5, and 6 are unprotected.

    You should not release the semaphore file lock until after you're done writing to somefile$_.tmp, and in fact probably not until after the copy operation either. You also probably shouldn't use LOCK_UN as a matter of habit: simply closing a filehandle releases the lock, and explicitly unlocking before closing can create a race condition (though not in this usage).
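    A minimal sketch of the suggested fix (an illustration only, reusing names from the original program, with a much smaller byte count for brevity): hold the lock across the create, the write, and the copy, and let close() release it:

```perl
use strict;
use warnings;
use Carp;
use File::Copy;
use Fcntl qw/LOCK_EX/;

# Hold the lock for the whole critical section; closing the
# handle releases the lock, so no explicit LOCK_UN is needed.
sub getlock {
    my $cb = shift;
    open my $lock, ">", "lock.tmp" or confess;
    flock $lock, LOCK_EX or confess;
    $cb->();
    close $lock;    # releases the lock
}

my $filename    = "somefile1.tmp";
my $newfilename = "c_${$}_$filename";

getlock sub {
    unless (-e $filename) {
        open my $f, ">", $filename or confess;
        binmode $f;
        print $f "x" for 1 .. 1000;    # 1000 bytes for illustration
        close $f;
    }
    unless (-e $newfilename) {
        copy($filename, $newfilename) or confess;
        die "size mismatch" if -s $filename != -s $newfilename;
    }
};
```

    With the copy inside the locked region, no other process can be mid-write on the source file when the copy and the size check run.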

    You may already know about this set of slides from Mark Jason Dominus, but just in case: File Locking Tricks and Traps.


    Dave

      1. You are opening a lock to a semaphore file (lock.tmp).
      2. Once that lock is obtained you are opening an output file (somefile$_.tmp).
      3. You unlock/close your semaphore file.
      4. You write to somefile$_.tmp
      5. You close somefile$_.tmp.
      6. You copy from your tmp file to a new file.
      Steps 4, 5, and 6 are unprotected.
      I knew that if the lock was extended to cover the other steps it worked, but I could not understand why.

      The thing is: when I open the file in step (2), I write to it in step (4). But steps (4) and (5) are actually protected: they are performed only if the current process is the one that created the file; otherwise they are skipped. (This was intentional; I was trying to minimize the time the lock is held, which was important.)

      Notice the "if ($f)":
      if ($f) {
          print ($f "x") for (1..40_000_000);
          close $f;
      }
      The problem was that step (6) was unprotected, and this assertion was wrong:
      die if -s $filename != -s $newfilename;
      The correct assertion would be:
      die if 40_000_000 != -s $newfilename;
      That is, at the moment the data is copied, the source file can still be in the middle of being written by another process. So in the end I see:
      the correct (final) size of $filename
      the wrong size of $newfilename
      and the assertion passing, because at copy time -s $filename == -s $newfilename (both partial)

      Thank you! I'll mark the post as SOLVED.
        Notice "if ($f)"

        I suspect this test does not do what you think it does. In particular, consider the case where one process opens the file and writes to it, then another process opens the same file and perhaps writes to it, and then the if ($f) test executes in the first process. Is the result any different because of what the other process did? I think you will find it is not, and therefore that the statements in the if block are not as protected as you think they are.

        Whether your problem is solved depends on what you are trying to do, which you don't say, but what you are calling a solution seems strange to me. I suggest you reconsider.

        Running your program on Windows several times, each trial I get a variable subset of the 25 potential output files with names beginning with 'c' and several of them have less than 40_000_000 characters in them.

        1. file size of some c_* files is less than 40000000, however this should not happen, because of line: die if -s $filename != -s $newfilename;

        The die will not prevent the production of files with less than 40_000_000 characters because the test is performed after the copy, not before. Even if you moved the test before the copy, because these statements are not protected by the lock file, the test could pass but another process could modify the source file before the copy executes, resulting in a file of different length (and, perhaps, content if in your real program the different processes write different content).

        Changing the test on the die to 40_000_000 != -s $newfilename makes little difference to the outcome: there is still a variable subset of the possible 25 copied files and there are still several of them with fewer than 40_000_000 characters. Of course, it makes a difference as, when the test runs, the size of $filename might be other than 40_000_000. This will be somewhat random, depending on how execution of the various processes is interleaved. But, perhaps this is exactly as you intend and I am worrying for nothing.

Re: Strange IO + concurrency issue
by RichardK (Priest) on Sep 28, 2013 at 16:44 UTC

    I'm not entirely sure what you are trying to achieve, but the issue seems to be that you are only locking the file open, not the file writes.

    so:

    fork 1                             fork 2
    ------                             ------
    takes the lock
    opens file1
    releases the lock
    starts writing data
    gets pre-empted
                                       takes the lock
                                       opens file1 (& deletes contents)
                                       releases the lock
                                       starts writing data
    Etc....

    So potentially many children are all writing to the same file at the same time. If you have each fork write a different character to the file, you'll see what's happening.

    It's usually better to have each child write to its own temp file (see File::Temp), then rename it into place.
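    A sketch of that approach (my illustration; file names and the byte count are made up): write to a unique temp file created by File::Temp in the target's own directory, then rename() it into place, which atomically replaces the target on POSIX filesystems:

```perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

my $target = "somefile1.tmp";

# Each child writes to its own unique temp file in the target's
# directory (rename is only atomic within a single filesystem).
my ($fh, $tmpname) = tempfile("somefileXXXXXX", DIR => ".", UNLINK => 0);
binmode $fh;
print $fh "x" for 1 .. 1000;    # 1000 bytes for illustration
close $fh or die "close failed: $!";

# Atomically move the finished file into place; readers never
# observe a partially written $target.
rename $tmpname, $target or die "rename failed: $!";
```

    Because the file only appears under its final name once it is complete, any process that can see $target sees all of its contents, and no locking is needed around the writes.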

      So potentially many children all writing to the same file at the same time.
      See above (Re^2: Strange IO + concurrency issue): the file being written to is actually protected (although this is extremely unclear in the code). What was not protected is the file copy (whose destination file is unique).
      It's usually better to get each child to write to its own temp file
      Same here: in this case there are no concurrent writes to the same file (even the copy writes to a unique file).

Node Type: perlquestion [id://1056143]
Approved by Perlbotics
Front-paged by Corion