Re^2: Strange IO + concurrency issue

by vsespb (Hermit)
on Sep 28, 2013 at 17:11 UTC


in reply to Re: Strange IO + concurrency issue
in thread [SOLVED] Strange IO + concurrency issue

1. You are opening a lock to a semaphore file (lock.tmp).
2. Once that lock is obtained you are opening an output file (somefile$_.tmp).
3. You unlock/close your semaphore file.
4. You write to somefile$_.tmp
5. You close somefile$_.tmp.
6. You copy from your tmp file to a new file.
Steps 4, 5, and 6 are unprotected.
I knew that it works if the lock is extended to cover the other steps, but I could not understand why.

The thing is: I open the file in step (2) and write to it in step (4), but steps (4) and (5) are actually protected: they are performed only if this process was the one that created the file. Otherwise they are skipped. (This was intentional: I was trying to minimize the time the lock is held, which was important.)

Notice "if ($f)"
if ($f) { print ($f "x") for (1..40_000_000); close $f; }
The problem was that step (6) is unprotected, and this assertion was wrong:
die if -s $filename != -s $newfilename;
Correct assertion would be:
die if 40_000_000 != -s $newfilename;
i.e. when the data is copied, the file could still be in the middle of being written by another process. So in the end I see:
correct size of $filename
wrong size of $newfilename
assertion passed: die if -s $filename != -s $newfilename;
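The following is a minimal single-process sketch of that interleaving, not the original program. The file names, the 1000-byte stand-in for 40_000_000, and the 400/600 split are assumptions made only for illustration. It copies a file that is only partly written and then evaluates both forms of the check.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $src  = "$dir/somefile.tmp";     # hypothetical source name
my $dst  = "$dir/c_somefile.tmp";   # hypothetical copy name
my $full = 1000;                    # stand-in for 40_000_000

# "Writer": has written only part of the data so far.
open(my $w, '>', $src) or die "open: $!";
binmode $w;
$w->autoflush(1);                   # make the partial data visible on disk
print {$w} 'x' x 400;

# "Copier": copies while the writer is still in the middle of the file.
copy($src, $dst) or die "copy: $!";
my $src_size_at_copy = -s $src;
my $dst_size         = -s $dst;
my $relative_ok = ($src_size_at_copy == $dst_size) ? 1 : 0;  # old assertion
my $absolute_ok = ($dst_size == $full)             ? 1 : 0;  # corrected assertion

# "Writer": finishes only after the copy was taken.
print {$w} 'x' x 600;
close $w or die "close: $!";
my $src_size_final = -s $src;

print "relative check passed: $relative_ok\n";
print "absolute check passed: $absolute_ok\n";
print "final sizes: src=$src_size_final dst=$dst_size\n";
```

The relative comparison passes because both sizes are read while the source is still short. Only the comparison against the expected size notices the truncated copy.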

Thank you! I'll mark the post as SOLVED.


Re^3: Strange IO + concurrency issue
by ig (Vicar) on Sep 30, 2013 at 05:50 UTC
    Notice "if ($f)"

    I suspect this test does not do what you think it does. In particular, consider the case that one process opens the file and writes to it, then another process opens the same file and, perhaps, writes to it, then the if ($f) test executes in the first process. Is the result any different because of what the other process did? I think you will find it is not and, therefore, that the statements in the if block are not as protected as you think they are.
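A small sketch of that point, using two handles in one process as a stand-in for two processes (the file name is hypothetical). It only shows that the truth of a filehandle variable reflects whether this process opened it, not what has happened to the file since.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/somefile.tmp";   # hypothetical name

# First opener: stands in for the process that created the file.
open(my $f, '>', $file) or die "open: $!";

# Second opener: stands in for another process touching the same file.
open(my $g, '>', $file) or die "open: $!";
print {$g} "written by someone else\n";
close $g or die "close: $!";

# The handle is still a true value; the test says nothing about the file itself.
my $still_true = $f ? 1 : 0;
print "if (\$f) is ", ($still_true ? "true" : "false"), "\n";
close $f or die "close: $!";
```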

    Whether your problem is solved depends on what you are trying to do, which you don't say, but what you are calling a solution seems strange to me. I suggest you reconsider.

    Running your program on Windows several times, in each trial I get a variable subset of the 25 potential output files with names beginning with 'c', and several of them have fewer than 40_000_000 characters in them.

    1. file size of some c_* files is less than 40000000, however this should not happen, because of line: die if -s $filename != -s $newfilename;

    The die will not prevent the production of files with less than 40_000_000 characters because the test is performed after the copy, not before. Even if you moved the test before the copy, because these statements are not protected by the lock file, the test could pass but another process could modify the source file before the copy executes, resulting in a file of different length (and, perhaps, content if in your real program the different processes write different content).

    Changing the test on the die to 40_000_000 != -s $newfilename makes little difference to the outcome: there is still a variable subset of the possible 25 copied files and there are still several of them with fewer than 40_000_000 characters. Of course, it makes a difference as, when the test runs, the size of $filename might be other than 40_000_000. This will be somewhat random, depending on how execution of the various processes is interleaved. But, perhaps this is exactly as you intend and I am worrying for nothing.

      I suspect this test does not do what you think it does. In particular, consider the case that one process opens the file and writes to it, then another process opens the same file and, perhaps, writes to it, then the if ($f) test executes in the first process. Is the result any different because of what the other process did? I think you will find it is not and, therefore, that the statements in the if block are not as protected as you think they are.
      Notice also "my $f = undef" and "unless (-e $filename)"
      my $f = undef;
      getlock sub {
          unless (-e $filename) {
              open($f, ">", $filename) or confess;
              binmode $f;
          }
      };
      if ($f) {
      ( I admit that this code is unclear )
      Whether your problem is solved depends on what you are trying to do, which you don't say, but what you are calling a solution seems strange to me.
      That was proof-of-concept code and it contained a bug. The bug was found, thus solved.
      what you are trying to do, which you don't say
      This was proof-of-concept code, i.e. I simplified my 1000-line program down to this code. Thus what I am trying to do is behind the scenes, and we should focus only on the technical part of the problem, not the business requirements.
      If I posted my original code, anyone trying to reproduce the problem would need at least a couple of hours to set things up.
      the test could pass but another process could modify the source file before the copy executes
      Yes, but by the way, there are no concurrent writes to the same file.
      Changing the test on the die to 40_000_000 != -s $newfilename makes little difference to the outcome: there is still a variable subset of the possible 25 copied files and there are still several of them with fewer than 40_000_000 characters.
      Well, it indeed does not fix the program so that it outputs correct files, but it causes some processes to die, thus eliminating the bug. See the assertions.

      The actual fix is to extend the lock over the whole program, until the copy is done (as other posters suggested), or even just to the point where the write to the source file is done and the file is closed.
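A minimal sketch of the second variant, assuming a flock-based helper. The helper name with_lock, the file names, and the 1000-byte size are hypothetical; the original getlock implementation is not shown in this thread. The lock is held through create, write and close, so any process that gets past the lock sees a complete file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive lock on $lockfile.
sub with_lock {
    my ($lockfile, $code) = @_;
    open(my $lock, '>>', $lockfile) or die "open $lockfile: $!";
    flock($lock, LOCK_EX) or die "flock: $!";
    my $result = $code->();
    close $lock or die "close: $!";   # closing the handle releases the lock
    return $result;
}

my $dir      = tempdir(CLEANUP => 1);
my $lockfile = "$dir/lock.tmp";
my $filename = "$dir/somefile.tmp";
my $newfile  = "$dir/c_somefile.tmp";
my $size     = 1000;                  # stand-in for 40_000_000

# Create, write and close the source file entirely under the lock.
with_lock($lockfile, sub {
    return if -e $filename;           # another process already produced it
    open(my $f, '>', $filename) or die "open: $!";
    binmode $f;
    print {$f} 'x' x $size;
    close $f or die "close: $!";
    return;
});

# Every process passes through the lock first, so the source is complete here.
copy($filename, $newfile) or die "copy: $!";
my $copied = -s $newfile;
die "short copy" if $copied != $size;
print "copied $copied bytes\n";
```

The trade-off is that the lock is now held for the whole write, which is the cost the shorter critical section was meant to avoid.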
