A flock()alypse now

by ferrency (Deacon)
on Jul 02, 2002 at 13:49 UTC (#178847, perlquestion)
ferrency has asked for the wisdom of the Perl Monks concerning the following question:

Turnstep presents a bunch of great information in his File Locking tutorial. However, there is at least one file locking technique which I have not seen anyone use on perlmonks. I'm not sure if that's because there's something wrong with it, or because I haven't been looking hard enough.

First, some background. As turnstep points out, this is Bad (tm):

    open my $FH, ">the_file";     # zero-byte the file
    flock($FH, LOCK_EX);          # and Then lock it...

And, since you need to give flock() a file handle, it's not completely obvious how to read, process, and then write a file while maintaining a lock on it. This is an example of how Not to do it:

    open my $FH, "the_file";      # open for reading
    flock($FH, LOCK_SH);          # lock it
    my @lines = <$FH>;            # read it
    open $FH, ">the_file";        # open for writing: LOCK DROPPED.
    # Now someone else picks up the lock and finds an empty
    # file. When they're finished,
    flock($FH, LOCK_EX);          # lock it again
    push @lines, "new line\n";    # process the lines
    print $FH @lines;             # write them out
    close $FH;                    # close and drop lock

Turnstep's tutorial presented the standard technique to fix this: use a separate "semaphore file" to keep track of locking:

    use Fcntl qw(:flock);               # exports LOCK_EX

    # Open a different file as a semaphore lock
    open my $SEM, ">the_file.semaphore";
    flock($SEM, LOCK_EX);               # lock it

    # then process the Real File
    open my $FH, "the_file";
    my @lines = <$FH>;                  # read it
    # because we never fiddle with $SEM, we still have a lock
    open $FH, ">the_file";              # open for writing.
    push @lines, "new line\n";          # process the lines
    print $FH @lines;                   # write them out
    close $FH;                          # close the file
    close $SEM;                         # close and unlock semaphore

The main problem with this is, it "doesn't play well with others." If you are trying to cooperate with other programs which you have no control over, you may not have the luxury of being able to use a semaphore file with a different name.

To me, the solution to this seems obvious. But I haven't seen this on perlmonks before, so my paranoia makes me think there's something wrong with it. I use the file itself as its own "semaphore file." This should work fine, since flock() locks are only advisory, not forced locks (that is, you can choose to ignore them and the OS won't even slap you on the wrist for it).

    use Fcntl qw(:flock);               # exports LOCK_EX

    # Open the file for locking purposes only
    open my $SEM, "the_file";
    flock($SEM, LOCK_EX);               # lock it

    # Process it
    open my $FH, "the_file";
    my @lines = <$FH>;
    open $FH, ">the_file";
    push @lines, "new line\n";
    print $FH @lines;
    close $FH;
    close $SEM;                         # close and unlock semaphore

Now we don't have any race conditions, because we're not doing anything with the_file before we get the lock, and we're not dropping the lock before we're completely finished with processing the file. We also behave well in relation to other programs which may be acquiring locks on the same file. We aren't creating stray lock files lying around which may need to be cleaned up periodically. It seems better in almost every respect... but does it work?
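
The technique described above can be sketched with error checking added. This is only an illustration, not the code from the post: the helper name append_line_locked is hypothetical, and the lock-only handle is opened with '+<' rather than read-only, on the assumption that the platform may use Perl's fcntl-based flock emulation, which requires write intent for LOCK_EX. The data handles are also kept open until the write is finished, since some systems are reported to drop the lock when any handle on the file is closed.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper (not from the thread): append $new_line to $path while
# holding an exclusive lock on a separate, lock-only handle to the same file.
sub append_line_locked {
    my ($path, $new_line) = @_;

    # Lock-only handle. '+<' neither creates nor truncates the file. It is
    # opened writable because Perl's fcntl-based flock emulation requires
    # write intent for LOCK_EX.
    open my $sem, '+<', $path or die "Cannot open $path for locking: $!";
    flock($sem, LOCK_EX)      or die "Cannot lock $path: $!";

    # Read through a second handle; the lock on $sem is untouched.
    open my $in, '<', $path or die "Cannot read $path: $!";
    my @lines = <$in>;

    push @lines, $new_line;

    # The truncating open is safe here because the lock is already held.
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} @lines      or die "Cannot write $path: $!";

    # Flush the data first, and release the lock last.
    close $out or die "Cannot close $path: $!";
    close $in;
    close $sem;
    return scalar @lines;    # number of lines now in the file
}
```

A call such as append_line_locked("the_file", "new line\n") would stand in for the snippet in the post. Whether the lock really survives the extra opens still depends on the platform's flock implementation.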

Thank you for your comments on this.

Alan

Re: A flock()alypse now
by Notromda (Pilgrim) on Jul 02, 2002 at 16:24 UTC
    Someone correct me if I'm wrong, but all you need to do is check the return code on the flock() call. If you got the lock, do your stuff, otherwise, do something else.
        # Open the file
        open my $SEM, "the_file";
        if (flock($SEM, LOCK_EX)) {     # lock it
            # do stuff
        }
        close $SEM;

      You're right: I didn't do any error checking on the flock() calls. I should have. But (based on the assumptions I made in my code snippets: files exist, no interrupts, etc) it should never return until it gets a lock unless you're using LOCK_NB anyway, since it's a blocking call.
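
The difference between a blocking flock() and one using LOCK_NB, and the point about checking the return value, can be sketched as below. The helper name lock_or_give_up and the one-second retry interval are assumptions made for illustration only.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: try to take an exclusive lock without blocking,
# retrying up to $tries times. Returns 1 on success, 0 if the file stayed
# locked by someone else, and dies on any other flock() failure.
sub lock_or_give_up {
    my ($fh, $tries) = @_;

    for my $attempt (1 .. $tries) {
        return 1 if flock($fh, LOCK_EX | LOCK_NB);

        # "Would block" just means somebody else holds the lock.
        # Anything else is a real error and should not be ignored.
        die "flock failed: $!" unless $!{EWOULDBLOCK} || $!{EAGAIN};

        sleep 1 if $attempt < $tries;
    }
    return 0;
}
```

Without LOCK_NB the call waits for the lock, but it can still return false for other reasons, so the return value is worth checking in both cases.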

      But that's not really the primary point of my post. I was wondering if anyone knew of any problems with opening the_file multiple times, once for locking and once for reading/writing/etc. Update: No one seems to have found anything wrong with it, so for now I'm assuming it's a valid technique (even if my sample code doesn't implement it completely correctly :) Please tell me if you can see anything wrong with the basic technique I presented in my original node, independent of the actual code used to implement it.

      Thanks,

      Alan

      Update, re AN's comments: It's sample code. It was meant to demonstrate a point, not to be complete, or even to work (since I didn't know if it did). I think my point and question were demonstrated more clearly and concisely by making these assumptions instead of by increasing the sample code beyond the size of easy readability.

      Thank you for your clarifications and comments on my methods. Clearly, knowing the environment you're working in plays an important part in determining what assumptions are safe to make and which aren't, in your specific case. But without some assumptions, you can't even count on things as basic as the behavior of the particular OS you assume you're running under.

        But (based on the assumptions I made in my code snippets: files exist, no interrupts, etc) it should never return until it gets a lock unless you're using LOCK_NB anyway, since it's a blocking call.

        Extremely bad assumption. The call can fail for a million and one reasons, some of them rather non-obvious. See your OS for details.

        For instance do you pay attention to which directories are NFS mounted? Even if you do, does your sysadmin know which directories cannot be NFS mounted without causing nasty (but intermittent) race conditions?

        No one seems to have found anything wrong with it, so for now I'm assuming it's a valid technique (even if my sample code doesn't implement it completely correctly :)

        Programming by experiment? That is very risky. Try to find documentation. FYI I have seen cases where you can lose the lock on a file when you close any filehandle on it even if you opened the lock on the other filehandle. Sometimes your code will behave correctly, sometimes not. I don't know details offhand, but if you are relying on the contrary behaviour, be damned sure that you have a test suite so you know that your version of Perl/OS behaves as you expect.

        You know, locking is the one thing you don't want to get wrong. Because if you do, the signs aren't obvious until a long time later, and they are intermittent. Intermittent problems are the hardest to debug, particularly if you start with a bunch of wrong assumptions about why you are protected.
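
A check of the kind suggested above, to find out whether a lock survives closing another handle on the same file, might look like the following sketch. The function name lock_survives_second_close is hypothetical, and the probe assumes fork() is available. It tests only one Perl/OS/filesystem combination at a time, so it would need to be run on each system (and each kind of mount) that matters.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use POSIX ();

# Returns 1 if an exclusive lock held on one handle is still in force after
# a second handle on the same file has been opened and closed, 0 if not.
sub lock_survives_second_close {
    my ($path) = @_;

    open my $sem, '+<', $path or die "Cannot open $path: $!";
    flock($sem, LOCK_EX)      or die "Cannot lock $path: $!";

    # Open and close a second handle on the same file.
    open my $fh, '<', $path or die "Cannot open $path: $!";
    close $fh;

    my $pid = fork();
    die "Cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: a fresh open of the file should be refused the lock for
        # as long as the parent's lock is still held.
        open my $probe, '+<', $path or POSIX::_exit(2);
        my $got_it = flock($probe, LOCK_EX | LOCK_NB);
        POSIX::_exit($got_it ? 1 : 0);   # skip END blocks and destructors
    }

    waitpid($pid, 0);
    my $status = $? >> 8;
    close $sem;

    die "Probe could not open $path" if $status == 2;
    return $status == 0 ? 1 : 0;
}
```

If this returns 0 on a given system, the two-handle technique from the original node should not be relied on there.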

Re: A flock()alypse now
by BazB (Priest) on Jul 02, 2002 at 18:17 UTC

    Semaphore files have their uses:

    • What if the file you want to lock doesn't exist yet?
    • What if you want to lock all the files in a directory?
    • What if you want to lock another resource that isn't even a file, or doesn't lend itself to being flock()ed?
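
For cases like these, a semaphore file can guard any block of code, whether or not the protected resource is a single existing file. A minimal sketch, assuming a hypothetical helper named with_semaphore and a semaphore path agreed on by every cooperating program:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: run $code while holding an exclusive lock on a
# separate semaphore file, then release the lock even if $code dies.
sub with_semaphore {
    my ($sem_path, $code) = @_;

    # '>>' creates the semaphore file if needed and never truncates it.
    open my $sem, '>>', $sem_path or die "Cannot open $sem_path: $!";
    flock($sem, LOCK_EX)          or die "Cannot lock $sem_path: $!";

    my $result;
    my $ok  = eval { $result = $code->(); 1 };
    my $err = $@;

    close $sem;               # closing the handle releases the lock
    die $err unless $ok;      # re-throw whatever $code died with
    return $result;
}
```

The protected work is passed as a code reference, for example with_semaphore("the_dir/.semaphore", sub { ... }), where the path is only an example.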

    Also, if you are going to just lock the file you want to modify instead of using a semaphore file, why are you opening the file, flock()ing the filehandle, then opening the file again?
    Open the file with the permissions you'll need to perform the operations you've got planned, flock() it, do the operation, close the file.

    Advisory locking never "plays well with others" if one of those "others" ignores a lock, be it on a semaphore file, or the file itself. Generally, advisory locks are good enough.
    I'd be surprised if there isn't a way of controlling other processes, and making sure that they go through filehandling/locking properly.

    BazB

    Update: Correction. Removed discussion on bad example (since author _knows_ it's bad :-)

      You are correct, and I won't argue that semaphores have no use. But, my point was to see if my technique actually works, for use when it's the appropriate solution.

      BazB wrote: Also, if you are going to just lock the file you want to modify instead of using a semaphore file, why are you opening the file, flock()ing the filehandle, closing the file (hence dropping the lock), then re-opening the file?

      In the sample code I provided which I didn't specifically label as broken, I don't close the $SEM filehandle after getting a lock on it, until after I open the same file in a different filehandle, process it, and close it. That would be Wrong :)

      The main use of a semaphore or second filehandle to lock a file (as opposed to the other uses you mentioned) is because you can't always open, flock, do the operations, and close the file in a straightforward way. For example, if you are doing a truncating open-for-write, you'll truncate the file before you have a lock on it. If you want to both read and write the file while it's locked, you have to either use a tricky read/write open and some seeks, or you have to open it, read it, close it, open it, then write it.

      Advisory locking never "plays well with others" if one of those "others" ignores a lock, be it on a semaphore file, or the file itself.

      That's definitely true. But those aren't the cases I was talking about: I meant the cases where existing code uses advisory locking in a known but unchangeable way, and you need to cooperate with it.

      Thank you for your insight,

      Alan

Re: A flock()alypse now
by belg4mit (Prior) on Jul 02, 2002 at 18:27 UTC
    Color me kooky, but wouldn't it be simplest to open the file in read-write mode?
    open(my $FH , '+<', 'the_file')

    --
    perl -pew "s/\b;([mnst])/'$1/g"

      Yeah, that would work too :)

      It would be "simplest" if you are familiar with dealing with read/write filehandles. Unfortunately I'm not that familiar with those techniques right now. So it's not "simplest" for me until I learn a bit more.

      Thanks for the alternate solution, and a pointer towards more things to learn.

      Alan

        This approach causes data loss if:

        • the process is terminated for any reason (user intervention, exceeding resource limits, system crash)
        • the disk fills up during execution

        If you don't want to risk losing data, use the sentinel locking approach already mentioned by others in this thread.

        However, if the weaknesses of using only a read/write filehandle aren't a problem for you, you might want to look into FileHandle::Rollback.
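
One common way to limit the data-loss risk described above is to hold a semaphore lock, write the new contents to a temporary file in the same directory, and rename it over the original only once the write has succeeded. Nobody in this thread proposes exactly this, so treat it as a sketch under assumptions: the helper name rewrite_via_rename and the ".semaphore" suffix are hypothetical, and the file is assumed to exist already.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Basename qw(dirname);
use File::Temp ();

# Hypothetical helper: replace the contents of $path with whatever
# $transform returns for the current lines, without ever truncating $path.
sub rewrite_via_rename {
    my ($path, $transform) = @_;

    # Lock a separate semaphore file, not $path itself, because rename
    # replaces the file that a lock on $path would be attached to.
    open my $sem, '>>', "$path.semaphore" or die "Cannot open semaphore: $!";
    flock($sem, LOCK_EX)                  or die "Cannot lock semaphore: $!";

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @lines = <$in>;
    close $in;

    my @new_lines = $transform->(@lines);

    # Same directory, so the rename stays on one filesystem.
    my $tmp = File::Temp->new(DIR => dirname($path), UNLINK => 0);
    print {$tmp} @new_lines        or die "Cannot write temporary file: $!";
    close $tmp                     or die "Cannot close temporary file: $!";
    rename($tmp->filename, $path)  or die "Cannot rename over $path: $!";

    close $sem;                    # release the lock
    return scalar @new_lines;
}
```

If the process dies or the disk fills up before the rename, the original file is left as it was, although a stray temporary file may remain. The replacement file also gets the temporary file's permissions rather than the original's, which may or may not be acceptable.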

Re: A flock()alypse now
by perigeeV (Hermit) on Jul 02, 2002 at 20:43 UTC

    This should work fine for a single file. Semaphores are required for multiple file locking of course, but that was covered above. By opening the file twice with different filehandles you're just getting a different file descriptor to work with. flock locks files, not file descriptors. Even the dup and fork system calls just create multiple references to the same lock.


Re: A flock()alypse now
by Aristotle (Chancellor) on Jul 03, 2002 at 14:38 UTC

    Disclaimer: I have yet to have had to do any real work with locking; what I know comes from a lot of theory I've read on the subject. Ok, now that we've dealt with that,

    I would think the following is perfectly valid, and perfect period:

        use Fcntl qw(:DEFAULT :flock :seek);

        sysopen FH, "file.name", O_RDWR | O_CREAT
            or die "horribly - $!";         # no + O_TRUNC!!
        flock FH, LOCK_EX
            or die "screaming - $!";
        my @slurp = <FH>;
        do_something_with(\@slurp);
        seek FH, 0, SEEK_SET;
        print FH @slurp;
        truncate FH, tell FH;
        close FH;

    As far as I can tell, it deals with everything. It requires no semaphore files so I needn't worry about potential write permission woes nor cleaning up after myself when I exit, and it operates on exactly one filehandle, which is the one holding the lock. Someone tell me if there's something missing in my picture; if not, then I believe (barring other requirements) this is The Way To Do It.

    Makeshifts last the longest.

      The only concern I'd have with this technique is the problem tomhukins described earlier. I'm not sure if what he describes is likely, and whether it's more likely with a read/write filehandle than otherwise, but it's something to consider.

      Using a semaphore-like technique (whether it locks the file being modified or another file) also lends itself to situations where your data processing is abstracted away from your data input and output.

      Thanks for the example!

      Alan

Re: A flock()alypse now
by valdez (Monsignor) on Jul 03, 2002 at 14:43 UTC

    I think that your solution may work, but there is a simpler solution; as usual, the solution is in your manuals.

    Take a look at man perlopentut, in the "File Locking" section.

    If you are not doing this for study, there are already solutions on CPAN; one of these is IO::LockedFile.
