PerlMonks  

File Locking plus delete Lockfile question

by rovf (Priest)
on Feb 12, 2009 at 11:18 UTC (#743274)
rovf has asked for the wisdom of the Perl Monks concerning the following question:

I have several parallel processes which access a critical region. I implement this region using a lockfile, which also holds a counter that I need for my app (it contains "the number of clients" using a certain part of the system). Inside the critical region, one of two operations occurs: checkin (which increments the counter) or checkout (which decrements the counter). My implementation so far is straightforward. Here is a simplified version which shows the essential part:

    use Fcntl qw(:DEFAULT :flock);
    ...
    sysopen(LOCKFILE, $lockfile, O_RDWR|O_CREAT)   # create if necessary
        or die "Can not open/create $lockfile ($!)";
    flock(LOCKFILE, LOCK_EX)
        or die "Locking error on $lockfile ($!)";
    eval {
        my $number_of_clients = <LOCKFILE> || 0;
        chomp($number_of_clients);
        seek(LOCKFILE, 0, 0) or die "Rewind error on $lockfile ($!)";
        truncate(LOCKFILE, 0) or die "Truncate error on $lockfile ($!)";
        if ($operation eq 'checkin') {
            ++$number_of_clients;
            ...
        }
        else {
            ...
            --$number_of_clients;
        }
        print LOCKFILE $number_of_clients, "\n";
    };
    if ($@) {
        warn "Exception: $@";
    }
    flock(LOCKFILE, LOCK_UN) or die "Unlocking error on $lockfile ($!)";
    close(LOCKFILE) or warn "Error closing $lockfile ($!)";

This seems to work fine. But now comes a new twist: if (in the checkout case) the number of clients becomes zero, instead of writing 0 to the file, I would like to *delete* the lockfile. The problem is that it is not nice to delete a lockfile while it is still open, so I would have to do it *after* the close, wouldn't I? But at that time I no longer own the lock, so another process could already have acquired the file and incremented the number of clients.

How can I solve this?

One more note: I know that my solution is flawed in that it would allow the number of clients to become negative (if checkout is called without a matching checkin). I am aware of this, but it is something I can handle easily.

-- 
Ronald Fischer <ynnor@mm.st>

Replies are listed 'Best First'.
Re: File Locking plus delete Lockfile question
by JavaFan (Canon) on Feb 12, 2009 at 11:47 UTC
    While I question the sanity of wanting to delete the lockfile (you do realize that other processes may already have opened the file and are all waiting to grab the lock?), you can delete a file that's open. But imagine what could happen (I am assuming Unix semantics):
    1. Process 1 sysopens the file.
    2. Process 1 grabs the lock.
    3. Process 2 sysopens the file.
    4. Process 2 queues to get the lock.
    5. Process 3 sysopens the file.
    6. Process 3 queues to get the lock.
    7. Process 1 deletes the file. (Name gone; file still there(!))
    8. Process 1 releases the lock.
    9. Process 2 gets the lock and proceeds.
    10. Process 4 sysopens a new lockfile.
    11. Process 4 grabs the lock.
    At this moment, you have two lock files: the old, now nameless one, and a new one. Processes 2 and 4 are both in the critical section. Process 3 is still waiting, and will afterwards work with stale data.

    Basically, each process that throws away your lockfile opens a new (initially empty) queue, but the old queue is never emptied.
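
    Step 7 in the list above ("Name gone; file still there") can be observed directly. The following is a minimal sketch, assuming Unix semantics; the temporary path and the helper handle_matches_path are made up for illustration and do not come from this thread. It compares the device and inode numbers of an open handle with those of the file currently found under the same name:

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);

# Hypothetical helper: true if the open handle $fh still refers to the
# file currently named $path (same device and inode), false if the name
# is gone or now points at a different file.
sub handle_matches_path {
    my ($fh, $path) = @_;
    my @h = stat($fh)   or return 0;
    my @p = stat($path) or return 0;
    return $h[0] == $p[0] && $h[1] == $p[1];
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.lock";            # illustrative lockfile name

# "Process 1": open, lock, then delete the name while still holding it.
sysopen(my $old, $path, O_RDWR|O_CREAT) or die "open: $!";
flock($old, LOCK_EX)                    or die "flock: $!";
unlink($path)                           or die "unlink: $!";

# "Process 4": the same name now creates a brand-new file, so this
# non-blocking lock is granted although $old is still locked.
sysopen(my $new, $path, O_RDWR|O_CREAT) or die "open: $!";
flock($new, LOCK_EX|LOCK_NB)            or die "flock: $!";

print "old handle current? ", (handle_matches_path($old, $path) ? "yes" : "no"), "\n";
print "new handle current? ", (handle_matches_path($new, $path) ? "yes" : "no"), "\n";
```

    A process that acquires the lock could run such a check and reopen the file when its handle turns out to be stale, but that is an addition of this sketch, not something proposed in the reply above.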

      While I question the sanity of wanting to delete the lockfile (you do realize that other processes may already have opened the file and are all waiting to grab the lock?)
      Indeed, this could be a problem which I didn't see before. The problem is that I really would like the lockfile to go away. The reason is that in my application, we have not just one lock file of this type, but we create maybe a hundred or so per day, each guarding its own "checkin/checkout" system. At some point in time, such a system will cease to exist, but because of the distributed nature of the application, no single instance knows whether a given "checkin/checkout" system is still alive. We only know that *if* the number of customers is still greater than 0, it is alive. Otherwise, it may be dead.

      Initially, there is no lockfile at all (that's why I also pass the O_CREAT flag to sysopen). If the customer count drops back to zero at some point, and I find a safe way to delete the lockfile, there is no harm done. If a new customer arrives later (i.e. it turns out that the system is not dead yet), the lockfile is simply recreated.

      If I don't delete the lockfile, I would either need an external instance which manages whether a specific checkin/checkout system is alive or dead (and erase the lockfiles of the dead ones), or I would end up with a large number of unused lockfiles.

      Thinking about your argument, I fear that the only safe way would be to do some kind of centralization. Probably I will end up having a single lockfile common to all "checkin/checkout" systems on each host and handle the customer count separately.

      Thank you for pointing out the flaw in my algorithm.

      -- 
      Ronald Fischer <ynnor@mm.st>
        At one point in time, this ["checkin/checkout"] system will cease to exist.
        Is there any other part of your application that will be able to unambiguously detect or determine this time? If so, can the cleanup code be inserted there? Or could a customer reactivate any checkin/checkout system at any time in the future?

        Even in the latter case, I think you should be able to clean up inactive lockfiles if the cleanup code and the lockfile creation code are both protected by another application-wide lockfile (which never needs to be deleted).
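
        The idea in the previous paragraph could look roughly like the following. This is only a sketch under stated assumptions: the helper names with_master_lock and adjust_count, and the split into a permanent master lockfile plus a plain per-system count file, are invented for illustration. Every read, update, creation and deletion of a count file happens while the master lock is held, so the count file itself needs no flock and can be unlinked safely.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Run $code while holding an exclusive flock on $master_lock, a file that
# is created once and never deleted (hypothetical helper).
sub with_master_lock {
    my ($master_lock, $code) = @_;
    sysopen(my $mfh, $master_lock, O_RDWR|O_CREAT)
        or die "Cannot open $master_lock ($!)";
    flock($mfh, LOCK_EX) or die "Cannot lock $master_lock ($!)";
    my $result = eval { $code->() };
    my $err    = $@;
    close($mfh);                 # closing the handle releases the lock
    die $err if $err;
    return $result;
}

# Adjust the counter in $countfile by $delta (+1 checkin, -1 checkout).
# The file is removed when the count drops to zero or below.
# Returns the new count.
sub adjust_count {
    my ($master_lock, $countfile, $delta) = @_;
    return with_master_lock($master_lock, sub {
        my $count = 0;
        if (open(my $in, '<', $countfile)) {
            $count = <$in> || 0;
            chomp $count;
            close($in);
        }
        $count += $delta;
        if ($count > 0) {
            open(my $out, '>', $countfile)
                or die "Cannot write $countfile ($!)";
            print {$out} "$count\n";
            close($out) or die "Cannot close $countfile ($!)";
        }
        else {
            # Nobody touches the count file without the master lock,
            # so deleting it here cannot race with another process.
            unlink($countfile);
        }
        return $count;
    });
}
```

        The price is that all checkin/checkout systems sharing one master lockfile are serialized against each other, even though their counts are independent.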

        Caveat: I haven't tried any of this out! I hope that my theoretical musings will be of some use anyway :-)

        --
        use JAPH;
        print JAPH::asString();

        In situations where I want to lock a "critical area" (directory) but would rather not create an actual lock file, I've just locked the directory itself. Something like this:
            use Fcntl qw(:DEFAULT :flock);

            my $countfile = './COUNTFILE';

            # Not opendir()... plain old "open" on a directory:
            open(DIRLOCK, '.') || die "open .: $!";
            flock(DIRLOCK, LOCK_EX) || die "Lock failed: $!";

            sysopen(COUNT, $countfile, O_RDWR|O_CREAT) || die "open $countfile: $!";
            my $num = <COUNT> || 0;
            chomp $num;
            seek(COUNT, 0, 0) || die "rewind: $!";
            truncate(COUNT, 0) || die "truncate: $!";

            if ($operation eq 'checkin') {
                ++$num;
                warn "Checkin $num";
            }
            else {
                --$num;
                warn "Checkout $num";
            }

            if ($num) {
                print COUNT $num;
            }
            else {
                # count is 0; get rid of the file
                unlink $countfile;
            }

            # flush the count to disk before the directory lock is released
            close(COUNT) || die "close $countfile: $!";
            close(DIRLOCK) || die "close .: $!";
        Opening a directory as a regular file and locking it is an odd technique I suppose, but it's nice as there's nothing to create and nothing to clean up afterward. I await replies explaining why this is a terrible thing to suggest.

        Having a single lock for many independent counts seems like a shame.

        Suppose when you "deleted" the count file you first wrote -1 to it. Then any processes that have it open, will later read -1, and can drop the file and loop round to open/create the new instance.

        Should you care, Windows doesn't appear to let you unlink a file you have open.
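
        The "write -1 before deleting" idea from this reply might be sketched as follows. The helper names open_live_lockfile and store_count are invented here, and Unix semantics (unlinking an open file is allowed) are assumed. A process that was queued on the old, now nameless file reads -1 once it gets the lock, drops that handle, and loops round to open or create the current instance.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Open and lock $lockfile, retrying if the instance we locked was marked
# dead (-1) by a process that deleted it. Returns ($fh, $count).
sub open_live_lockfile {
    my ($lockfile) = @_;
    while (1) {
        sysopen(my $fh, $lockfile, O_RDWR|O_CREAT)
            or die "Cannot open/create $lockfile ($!)";
        flock($fh, LOCK_EX) or die "Locking error on $lockfile ($!)";
        my $count = <$fh> || 0;
        chomp $count;
        return ($fh, $count) unless $count < 0;
        close($fh);   # stale instance: drop it and open the current one
    }
}

# Write $count back. At zero (or below), mark the file dead and unlink it
# while still holding the lock, then close, which releases the lock.
sub store_count {
    my ($fh, $lockfile, $count) = @_;
    seek($fh, 0, 0)  or die "Rewind error on $lockfile ($!)";
    truncate($fh, 0) or die "Truncate error on $lockfile ($!)";
    my $value = $count > 0 ? $count : -1;
    print {$fh} "$value\n";
    if ($count <= 0) {
        unlink($lockfile) or die "Unlink error on $lockfile ($!)";
    }
    close($fh) or die "Error closing $lockfile ($!)";
}
```

        A checkin would then be: my ($fh, $n) = open_live_lockfile($lockfile); store_count($fh, $lockfile, $n + 1);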

        Why are you afraid of having hundreds of tiny files?

        However, if you have that many different critical sections, you might want to consider alternatives. Shared memory, or just use a transactional database to keep track of your number of customers.

Re: File Locking plus delete Lockfile question
by ruzam (Curate) on Feb 12, 2009 at 14:40 UTC
    I think that a dotlock would serve you better than flock here.

        ...
        my $dotlockfile = $lockfile . '.lock';
        my $lfh;
        # create the .lock with O_EXCL (fail if the file already exists)
        while (!sysopen($lfh, $dotlockfile, O_RDWR|O_CREAT|O_EXCL)) {
            # some timeout checking code to get out of this loop
            ...
            # wait for the existing .lock to go away
            sleep(1);
        }
        die "Can't create $dotlockfile" unless $lfh;
        close($lfh);

        # we got our own .lock, now continue with your lock code
        # no need to flock here
        my $number_of_clients = 0;
        sysopen(LOCKFILE, $lockfile, O_RDWR|O_CREAT)   # create if necessary
            or die "Can not open/create $lockfile ($!)";
        eval {
            $number_of_clients = <LOCKFILE> || 0;
            chomp($number_of_clients);
            seek(LOCKFILE, 0, 0) or die "Rewind error on $lockfile ($!)";
            truncate(LOCKFILE, 0) or die "Truncate error on $lockfile ($!)";
            if ($operation eq 'checkin') {
                ++$number_of_clients;
                ...
            }
            else {
                ...
                --$number_of_clients;
            }
            print LOCKFILE $number_of_clients, "\n";
        };
        if ($@) {
            warn "Exception: $@";
        }
        close(LOCKFILE) or warn "Error closing $lockfile ($!)";

        # delete if necessary (our .lock will keep others at bay)
        if ($number_of_clients <= 0) {
            unlink($lockfile) or die "Unlink error on $lockfile ($!)";
        }

        # now release our .lock
        unlink($dotlockfile) or die "Unlink error on $dotlockfile ($!)";

    Note: untested (I hate testing locking code)

      Is there a particular reason why you use O_EXCL in the sysopen of $dotlockfile rather than just opening the file and then doing an flock? Is there a performance penalty, or a safety issue, with one compared to the other?

      -- 
      Ronald Fischer <ynnor@mm.st>

        The O_EXCL (Fail if the file already exists) option causes sysopen to fail if the file already exists (surprise). This lets you attempt to create the file (O_RDWR|O_CREAT), but prevents you from taking over the file if it already exists (someone else has already created it), making it an effective lock for situations where you just can't use (or trust) flock.

        Without it, one process would create the file, and the next process would simply open it again without waiting for it to be removed. In your example, the initial lock file still maintains counts of clients but since it's now being locked by the .lock file, there's no need to flock it anymore. The new .lock file restricts access to your initial lock file so clients can be sure that files don't change out from under them.

        In the "timeout checking code" bit (which I didn't include) you should probably check for things like dead lock files. If your system crashes after the .lock file has been created and before it's removed, then the file will sit around forever, locking out your application. Usually you want to check the modification time of the .lock file and remove it if it's older than some arbitrary time limit (which will depend on your application's needs). You probably also want to exit the loop (or die) if you've waited too long to create the .lock file.
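
        One possible shape for that omitted timeout and dead-lock check is sketched below. The helper names acquire_dotlock and release_dotlock and both time limits are assumptions made for illustration, not part of the code posted above.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT);

# Try to create $dotlockfile exclusively. Returns 1 on success and 0 if
# $timeout seconds pass without getting it. An existing .lock older than
# $max_age seconds is treated as left over from a crash and removed.
sub acquire_dotlock {
    my ($dotlockfile, $timeout, $max_age) = @_;
    my $deadline = time() + $timeout;
    while (1) {
        if (sysopen(my $lfh, $dotlockfile, O_RDWR|O_CREAT|O_EXCL)) {
            close($lfh);
            return 1;                        # we own the .lock now
        }
        # Anything other than "file exists" is a real error.
        die "Cannot create $dotlockfile ($!)" unless $!{EEXIST};

        my $mtime = (stat($dotlockfile))[9];
        if (defined $mtime && time() - $mtime > $max_age) {
            unlink($dotlockfile);            # stale: owner presumably died
            next;                            # retry the exclusive create
        }
        return 0 if time() >= $deadline;     # waited too long
        sleep(1);
    }
}

# Release the .lock created by acquire_dotlock.
sub release_dotlock {
    my ($dotlockfile) = @_;
    unlink($dotlockfile) or die "Unlink error on $dotlockfile ($!)";
}
```

        Note that removing a stale .lock is itself racy: two waiting processes can both decide the file is stale, and the slower one may unlink the .lock the faster one has just recreated. Choosing $max_age much larger than any legitimate hold time reduces the risk but does not eliminate it.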

Re: File Locking plus delete Lockfile question
by bluto (Curate) on Feb 12, 2009 at 17:55 UTC
    A ++ for thinking about lock deletion. This is a basic concurrency issue that many programmers I've known fail to grasp. Some just ignore the problem since it may not occur often and it didn't occur during their tests. I've seen others waste a lot of effort writing and rewriting their code trying to somehow eliminate this race condition by sheer force of will.

    In the general case you must know for sure that a lock cannot be accessed before you try to destroy it. As others have mentioned, you generally need either a higher-level lock or a special case (e.g. the machine is starting after a reboot and the app is known not to have started yet; the app is starting or completing and knows it has no other threads running; etc.). The special case may not work very well if your app is designed to run for weeks or months at a time.

      Hasn't anyone in the Perl community read the man page for the POSIX unlink() function?! Perl's implementation is wrong. If it were right, it would be easy to solve this issue.

        Hasn't anyone in the Perl community read the man page for the POSIX unlink() function?! Perl's implementation is wrong. If it were right, it would be easy to solve this issue.

        Seeing how perl just calls the system unlink function, that can't possibly be true.

Node Type: perlquestion [id://743274]
Front-paged by Arunbear