Frustration with changing lock type

by demerphq (Chancellor)
on Apr 14, 2004 at 11:05 UTC (#344984)
demerphq has asked for the wisdom of the Perl Monks concerning the following question:

I have some code that manages a lockfile mechanism used to keep various processes from working on the same things at once. The basic idea is that before a process starts a task it tries to get a lock on the task id. If it can't get the lock, it tries to work on something else. What I wanted was for it to work like this:

  1. Open the lock file.
  2. Try to get an exclusive lock in nonblocking mode.
  3. Unless we get it, go to 8.
  4. Read the lockfile to see if it was abandoned by another process. Print whatever the situation is.
  5. Write our details to the lockfile.
  6. Change from an exclusive lock to a shared lock so that other people can see it's held by us.
  7. Go to 10.
  8. If we didn't get the exclusive lock, then get a shared lock in blocking mode (wait until we can get a shared lock).
  9. Read and print the info about the process that created the lock.
  10. Finish.

But what happens is that unless I add a step 5.5, "Unlock the file", before we "Lock the file in shared mode", the lock status doesn't change from LOCK_EX to LOCK_SH. Thus other processes sit waiting forever. If I do put in the LOCK_UN step then it seems to work, but I can't help but feel there is a race condition involved if I do.

Anyway the code is below (Thanks to tilly for the OnDestroy trick or at least the idea that underlies it.)

    use Fcntl ':flock';
    use POSIX;
    use File::Spec::Functions qw(catfile);    # provides catfile(), used below

    sub iso_time { POSIX::strftime("%Y-%m-%d %H:%M:%S", localtime($_[0] || time)) }

    # Run a code block when the returned object goes out of scope.
    sub OnDestroy::DESTROY { shift(@_)->() }
    sub OnDestroyDo(&) { bless shift(@_), "OnDestroy" }

    sub get_lock {
        my ($lock_dir, $name) = @_;
        my $debug = 1;
        $name =~ s/[^-\w.#!\@~=+%\$]//g;
        my $lockfile = catfile($lock_dir, $name . '.lock');
        print "Trying lockfile $lockfile\n";
        sysopen(my $FH, $lockfile, O_RDWR | O_CREAT) or do {
            warn "can't open $lockfile: $!" if $debug;
            return;
        };
        # autoflush $FH
        select( (select($FH), $|++)[0] );
        my ( $time, $process, $lname );
        if (flock( $FH, LOCK_EX | LOCK_NB )) {
            ( $time, $process, $lname ) = split /\|/, join "", <$FH>;
            seek $FH, 0, 0 or die "Failed rewind:$!";
            if ($debug) {
                if ($process) {
                    print "\tLockfile appears to be abandoned by Process #$process started at $time\n"
                } else {
                    print "\tLockfile appears to be unprocessed\n"
                }
            }
            my $lock_msg = join("|", iso_time(), $$, $name) . "\n";
            print "Locking $lockfile : $lock_msg";
            print $FH $lock_msg;
            truncate($FH, tell($FH)) or die "Failed to truncate:$!";
            # if I remove this it doesn't work on my Win2k box
            flock($FH, LOCK_UN) or die "unlock: $!";
            # but if I leave it in then it seems like a race condition is possible.
            flock($FH, LOCK_SH|LOCK_NB) or die "sharedlock: $!";
            return OnDestroyDo {
                print "\tFinished with and removing $lockfile\n";
                close $FH or die "Failed to close \$FH:$!";
                unlink $lockfile or die "Failed to unlink $lockfile\n";
                undef $FH;
            };
        } elsif (flock($FH, LOCK_SH|LOCK_NB)) {
            ( $time, $process, $lname ) = split /\|/, join "", <$FH>;
            print "\tLockfile $lockfile appears to be locked by Process #$process at $time\n" if $debug;
        } else {
            print "Failed to get lock on $lockfile, not sure why.\n";
        }
        return
    }

Thanks in advance for any help you might be able to offer.


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi


Re: Frustration with changing lock type
by tachyon (Chancellor) on Apr 14, 2004 at 11:26 UTC

    Am I missing the point, or are you overcomplicating the problem? LOCK_EX will block until it gets the lock unless you use LOCK_EX|LOCK_NB to make it non-blocking. If you don't get a lock and wander off to run other code, you will surely be wasting time unless you have a signal to indicate the lock is now available, so that you can exit your current code, LOCK_EX, doit() and return to what you were doing. Even if you do, this is a race. Why not just block for the lock, or ask for LOCK_EX|LOCK_NB and, if you don't get it, loop, sleeping 1 second and retrying until you time out or get the lock?

    Given that a lock write will take milliseconds, you can calculate the statistical probability of getting a lock on the first pass. If that is N%, your probability of still having failed after a number of retries is (1-N%)**retries. In other words, a wait-for-lock strategy will slow your app by a fraction of a percent overall on average, but the alternatives are.....

    cheers

    tachyon
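
The retry loop tachyon describes could look roughly like the sketch below. The name lock_with_retry and its $timeout and $interval parameters are made up for illustration and are not from the thread; Time::HiRes is only used so the interval can be fractional.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(sleep time);

# Hypothetical helper: poll for an exclusive lock until $timeout seconds pass.
# Returns 1 once the lock is held, 0 if the timeout expired first.
sub lock_with_retry {
    my ($fh, $timeout, $interval) = @_;
    $interval = 1 unless defined $interval;    # "sleeping 1" by default
    my $deadline = time() + $timeout;
    while (1) {
        return 1 if flock($fh, LOCK_EX | LOCK_NB);    # got it
        return 0 if time() >= $deadline;              # give up
        sleep($interval);                             # wait, then try again
    }
}
```

A caller would write something like `lock_with_retry($FH, 10) or die "timed out"`. Whether polling beats moving on to the next task depends on how long the lock is typically held.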

      Am I missing the point, or are you overcomplicating the problem?

      Yes. :-) The exclusive lock means that we can do our task. The LOCK_EX is nonblocking so that, if another process already holds a LOCK_SH, we know the task is in progress. The follow-up LOCK_SH is meant to be blocking so that we don't try to read the task's owner until it has finished writing to the lockfile.

      Why not just block for the lock, or ask for LOCK_EX|LOCK_NB and, if you don't get it, loop, sleeping 1 second and retrying until you time out or get the lock?

      Because I wanted to know the status and owner of the previous attempt, basically. I know it's not a good reason, but I wanted to know, and things didn't work as I expected. :-)


      ---
      demerphq



Re: Frustration with changing lock type
by hv (Parson) on Apr 14, 2004 at 11:30 UTC

    Hmm, a quick test shows that I can quite happily downgrade a flock()-style lock on my Linux system without first releasing it. However perldoc -f flock shows a lot of O/S-specific (or rather, emulation-specific) caveats, and I imagine you're getting a different emulation under Win32.
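
A quick test along these lines might look like the sketch below. It is not hv's actual test. It assumes a platform where Perl's flock() maps to a native flock(2), so that two separately opened handles in one process conflict with each other; where flock() is emulated differently, the probe would have to run in a second process. The name probe_downgrade is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical probe: can a second handle get LOCK_SH before and after
# the first handle goes from LOCK_EX to LOCK_SH without a LOCK_UN?
sub probe_downgrade {
    my ($holder, $path) = tempfile(UNLINK => 1);
    open(my $probe, '+<', $path) or die "open $path: $!";

    flock($holder, LOCK_EX) or die "LOCK_EX: $!";
    my $before = flock($probe, LOCK_SH | LOCK_NB) ? 1 : 0;
    flock($probe, LOCK_UN) if $before;

    flock($holder, LOCK_SH) or die "downgrade: $!";    # no LOCK_UN in between
    my $after = flock($probe, LOCK_SH | LOCK_NB) ? 1 : 0;
    return ($before, $after);
}

my ($before, $after) = probe_downgrade();
print "shared lock available before downgrade: $before, after: $after\n";
```

If the downgrade takes effect, the probe should be refused before it and granted after it. The behaviour demerphq reports on Win2k would presumably show up as the probe still being refused afterwards.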

    One approach would be to append your process data to the file while you have the exclusive lock, then after unlocking and reacquiring a shared lock verify that yours is still the last line in the file. If it isn't, you must then unlock and go back to the beginning again.

    With this approach you might also want to reduce the risk of runaway on a busy system by checking the last modified time and sleeping for a second or two before trying to acquire the exclusive lock if it has only just changed.

    If you get the shared lock while your process is still the last named, you can then truncate the file back to containing just one line.

    Hugo
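
One way to read the append-and-verify idea is sketched below. The function name claim_by_append and the entry format are hypothetical, and the "go back to the beginning" step is left to the caller: the function simply returns nothing when the claim fails.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek O_RDWR O_CREAT O_APPEND);

# Hypothetical sketch: append our entry under LOCK_EX, switch to LOCK_SH,
# then check that our entry is still the last line in the file.
# Returns the locked filehandle on success, nothing on failure.
sub claim_by_append {
    my ($path, $entry) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT | O_APPEND) or return;
    select((select($fh), $| = 1)[0]);            # autoflush $fh

    flock($fh, LOCK_EX | LOCK_NB) or return;     # someone else owns the task
    print {$fh} "$entry\n";                      # append our details

    flock($fh, LOCK_UN) or die "unlock: $!";     # the window Hugo guards against
    flock($fh, LOCK_SH) or die "sharedlock: $!";

    seek($fh, 0, SEEK_SET) or die "rewind: $!";
    chomp(my @lines = <$fh>);
    return unless @lines && $lines[-1] eq $entry;    # somebody appended after us

    # Still the last named: cut the file back to just our line.
    truncate($fh, 0)       or die "truncate: $!";
    seek($fh, 0, SEEK_SET) or die "rewind: $!";
    print {$fh} "$entry\n";
    return $fh;    # hold on to this; closing it releases the shared lock
}
```

On failure the handle goes out of scope and is closed, which drops any lock it held, so the caller can sleep and try again or move on to the next task.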

      Thanks for the reply, Hugo. I went with something like what you suggest: I added a test before we try to acquire the LOCK_EX to make sure the lockfile doesn't exist or is at least 3 seconds old, and I double-check that the process data stays the same after the LOCK_UN and LOCK_SH. The append logic I didn't quite follow, so I didn't go that way. Here is what I have:

      Yes, it's a bit wordy; I'd be interested to see a less clunky implementation if anyone feels inclined.

      Incidentally there is no need to wait to get the exclusive lock. Under normal situations this will be polling a table in a DB every 60 seconds or so, and then processing any records it finds there. So if the lock file is too young returning to try the next record is better.

      Cheers


      ---
      demerphq



        Hmm, I think you still have a race in the OnDestroy: if I understand flock() correctly, the lock is released as soon as you close the filehandle, so the unlink() happens after the lock has been released.

        If I remember right, Win32 doesn't let you unlink a file while anyone still has a handle open on it, so maybe it would be better instead to truncate the file to zero bytes on completion, which is something you can do with an open filehandle.

        Hugo
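
A minimal sketch of the truncate-on-completion idea follows, with a hypothetical release_lockfile standing in for the close-then-unlink cleanup. In the original get_lock, a lockfile with no process field is reported as "unprocessed", so an empty file left behind should read as free.

```perl
use strict;
use warnings;

# Hypothetical cleanup: empty the lockfile while the handle (and so the
# lock) is still open, then close. Nothing is unlinked.
sub release_lockfile {
    my ($fh) = @_;
    truncate($fh, 0) or die "Failed to truncate: $!";
    close($fh)       or die "Failed to close: $!";    # closing drops the flock
    return 1;
}
```

The trade-off is that empty .lock files accumulate in the lock directory. Whether that matters depends on how many distinct task ids there are.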

Re: Frustration with changing lock type
by crabbdean (Pilgrim) on Apr 14, 2004 at 12:20 UTC

    Yeah, there is a race condition there, but in that race, if another process grabs the file, wouldn't it then have the lock, putting the other one into wait mode? Personally I'd avoid the LOCK_SH and just go for LOCK_EX, write what you need and release (LOCK_UN). You're talking a millisecond of time, really.

    I once wanted to test such a thing, so I created a quick module that was called by a script to lock and release a DB database. I then proceeded to run about 6 copies of my script at once. They went crazy locking and unlocking and only had to wait for the lock every few hundred goes, and even then it was only for a fraction of a second. I was rather impressed at the time, seeing 6 copies trying to grab it a few hundred times a second. Anyway, the moral is ... well, you get the gist.

    Dean
    The Funkster of Mirth
    Programming these days takes more than a lone avenger with a compiler. - sam
    RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers

      Well, speed isn't so much the issue here. The tasks being performed are mostly file moves/FTP and transformations of large files, so the actual tasks could take minutes to complete, or possibly longer in some circumstances. I know I could have gone with a simpler design if I didn't want to know who owned the task already, but once you set yourself a goal....

      Cheers for the reply.


      ---
      demerphq


