http://www.perlmonks.org?node_id=985288

pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks!

I have a script that runs often, is run by lots of different processes, and takes a long time. It is basically testing the environment to see if whatever called it can proceed.

I should be able to cache the results in some way, but I want to make sure I do the caching properly.

There is no "Key" for this cache. All answers are the same. A database is not appropriate, so I need to handle any atomic/concurrency stuff.

  1. If the cache is fresh, read it and go away
  2. If the cache is stale...
    1. Try to get exclusive lock
      1. Got lock? Do tests, write results to cache, go away
      2. Didn't get lock????

This is where it gets more complicated. Do I wait around for whoever has the lock to finish? Do I wait only if the locking process is doing the tests? How do I handle a situation where a locking process has wandered off into oblivion and hasn't released the lock?

I'm on linux and I'm thinking with flock, but I'm open to non-flock ideas.

Replies are listed 'Best First'.
Re: Concurrent Cache Pattern
by BrowserUk (Patriarch) on Aug 03, 2012 at 23:55 UTC

    Kiss!

    • Write the "answer" as a directory name at a particular place in the file system.

      Say /usr/shared/appname/cache/xxxxxx/

    • Applications reading the data simply glob the path /usr/shared/appname/cache/*.

      They get back a single name that is the answer to the question.

      If the readers need to be able to tell how fresh the answer is, they can stat the modified time.

      No locking involved.

    • The "writing" script, uses the atomic nature of rename to update the directory name at the appropriate times.

      No locking involved.

    If the answer is too big to fit into a directory name, then use a file with a well-known name.

    • Readers just read the contents.
    • The writer writes updates to a different name and then deletes the original and renames.

      This requires the readers to retry if their attempts to open or read fail.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      This is an awesome answer. Exactly what I was hoping for.

Re: Concurrent Cache Pattern
by flexvault (Monsignor) on Aug 04, 2012 at 10:40 UTC

    pileofrogs,

    Because of your concern about a 'flock' on the lock from a runaway process, you can use 2 files, one for the locking and one for the information your caching. I'm going to show untested code to give you an idea of how to go about it, but only you can decide what to do in a true deadlock situation.

    my $lock_file = "./LockFile"; ## Put this in a common place GetLock: if ( ! -e $lock_file ) ## No Process is working { open ( my $lock, ">", $lock_file ) or die "Can't open $! "; if ( flock ( $lock, LOCK_EX | LOCK_NB ) ) { print $lock "$$\n"; close $lock; ## Now, you need to make sure your the process with the lock open ( $lock, "<", $lock_file ) or die "Can't open $! "; if ( flock ( $lock, LOCK_SH ) ) { my $pid = < $lock >; chomp ( $pid ); close $lock; if ( $pid eq "$$" ) { last; } ## Good--you have the lo +ck else { redo GetLock; } } else { redo GetLock; } } else { ## Another process has the exclusive lock close $lock; usleep 1_000; ## wait .001 seconds redo GetLock; } } elsif { if ( -s $lock_file ) ## Another Process is already w +orking { open ( my $lock, "<", $lock_file ) or die "Can't open $! " +; my $who = < $lock >; chomp ( $who ); close $lock; ## Now you can stat the file and if it exists longer than ## some number of seconds/minutes, you have the pid to k +ill it. } else { redo GetLock; } } ## Put your testing and cache here.

    The idea is to verify you have the lock before starting the cache process. Once you are sure you have the lock, you can test the cache file to determine if it is fresh or stale. When you finish your work you 'unlink' the lock file.

    Things to think about:

    • Do you need a counter on how many times to 'redo GetLock'
    • Do you need to issue an warning email/text message for deadlock.
    • Instead of using 'die' to have a common routine to issue warning message to the system admin.
    • Maybe BrowserUk's solution is better!

    Good Luck!

    "Well done is better than well said." - Benjamin Franklin

Re: Concurrent Cache Pattern
by Anonymous Monk on Aug 03, 2012 at 18:17 UTC

    I'm on linux and I'm thinking with flock, but I'm open to non-flock ideas.

    Um, use a database :) use CHI, name each step (think $step++)

Re: Concurrent Cache Pattern
by sundialsvc4 (Abbot) on Aug 04, 2012 at 17:58 UTC

    Would it be remotely possible to designate one simple script, on one central server, to be the arbiter?   A process would, let us say, open a socket connection to this script, wait to read one byte from it, do its work and then close the socket.   This one script could be an efficient traffic cop for the whole procedure since it would be in the position to know everything that is going on.   I can’t quote an implementation off the top of my head, but I am sure there must be such a thing.

    Or maybe (and I say this in all seriousness, BrowserUK could whip one up in about three minutes ...