http://www.perlmonks.org?node_id=518170

beppu has asked for the wisdom of the Perl Monks concerning the following question:

Today, I received an email from my friend:

That worker script I wrote for MI is working fine, but there's one problem. Sometimes, it takes too long to run, and the next cron job comes along and they double up.

My first thought was to write a file to disk (a lock) and remove it on exit - so the next process will know to die... but what happens if it quits (or is killed) without removing the lock? Then nothing runs...

Is there a CPAN module or example you can think of? This is so obviously a common OS-level problem that it already has a solution, right?

Unfortunately, I don't know the answer to this, so I've come to the monastery for help.

Re: Cron Jobs That Run For Too Long
by sgifford (Prior) on Dec 20, 2005 at 20:02 UTC
    On Unix, there are three common ways to solve this.

    The first way is to have a lockfile that's always around. When your script starts, it tries to lock that file with flock; if it's already locked it exits, otherwise it does its work. The OS will automatically remove the lock when the process exits, even if it crashes or is killed.

    The second way is to write the PID of the process into the lockfile. When another copy starts up, it checks whether the lockfile exists; if so, it reads the PID and verifies that that PID is still running and really belongs to an instance of the script (not some unrelated process that happens to have reused the PID). If it finds that the process is no longer running, it just writes a new lockfile and goes on. When the process finishes, it removes the lockfile.

    The third way is to get a list of all processes, and see if any of those are executing that script. If so, exit.

    I generally favor the first of these, because it's easy and efficient.
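
    A minimal sketch of that first, flock-based approach might look something like this (the lockfile path and messages are only placeholders):

    use strict;
    use warnings;
    use Fcntl qw(LOCK_EX LOCK_NB);

    # The lockfile is never deleted; only the flock held on it matters.
    my $lockfile = '/var/lock/myjob.lock';    # placeholder path

    open my $lock_fh, '>>', $lockfile
        or die "Can't open $lockfile: $!";

    unless ( flock $lock_fh, LOCK_EX | LOCK_NB ) {
        warn "Another instance is still running; exiting.\n";
        exit 0;
    }

    # ... do the real work; the OS drops the lock when this process
    # exits, even if it crashes or is killed.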

      Just a note, adding to what sgifford said...

      I've seen some scripts around that list /proc to check whether the process is still running. I prefer to use a kill 0 on the process...

      And I personally prefer the second way, because it still works when the only place you can write the lockfile can't provide locking, or the locking isn't reliable (NFS).
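
      A rough sketch of that check, combining a pidfile with kill 0 (the path is only an example):

      use strict;
      use warnings;

      my $pidfile = '/var/run/myjob.pid';    # example path

      if ( open my $fh, '<', $pidfile ) {
          chomp( my $old_pid = <$fh> || '' );
          # kill 0 sends no signal; it only asks whether the PID exists
          # (and whether we're allowed to signal it).
          die "Already running as PID $old_pid\n"
              if $old_pid && kill( 0, $old_pid );
      }

      # Stale or missing pidfile: record our own PID and carry on.
      open my $out, '>', $pidfile or die "Can't write $pidfile: $!";
      print $out "$$\n";
      close $out;

      # ... do the work ...

      unlink $pidfile;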

      daniel
        But wait, it gets better again!

        You know how you can have that nifty __DATA__ block at the end of your script? It turns out you can lock that too :)

        I've used this a number of times and it works just great.

        #!/usr/bin/perl

        use strict;
        use Fcntl 'LOCK_EX', 'LOCK_NB';

        # Take an exclusive, non-blocking lock on the script's own DATA
        # handle; if another instance already holds it, bail out.
        unless ( flock DATA, LOCK_EX | LOCK_NB ) {
            print STDERR "Found duplicate script run. Stopping\n";
            exit(0);
        }

        ...

        1;





        ### DO NOT REMOVE THE FOLLOWING LINES ###

        __DATA__
        This exists to allow the locking code at the beginning of the file to work.
        DO NOT REMOVE THESE LINES!

Re: Cron Jobs That Run For Too Long
by jdhedden (Deacon) on Dec 20, 2005 at 19:59 UTC
    I would recommend using Proc::Daemon. It has the capability to check whether the process is already running: when a second instance is started, it can do the check and then exit.
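
    A rough sketch of how that might look, assuming the Status() method that current releases of Proc::Daemon document (given a pid file, it returns the recorded PID if that process is still alive, and 0 otherwise; older versions only provided Init()):

    use strict;
    use warnings;
    use Proc::Daemon;

    my $pid_file = '/var/run/myjob.pid';    # example path
    my $daemon   = Proc::Daemon->new( pid_file => $pid_file );

    # Status() checks the pid file; 0 means no matching process is alive.
    if ( my $pid = $daemon->Status($pid_file) ) {
        print STDERR "Already running as PID $pid; exiting.\n";
        exit 0;
    }

    # ... go on with the job ...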

    Remember: There's always one more bug.
Re: Cron Jobs That Run For Too Long
by ikegami (Patriarch) on Dec 20, 2005 at 20:11 UTC
    My first thought was to write a file to disk (a lock) and remove it on exit - so the next process will know to die... but what happens if it quits (or is killed) without removing the lock? Then nothing runs...

    When a process dies, the OS removes any and all flock locks placed by it. Instead of checking for the existence of a lock file, check whether there's a lock on it (after unconditionally creating the file).

    Checking for existence is flawed anyway. For example,

    Process 1                        Process 2
    -----------------------------    -----------------------------
    die if -e $lockfile;
                                     die if -e $lockfile;
    open(my $fh, '>', $lockfile);
    # I think I hold the lock.
                                     open(my $fh, '>', $lockfile);
                                     # I think I hold the lock.
    .                                .
    .                                .
    .                                .
Re: Cron Jobs That Run For Too Long
by perrin (Chancellor) on Dec 20, 2005 at 20:27 UTC
    See Proc::Pidfile. There are several similar modules on CPAN as well.
      I'm the person that posed the question to beppu ;)

      Thanks for the great answers, I'm glad I finally made it on to perlmonks!

      So... Proc::Pidfile didn't work out for me too well. I couldn't seem to get it working right. I might have been using it wrong, or perhaps I just didn't understand how to use it from reading the docs.

      Then I found File::Pid, and I'm very happy. Unfortunately, it is not smart enough to release the pidfile if the script crashes, but that's not a big deal for my current needs.

      Here was the implementation I used. I decided to allow my script to get a --force flag, in case it was really important that the script run at a certain time (even if a crash happened and there's a pid file hanging around). Under normal circumstances, the script will exit if another instance is discovered.

      use File::Pid;

      # Check to make sure we are not already running
      my $pidfile = File::Pid->new({file => "/path/to/my.pid"});

      # This next line gets the current pid in $pidfile
      # If nothing was running, we should get back our own pid
      my $pid = $pidfile->write;

      # now, die if the pid file cannot be opened,
      # or the pid running is not THIS INSTANCE
      # note: this can be overridden if $FORCE is true.
      die ("running process=".$pid.", my pid=$$\n")
          if (!$pid or ($pid != $$ and !$FORCE));

      # ... do a bunch of stuff for a long time

      $pidfile->remove or warn "Couldn't unlink pid file\n";
      somebox
        For what it's worth, I'd like to endorse the solution offered by adamk earlier in the thread. I had been handling a similar problem through the use of lockfiles, but it wasn't working out too well. I decided to give adamk's suggestion a try, and it absolutely works a treat. It's also nice and neat and very easy to implement.

        Cheers,
        Darren :)

Re: Cron Jobs That Run For Too Long
by philcrow (Priest) on Dec 20, 2005 at 20:02 UTC
    I think a pid file in /var/run is the right approach, maybe because that's what we do. You can easily have the cron job print to stdout that it is dying because it sees the pid file of an earlier process. That stdout goes to the email address of the cron owner, who always (well, usually) pays attention to such things.

    It is also problematic if your script dies without logging or otherwise informing you. It's not always possible, but you should try hard to make autonomous scripts log and generate nastygrams when things go wrong (by trapping signals, evaling things that could die, etc.).
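
    A bare-bones sketch of that idea (the worker routine name is made up; anything printed to STDERR here ends up in the mail cron sends to the job's owner):

    use strict;
    use warnings;

    # Turn fatal signals into ordinary die()s so the eval below sees them.
    $SIG{TERM} = $SIG{INT} = sub { die "caught SIG$_[0]\n" };

    my $ok = eval {
        do_the_real_work();    # hypothetical main routine
        1;
    };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        # cron mails this to the job's owner; log or page here as well.
        print STDERR scalar(localtime), " job failed: $err";
        exit 1;
    }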

    Phil

Re: Cron Jobs That Run For Too Long
by shriken (Priest) on Dec 20, 2005 at 20:45 UTC
    ...or put your two crontab entries on one line: use the Unix shell's boolean "and" to force the shell to run multiple commands one by one...

    cd /foo/dir && ./runcmd_1.pl && cd /foo2/dir2 && ./runcmd_2.pl

    or write a simple shell script...
    #!/bin/sh
    cd /foo/dir
    ./runcmd_1.pl
    cd /foo2/dir2
    ./runcmd_2.pl
    and schedule the shell script. I prefer the latter because your crontab doesn't get messy and you can include mucho comments in the shell script.
      ...or put your two crontab entries on one line: use the Unix shell's boolean "and" to force the shell to run multiple commands one by one...

      As I read the problem, this does not make any sense. My understanding is that this is not an issue of multiple commands, but rather a single command which is run frequently enough that it can sometimes overlap itself. Like, say, a cron job run once per hour that occasionally takes 70 minutes to complete, so that cron is trying to start a new instance before the old instance has finished (which can be quite nasty).

      My personal preference in cases like this is to put a routine at the beginning of the script to check running processes for another process of the same name.
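
      One way to sketch that check, using the Proc::ProcessTable module from CPAN (the script name is just an example; you could also parse ps output instead):

      use strict;
      use warnings;
      use Proc::ProcessTable;

      my $me = 'worker.pl';    # example name of this script

      for my $proc ( @{ Proc::ProcessTable->new->table } ) {
          next if $proc->pid == $$;    # don't count ourselves
          if ( ( $proc->cmndline || '' ) =~ /\Q$me\E/ ) {
              die "Another copy of $me is already running (PID " . $proc->pid . ")\n";
          }
      }
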
      ...or to change cron to run the task infrequently enough that it doesn't overlap. :-)