PerlMonks  

Mechanism for ensuring only one instance of a Perl script can only run?

by redapplesonly (Sexton)
on Dec 02, 2022 at 16:03 UTC ( [id://11148499]=perlquestion )

redapplesonly has asked for the wisdom of the Perl Monks concerning the following question:

Hola Perl Monks,

So I have a Perl (v5.30.0) script that is run every five minutes via a crontab on my Ubuntu server. When I first wrote the script, I thought five minutes would be plenty of runtime. But recently, I've discovered that the script may require *more* than five minutes to complete execution. This is a problem.

Right now, say my script is executed at 12:00. It does reads and writes to disk and to the network. But when the clock strikes 12:05, the script is not finished. It keeps going. But then cron runs a second instance of the script. Now the 12:00 and 12:05 scripts are both reading and writing the same data to disk and network. They complicate each other's work, data is contaminated, and the mess is still getting worse when the 12:10 instance of the script arrives on the scene. Mass hysteria follows.

So now I need a mechanism where my script basically makes sure there are no other instances of itself running. Pseudocode would be:

    Script starts. Am I the only instance of this code?
    -- If YES, continue
    -- If NO, terminate immediately

This can't be that hard to do. That said, whenever I think about it, I come up with fairly clunky solutions. Below is code which technically works... but is a pretty stupid solution. Basically, the script creates an empty file (I'm calling it a "Marker File") at the start of execution, then deletes it at the end. If another instance of the script starts and sees that file, it realizes that it's not alone and terminates.

This will work... but I kinda superhate it. It takes 20 lines of code to do something that seems pretty basic. Can you recommend a more elegant solution? There's gotta be a way. Thank you.

My code:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $MARKER_FILE="/home/me/MARKERFILE";

    sub checkMarker {
        if(-e $MARKER_FILE){
            # Marker file already exists, another instance of this script is running!
            return undef;   # FALSE
        }
        else {
            # Marker file doesn't exist!  We should create one...
            my $cmd = "touch $MARKER_FILE";
            `$cmd`;
            return 1;       # TRUE
        }
    }

    package main;

    if (checkMarker()) {
        print "I can do stuff!\n";
        # ...do stuff...
        my $endCmd = "rm $MARKER_FILE";
        `$endCmd`;
    }
    else{
        print "Another instance of this script is running, I can't run...\n";
    }

Replies are listed 'Best First'.
Re: Mechanism for ensuring only one instance of a Perl script can only run?
by stevieb (Canon) on Dec 02, 2022 at 16:41 UTC

    I wrote Script::Singleton exactly for this task.

    It uses the shared memory functionality of IPC::Shareable which I also maintain.

    To use it, you literally only need to use it:

    use Script::Singleton;

    ...done.

    It uses the script's path and filename as the "glue" (a.k.a. the shared memory key) for identification. If you want to use a custom glue:

    use Script::Singleton glue => 'UNIQUE GLUE STRING';

    Best part I like about it is that if there's a system interruption or reboot, you don't have to worry about lock files hanging around.

    Here's an example (using the module with the warn parameter which is false by default):

        use warnings;
        use strict;
        use Script::Singleton warn => 1;
        sleep 5;

    Run the script in one CLI, and within five seconds, run it in a second window and you'll get:

    Process ID 24891 exited due to exclusive shared memory collision at segment/semaphore key '0x40d7106c'
Re: Mechanism for ensuring only one instance of a Perl script can only run?
by Corion (Patriarch) on Dec 02, 2022 at 16:43 UTC

    See the highlander script. The basic idea is to flock() your program script. That way, the lock is automatically released once your program ends (or crashes).

      Thanks Corion! So one caveat I have is that if a newly-executed script realizes that there's another instance already at work, I want the new script to terminate immediately, not enter a queue to run later. It looks like flock() doesn't offer a "terminate immediately" option... or am I missing the bigger picture? Thank you so much for writing.

        No. flock returns a false value if it cannot lock a file because it is already locked. The idea is that you then exit your program:

            use Fcntl qw(LOCK_EX LOCK_NB);

            my $scriptname = $0; # assuming that all script instances will be invoked the same way
            open my $script, '<', $scriptname
                or die "Couldn't find myself?! Looked for '$scriptname', but got $!";

            if( !flock $script, LOCK_EX | LOCK_NB ) {
                print "We are already running\n";
                exit 1;
            };

            sleep 60
Re: Mechanism for ensuring only one instance of a Perl script can only run?
by LanX (Saint) on Dec 02, 2022 at 16:29 UTC
    Perl has the flock function for this kind of problem.

    This recent discussion might be of interest: File lock demo

    Please be aware that this also depends on the OS; I've tested it on Windows.

    (You haven't explicitly told us which one you use, but crontab is usually more *nix'ish.)

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      Thanks Rolf! I'll look into your suggestion. Also, I edited my post to make it clear that I'm on an Ubuntu machine.
Re: Mechanism for ensuring only one instance of a Perl script can only run?
by eyepopslikeamosquito (Archbishop) on Dec 02, 2022 at 21:48 UTC
Re: Mechanism for ensuring only one instance of a Perl script can only run?
by karlgoethebier (Abbot) on Dec 02, 2022 at 16:48 UTC
    "…a more elegant solution?"

    You could take a look at your process table:

        use strict;
        use warnings;
        use Proc::ProcessTable;

        my $processes = Proc::ProcessTable->new;

        for ( @{ $processes->table } ) {
            …;
        }

    $_->pid and $_->cmndline might be what you want. See also

    «The Crux of the Biscuit is the Apostrophe»

          my $processes = Proc::ProcessTable->new;

          for ( @{ $processes->table } ) {
              …;
          }

      I don't think that will always work safely:

      • At least FreeBSD does not mount /proc any more, so Proc::ProcessTable will probably return no processes at all.
      • On a system using Linux Containers, you will see processes running in containers as processes on the host. That may cause false positives.
      • You introduce a TOCTTOU problem - by the time you have evaluated data from /proc, the situation may have changed dramatically.
      • On some systems (at least Linux), content of /proc may be edited, e.g. by assigning to $0.

      Trying to lock the executable should be free of race conditions (or else flock() would be severely broken) and should also work with soft and hard links, as the file is locked, not one of its directory entries.


      Quick assign to $0 demo:

          /root>perl -E 'say `cat /proc/$$/cmdline`'
          perl-Esay `cat /proc/$$/cmdline`
          /root>perl -E '$0="find me"; say `cat /proc/$$/cmdline`'
          find me
          /root>

      pstree on a host with about 10 containers (running Proxmox VE, both host and containers using Debian 11)

      Every lxc-start is a parent process of a container, every systemd that is a child of a lxc-start is the init process (pid 1) of a container, each of those systemds and all of their children are running in a container.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        "… return no processes at all."

        From README.freebsd-kvm of the current version of Proc::ProcessTable:

        "FreeBSD 5.X not mounting /proc by default. Procfs is vulnerable system and its use is not recommended in future. In addition, mapping processes space to /proc is not correct (at least, in 7 of 7 my freebsd servers with FreeBSD5 installed). So, I decided to write this code. This module works via the kvm system."

        And it should be possible to obtain the PIDs of the children of every lxc-start command.

        «The Crux of the Biscuit is the Apostrophe»

        Proc::ProcessTable on FreeBSD stable/13 -- "/proc" is a plain empty directory here -- showed all the "xterm" processes using the "A cheap and sleazy version of ps" example; did not check any further.

        If anyone has suggestions for things I should check when the module may fail, I am all ears^Weyes.

Re: Mechanism for ensuring only one instance of a Perl script can only run?
by ikegami (Patriarch) on Dec 03, 2022 at 05:51 UTC

    Two comments:

    Your solution suffers from a race condition, and could allow two instances of the script to run at the same time.

    A couple of people mentioned using a non-blocking flock. That's a great simple solution, but it comes with a major caveat: it doesn't work, or is unreliable, on some file systems (notably NFS). Also, I don't know if it'll work on Windows.
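The race in the original marker-file script sits between the `-e` test and the `touch`: two instances can both see "no file" before either creates it. A minimal sketch of one way to close that window, which nobody in the thread proposed, is to let the kernel do the test and the creation in a single step with `sysopen` and `O_CREAT | O_EXCL`. The helper name `claim_marker` is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

# Hypothetical helper (not from the thread): returns 1 only for the one
# process that manages to create the marker file, 0 if it already exists.
sub claim_marker {
    my ($path) = @_;

    my $fh;
    # O_CREAT | O_EXCL makes "does it exist?" and "create it" one atomic step.
    unless (sysopen($fh, $path, O_CREAT | O_EXCL | O_WRONLY, 0644)) {
        return 0 if $!{EEXIST};            # somebody else got there first
        die "Cannot create $path: $!";     # any other error is a real problem
    }

    print {$fh} "$$\n";                    # record our PID for debugging
    close $fh;
    return 1;
}
```

A script would call `claim_marker($MARKER_FILE)` at startup and `unlink $MARKER_FILE` when done. This only removes the race; a crash or reboot still leaves the marker behind, which is the problem the flock-based replies avoid.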

Re: Mechanism for ensuring only one instance of a Perl script can only run?
by talexb (Chancellor) on Dec 04, 2022 at 16:14 UTC

    I wrote a module for a client (so I can't share the code) to take care of this, because I wanted to deal with the issue you described, as well as the issue where a previous run failed -- because some cleanup might need to be done.

    I created an object based on the called script (I passed $0 in to the module) which then checked for a corresponding file in /var/run. If it existed, it meant that the script had been started previously. If the PID inside that file was still active, it meant the previous invocation was still running; if not, it meant that the script had crashed without 'unlocking' itself. And, of course, if the file didn't exist, I created it, using the PID of the currently running process.

    This approach has worked well over the last three years.
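Since talexb's module itself can't be shared, here is a rough sketch of the check described above, under stated assumptions: the PID file holds a single PID on its first line, and "still active" is tested by sending signal 0 with `kill`. The names `pid_is_alive` and `pidfile_state` are made up for illustration and are not from that module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a process with this PID currently exists.
sub pid_is_alive {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /^\d+$/ && $pid > 0;

    # Signal 0 delivers nothing; it only asks whether the PID could be signalled.
    return 1 if kill 0, $pid;

    # EPERM means the process exists but belongs to another user.
    return $!{EPERM} ? 1 : 0;
}

# Returns 'free' (no PID file), 'running' (PID in the file is alive),
# or 'stale' (file exists but its PID is gone, i.e. a previous run crashed).
sub pidfile_state {
    my ($path) = @_;

    my $fh;
    unless (open($fh, '<', $path)) {
        return 'free' if $!{ENOENT};
        die "Cannot read $path: $!";
    }
    my $pid = <$fh>;
    close $fh;
    chomp $pid if defined $pid;

    return pid_is_alive($pid) ? 'running' : 'stale';
}
```

On `'stale'` a caller could do its cleanup and then take over the file. Two caveats: a recycled PID can make a stale file look like `'running'`, and the check followed by a later write is still two separate steps, so this sketch does not remove the race discussed elsewhere in the thread.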

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Mechanism for ensuring only one instance of a Perl script can only run?
by rizzo (Curate) on Dec 03, 2022 at 17:21 UTC
    Hi redapplesonly!

    You could use a pid file for that. This is, as far as I know, the mechanism commonly used by server software.

    At startup, your script checks whether a file with a given name already exists, usually yourscript.pid in a directory such as /var/run. If there is one, the script may read its contents, which would be the process ID of the already running instance, then print an error and exit.

    If there is none, your script creates yourscript.pid and writes its PID to it. Before the script exits, it deletes the file.

      You could use a pid file for that. This is, as far as I know, the mechanism commonly used by server software.

      Um, yes, but that still has race conditions AND problems with reboots AND problems with programs crashing before they can remove their PID file.

      In short, PID files suck. Many people believe they are needed or at least useful, but they aren't. They were never a good idea, but nobody had a better solution for years. The better approach for managing service software is to have a monitoring process. AFAIK, djb's daemontools are the earliest solution for getting rid of PID files, and many other solutions copied that idea, including systemd. Yes, systemd may still create PID files for legacy reasons, but like with daemontools, they are no longer needed.

      For the Highlander problem ("there can be only one"), a PID file might work often, but there is no guarantee that it will always work. On Unix systems (Linux, BSD, ...), getting a lock on the executable is the most robust solution, as long as you stay away from non-native and networking filesystems. See Re^2: Mechanism for ensuring only one instance of a Perl script can only run?.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Maybe I'm overlooking something, but given the following example code, it is not clear to me where there could occur a race condition.

            #! /usr/bin/perl
            use strict;
            use warnings;

            my $pidfile="testfile.pid";
            my $scriptname = $0;
            my $pid="$$";

            if(-e $pidfile)
            {
                open(PFH, '<', $pidfile) or die $!;
                $pid =<PFH>;
                close(PFH);
                print"$scriptname already running: pid is $pid";
                exit;
            }

            open(PFH, '>', $pidfile) or die $!;
            print(PFH $$);
            close(PFH);

            # do the job
            sleep 13;

            unlink($pidfile);
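The question above gets no direct answer in the thread. One plausible reading is that the window lies between the `-e $pidfile` test and the `open(PFH, '>', ...)` that creates the file: these are two separate steps, so a second instance can run its test after the first instance's test but before the first instance's open. The sketch below is only an illustration of that interleaving, not a real concurrent test. It splits the script's two steps into hypothetical subs (`already_running`, `write_pidfile`) and replays the unlucky ordering for two simulated instances, with made-up PIDs, inside one process.

```perl
use strict;
use warnings;

# Step 1 of the PID-file script: is there a PID file already?
sub already_running {
    my ($pidfile) = @_;
    return -e $pidfile ? 1 : 0;
}

# Step 2 of the PID-file script: create the file and record a PID in it.
sub write_pidfile {
    my ($pidfile, $pid) = @_;
    open(my $fh, '>', $pidfile) or die "Cannot write $pidfile: $!";
    print {$fh} $pid;
    close $fh;
    return;
}

# Replay the unlucky interleaving: every simulated instance runs step 1
# before any of them runs step 2. Returns the PIDs that decided to proceed.
sub simulate_race {
    my ($pidfile, @pids) = @_;
    my @proceeding = grep { !already_running($pidfile) } @pids;  # all check first
    write_pidfile($pidfile, $_) for @proceeding;                 # then all write
    return @proceeding;
}
```

Calling `simulate_race('testfile.pid', 1001, 1002)` on a fresh directory returns both PIDs: each simulated instance saw no file and went ahead, and the file ends up holding only the last writer's PID. With cron starting the script five minutes apart the window is tiny, which may be why it goes unnoticed in practice.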
Re: Mechanism for ensuring only one instance of a Perl script can only run?
by tybalt89 (Monsignor) on Dec 06, 2022 at 22:32 UTC

    Here's one way

        #!/usr/bin/perl

        use strict; # https://perlmonks.org/?node_id=11148582
        use warnings;
        use Time::HiRes qw( sleep );

        sub grabthelockorexit
        {
            my $lockfilename = '/tmp/d.11148582.lockfile'; # FIXME to your filename
            open our $fh, '>>', $lockfilename or die "$! on $lockfilename";
            use Fcntl qw(:flock);
            flock $fh, LOCK_EX | LOCK_NB or die "$$ exiting $!\n";
        }

        # The following is just test code

        for (1 .. 9)
        {
            if( my $pid = fork )
            {
                sleep 0.33;
            }
            elsif( defined $pid )
            {
                sleep 0.1 + rand( 0.1 );
                grabthelockorexit();
                print "$$ got lock\n";
                sleep 1; # FIXME body of code...
                print "$$ exiting\n";
                exit;
            }
            else
            {
                die "fork failed"
            }
        }
        1 while wait > 0; # wait until all children finish

    I'm just using fork for test purposes, starting new processes at about 3 per second. Just put the sub in your code and call it at or near the beginning, and it will either get the lock or exit. No need to write anything to the file or remove it after your code exits. You could also put the code from the sub in-line if you want to. If your process crashes, no cleanup is required. Using '>>' will create the file if it does not exist or just open it if it does.
    This works fine on my ArchLinux, should work the same on your Ubuntu.

    Outputs (for sample run):

        79504 got lock
        79505 exiting Resource temporarily unavailable
        79506 exiting Resource temporarily unavailable
        79509 exiting Resource temporarily unavailable
        79504 exiting
        79510 got lock
        79511 exiting Resource temporarily unavailable
        79514 exiting Resource temporarily unavailable
        79510 exiting
        79515 got lock
        79516 exiting Resource temporarily unavailable
        79515 exiting

Node Type: perlquestion [id://11148499]
Approved by marto
Front-paged by Arunbear