PerlMonks  

CGI daily 'cleanup' task

by twerq (Deacon)
on Aug 28, 2002 at 13:22 UTC ( #193444=perlquestion )

twerq has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I'm building a CGI::Application based website which requires some special functionality outside the normal 'run-modes'. It's basically a cleanup routine, but I'm not sure the best way to fire it so that it only gets run once (daily, or thereabouts), and I'm trying to avoid a system scheduler (cron, or the like).

Ideally, I'd like something like this:
  • Every invocation of the CGI has a 33% chance of forking a 'cleanup' child process
  • If we become a cleanup process, somehow make sure we're the only one (-- how can I do this?)
  • Cleanup process checks to see if there are any tasks past due, takes care of them and quietly exits.
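The three steps above could be sketched roughly like this. This is a minimal sketch, not the actual application: run_cleanup, the lock path, and the 33% figure are all placeholders, and the O_EXCL open is what makes the "only one cleanup" test atomic.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

my $LOCK = '/tmp/myapp-cleanup.lock';    # hypothetical lock path

# Placeholder for the real past-due work (hypothetical).
sub run_cleanup { }

# Called once per CGI request.
sub maybe_cleanup {
    return unless rand() < 0.33;         # step 1: the 33% dice roll

    defined(my $pid = fork) or die "fork: $!";
    return if $pid;                      # parent: serve the request as usual

    # Step 2: O_CREAT|O_EXCL fails if the file already exists, so at
    # most one cleanup child can hold the lock at a time.
    if (sysopen(my $fh, $LOCK, O_WRONLY | O_CREAT | O_EXCL)) {
        run_cleanup();                   # step 3: handle past-due tasks
        close $fh;
        unlink $LOCK;
    }
    exit 0;                              # child quietly exits either way
}
```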

    ...but this approach seems pretty ghetto, and I'm sure I'm not the first person to try something like this. SuperSearch isn't yielding much, but maybe I'm using the wrong search terms.

    Thanks in advance,

    --twerq
  • Replies are listed 'Best First'.
    Re: CGI daily 'cleanup' task
    by rob_au (Abbot) on Aug 28, 2002 at 13:37 UTC
      This has been brought up before on this site, with discussions focusing on the merits of separate 'clean-up' scripts running as crontab or scheduler tasks - see unlinking old files for one such discussion.

      However, depending upon your CGI::Application application, you may want to take advantage of the teardown method, which is specifically designed for final clean-up tasks - this method is described in my review of the CGI::Application module here. It may be worth investigating the use of this method in combination with a persistent information store like Cache::Cache or a locking mechanism to implement such scheduled clean-up tasks.

       

    Re: CGI daily 'cleanup' task
    by talexb (Chancellor) on Aug 28, 2002 at 13:44 UTC
        Every invocation of the CGI has a 33% chance of forking a 'cleanup' child process
      Wow .. 33%. That's pretty high. After just two accesses, the odds are already better than even (1 - (2/3)^2, about 55%) that you will have run the cleanup. Your choice ..
        If we become a cleanup process, somehow make sure we're the only one (-- how can I do this?)
      You can use a lock file.
      • Does the lock file exist? If so, someone else is already doing a cleanup.
      • Create a randomly named lock file. Do more than one of them exist? If so, continue only if yours was the first one created; otherwise, delete your lock file .. someone else is already doing a cleanup.
      • Do your cleanup, delete the lock file.
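The three steps can be sketched directly. A minimal sketch, with a hypothetical lock directory; note that mtime only has one-second resolution, so the name comparison is the tie-breaker.

```perl
use strict;
use warnings;

my $DIR = '/tmp/myapp-locks';    # hypothetical lock directory

# Try to become the one cleanup process. Returns our lock file's path
# if we won, or undef if someone else is already doing a cleanup.
sub try_lock {
    mkdir $DIR;                                      # ignore failure if it exists
    my @existing = glob("$DIR/cleanup.*");
    return undef if @existing;                       # step 1: lock already there
    my $mine = "$DIR/cleanup.$$." . int(rand 1e9);   # step 2: random name
    open my $fh, '>', $mine or return undef;
    close $fh;
    # Step 3: if several contenders raced past step 1, the earliest
    # created file wins (names break one-second mtime ties).
    my @locks = sort { (stat $a)[9] <=> (stat $b)[9] or $a cmp $b }
                glob("$DIR/cleanup.*");
    return $mine if $locks[0] eq $mine;
    unlink $mine;                                    # we lost; back off
    return undef;
}
```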
        Cleanup process checks to see if there are any tasks past due, takes care of them and quietly exits
      This is duplicating what cron does, so you may as well use its format to store tasks. I had a quick look at CPAN but didn't find anything that will handle a cron-format task list, but that's probably a good way to go, depending on the complexity of the task timing.
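If nothing on CPAN fits, a hand-rolled matcher for the five cron fields is not much code. A minimal sketch handling only plain numbers, comma lists, and '*' (no ranges or steps):

```perl
use strict;
use warnings;

# Does one cron field (e.g. '5', '0,30', '*') match a value?
sub field_matches {
    my ($field, $value) = @_;
    return 1 if $field eq '*';
    return scalar grep { $_ == $value } split /,/, $field;
}

# Does a 'min hour mday mon wday' spec match a given epoch time?
sub cron_matches {
    my ($spec, $time) = @_;
    my ($min, $hour, $mday, $mon, $wday) = (localtime $time)[1, 2, 3, 4, 6];
    my @fields = split ' ', $spec;
    my @values = ($min, $hour, $mday, $mon + 1, $wday);  # cron months are 1-based
    for my $i (0 .. 4) {
        return 0 unless field_matches($fields[$i], $values[$i]);
    }
    return 1;
}
```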

      --t. alex
      but my friends call me T.

        Does the lock file exist? If so, someone else is already doing a cleanup.

        This seems like a good solution for making sure I'm the only agent running...

        Wow .. 33%. That's pretty high. After just two accesses, the odds are already better than even (1 - (2/3)^2, about 55%) that you will have run the cleanup. Your choice ..

        Let me clarify -- that was just a number I quickly threw out there, but basically I was thinking that there should be a 33% chance that the script will check to see if it needs to run a cleanup process. Which is something like:
        if (rand() < 0.33 and time() - $last_cleanup >= 86_400) { ... do our thing ... }
        Although now I'm going to have to leave another little file lying around letting my script know when the last cleanup happened. . . starting to seem pretty hacky. Maybe I should just use a scheduler :)
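That "little file" can be nothing more than an empty stamp file whose mtime *is* the last-cleanup time, so there is nothing to parse. A minimal sketch; the path is hypothetical:

```perl
use strict;
use warnings;

my $STAMP = '/tmp/myapp-last-cleanup';    # hypothetical stamp file

# True if it has been at least a day since the stamp was last touched,
# or if no cleanup has ever been recorded.
sub cleanup_due {
    my @st = stat $STAMP;
    return 1 unless @st;                  # stamp missing: never cleaned up
    return (time - $st[9]) >= 86_400;     # slot 9 is mtime
}

# Record that a cleanup just happened by (re)creating the stamp.
sub mark_cleaned {
    open my $fh, '>', $STAMP or die "can't touch $STAMP: $!";
    close $fh;
}
```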

        FYI -- I'm building this web application on trusted servers, but have no idea where it's going to be run once I'm done, which is why I'm trying to keep away from relying on cron, and relying on sysadmins to set things like that up properly.

        also -- the cleanup is actually going to be doing post-due transactions from a DB.
        --twerq
          It sounds like you want a mechanism whereby you do some cleanup if it's been at least 'n' hours since the last cleanup, or if it's a new calendar date since the last cleanup, or if a cleanup's never been done.

          It makes sense not to rely on cron if you don't know where the application is going to be rolled out -- I was only suggesting sticking to the cron format if there was a module that handles that already. Otherwise, go crazy and write your own protocol.

          --t. alex
          but my friends call me T.

    Re: CGI daily 'cleanup' task
    by snafu (Chaplain) on Aug 28, 2002 at 13:54 UTC
      Hmm. Well, if I had to do this, and I have to come clean now, I know nothing of CGI::Application (I will check it out in a few minutes), so coming from this angle, I'd do:

    • If we become a cleanup process, somehow make sure we're the only one (-- how can I do this?)
    • I'd do this in one or both of two ways. 1) I'd have the code create a pidfile (assuming it doesn't fork, so there is no parent or child process to worry about). Once the process ends it would remove the pidfile. I'd use that pidfile to see if there is a process running already; of course, I'd check not only that the pidfile exists but also that the pid contained in it belongs to a valid running process.
      2) I'd simply check the process table to see if a pid exists for the program that I need to check if running.
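A sketch of the pidfile check described in 1). kill(0, $pid) delivers no signal; it only tests whether the pid exists. The pidfile path is hypothetical.

```perl
use strict;
use warnings;

my $PIDFILE = '/tmp/myapp-cleanup.pid';   # hypothetical pidfile

# True if the pidfile names a process that is still alive. Note that
# kill 0 can report false for another user's process, which is fine
# when we only ever check our own pidfile.
sub already_running {
    open my $fh, '<', $PIDFILE or return 0;   # no pidfile: not running
    chomp(my $pid = <$fh> // '');
    close $fh;
    return 0 unless $pid =~ /^\d+$/;          # garbage in the file
    return kill(0, $pid) ? 1 : 0;
}

sub write_pidfile {
    open my $fh, '>', $PIDFILE or die "can't write $PIDFILE: $!";
    print $fh "$$\n";
    close $fh;
}
```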

    • Cleanup process checks to see if there are any tasks past due, takes care of them and quietly exits.
    • Ok. Now, IIUYC (if I understand you correctly =), you want to schedule the cleanup without using cron. According to what you stated, why not create a tmp file someplace that your script checks for overdue tasks that need to be cleaned up? Tasks get placed in the tmp file with a timestamp. You would check the timestamp of the completed task against a time table that says when that task becomes overdue for cleaning. Thus, if a task completed at 0530 HRS and your table says that cleanup for *that* particular task should be done 4 hours afterward, then your script is smart enough to know that at 0930 HRS cleanup for that task should be performed. Once the cleanup is completed, don't forget to take that entry out of the tmp file.
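The timestamp-plus-time-table idea might look like this; the task types and delays here are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical time table: how long after completion each kind of task
# becomes due for cleanup, in seconds.
my %cleanup_after = (
    session => 4 * 3600,     # completed 0530 HRS => clean at 0930 HRS
    upload  => 24 * 3600,
);

# Each task is [type, completed_epoch]; return the ones now overdue.
sub overdue_tasks {
    my ($now, @tasks) = @_;
    return grep {
        my ($type, $done) = @$_;
        exists $cleanup_after{$type}
            && $now - $done >= $cleanup_after{$type};
    } @tasks;
}
```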

      You can also simply place something in the code that makes the script do the cleanup after a certain time per day or a certain day of the week. Meaning, you place something in the script or a config file saying that cleanup must not be done until 0200 HRS everyday. The script kicks off all day long every hour but it won't perform the cleanup until sometime between 0200 and 0300 of the current day.

      Anyway, I hope I understood your questions. If I were in your shoes and I understood your questions properly, these are a couple of ways I would have thought about tackling the chore.

      Generally, I prefer to stay away from temp files but if this program isn't a daemon then I believe temp files would be the best route to go.

      _ _ _ _ _ _ _ _ _ _
      - Jim
      Insert clever comment here...

    Re: CGI daily 'cleanup' task
    by Zaxo (Archbishop) on Aug 28, 2002 at 14:08 UTC

      How did you arrive at the 33% figure? Why do you think a probabilistic solution is superior to a scheduled or a metric-based cleanup?

      You don't say what you want to clean exactly, so I can only give general advice. It will make a difference if you are running mod_perl or not.

      Use a system scheduler if you are recovering time-expired resources, like session files whose cookies are now invalid. For scheduled one-shot jobs, there is 'at' and its cousins. Reserve 'cron' for regularly scheduled jobs. I'd be suspicious of having many processes rewriting crontab on the fly.

      For a metric based solution, measure what resources may need to be released, and fork a cleanup only if the measurement is above some threshold.

      For your proposed solution, ..manymonk..'s suggestion of using a lock file is good. Use sysopen and its mode flags to control locking behavior.
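A minimal sketch of that sysopen idiom: O_CREAT|O_EXCL makes test-and-create a single atomic step, so there is no race window between checking for the lock file and creating it.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# The open fails if the file already exists, so the existence check
# and the creation happen as one atomic operation.
sub take_lock {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL) or return 0;
    print $fh "$$\n";    # record the holder, handy for spotting stale locks
    close $fh;
    return 1;
}

sub release_lock { unlink $_[0] }
```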

      After Compline,
      Zaxo

    Re: CGI daily 'cleanup' task
    by cfreak (Chaplain) on Aug 28, 2002 at 14:08 UTC

      I'm going to agree with the general consensus that some kind of temp lock file is the way to go about handling this. I would like to know, however, why you can't use cron? Cron jobs can run as your username, and it's really quite simple.

      Chris

      Lobster Aliens Are attacking the world!
    Re: CGI daily 'cleanup' task
    by dws (Chancellor) on Aug 28, 2002 at 23:13 UTC
      I'm trying to avoid a system scheduler (cron, or the like).

      Can you say more on why you want to avoid using a scheduled cleanup process?

      I do my periodic cleanup on two sites (on FreeBSD) via cron, and it works very well. However, I've had miserable luck on Win32 systems (WinNT, Win2K) getting the scheduling service to reliably run a task.

    Node Type: perlquestion [id://193444]
    Approved by rob_au