PerlMonks  

Re: How do I start a long process with a short visit to a URL?

by ELISHEVA (Prior)
on Oct 07, 2009 at 10:17 UTC ( #799685=note )


in reply to How do I start a long process with a short visit to a URL?

Please be very careful with any URL that can trigger memory- or CPU-intensive processes. The web-based cron setups that I've seen come in two flavors:

  • a provider that lets you submit a schedule and then requests a URL on your site at each specified time.
  • a script on your own website that checks a scheduling file each time someone makes a web request. If it finds any tasks scheduled before the present moment, it runs them if they haven't been run already.

If you go with the first solution you must be very sure that only that web-cron provider can trigger the script. Ideally this should be enforced both at the web server level (via well-configured .htaccess files) and by checks inside your script. If you are not very careful, you can open yourself up to DoS attacks. You don't need to be a known target to be vulnerable. There are not-so-nice crawlers and script kiddies out there that will canvass random websites looking for vulnerable URLs, and when they have found them, they "play" until your site croaks.
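As a rough illustration of the "checks inside your script" part, here is a minimal Python sketch. The address range, the shared secret and the function name `is_trusted_trigger` are all made up for the example; a real provider would tell you which addresses it calls from, and you would pick your own secret. It is not a substitute for the web server rules, just a second line of defense.

```python
import hmac
import ipaddress

# Hypothetical values: the provider's source address range and a secret
# that you configure the provider to send along with each request.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
SHARED_SECRET = "replace-with-a-long-random-string"


def is_trusted_trigger(remote_addr: str, token: str) -> bool:
    """Return True only if both the source address and the token check out."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a parseable IP address at all
    if not any(addr in net for net in ALLOWED_NETWORKS):
        return False
    # compare_digest takes the same time whether or not the prefix matches.
    return hmac.compare_digest(token.encode(), SHARED_SECRET.encode())
```

The script would call this first and return a plain error page without doing any work if it fails, so that a random crawler hitting the URL costs you almost nothing.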

The on-site cron approach tends to be less risky because it knows its own schedule and won't rerun a job after it has already been run. The main downside of that approach is that timing is never precise. If you schedule a process for 2AM but nobody visits your site until 7:30AM, the process will run at 7:30AM, not 2AM. This is obviously a problem if you need something to run at exactly 2AM.
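The core of such an on-site scheduler can be sketched in a few lines of Python. This is only an illustration: the function `due_jobs`, the job name `rebuild_index` and the in-memory dictionaries are assumptions for the example, and a real script would read the schedule and the last-run times from a file or database.

```python
from datetime import datetime


def due_jobs(schedule, last_run, now):
    """Return names of jobs whose scheduled time has passed and that have
    not run since that scheduled time.

    schedule: dict mapping job name -> datetime the job is scheduled for
    last_run: dict mapping job name -> datetime it last ran (missing = never)
    """
    due = []
    for name, scheduled_at in schedule.items():
        if scheduled_at > now:
            continue  # not time yet
        ran_at = last_run.get(name)
        if ran_at is not None and ran_at >= scheduled_at:
            continue  # already run for this slot
        due.append(name)
    return due


schedule = {"rebuild_index": datetime(2009, 10, 7, 2, 0)}
last_run = {}

# Nobody visits at 2AM, so nothing checks the schedule. The first visitor
# arrives at 7:30AM and the job is found to be overdue.
first_visit = datetime(2009, 10, 7, 7, 30)
print(due_jobs(schedule, last_run, first_visit))

# Once the run is recorded, later visitors do not trigger it again.
last_run["rebuild_index"] = first_visit
print(due_jobs(schedule, last_run, datetime(2009, 10, 7, 7, 31)))
```

Note that the check only happens when a request arrives, which is exactly why the 2AM job ends up running at 7:30AM here.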

Depending on your traffic patterns you may also experience load problems. Presumably one schedules a resource-intensive task at 2AM because it is a low-traffic period. If nobody visits in the middle of the night and you tend to get a burst of traffic in the morning, the task scheduled for 2AM may end up running at a peak traffic period rather than the low-traffic period you intended.

Perhaps the best solution is a combination: use method 2 (a carefully secured and unpublished website-based cron script) to check schedules and trigger tasks, and use the external cron service to make a totally innocent URL request (e.g. http://example.com/index.html) at a specific time. The request just happens to trigger the cron script, which in turn triggers the expensive process if it hasn't run yet.
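One way to handle the "if it hasn't run yet" part, and to keep the visit short, is to claim the job with an atomically created marker file and then start the work in the background without waiting for it. The Python sketch below is just one possible approach; `claim_run`, `trigger_if_needed`, the marker directory and the slot naming are all hypothetical names chosen for the example.

```python
import os
import subprocess


def claim_run(marker_dir, job_name, slot):
    """Atomically claim a job slot. Returns True for the first caller only."""
    os.makedirs(marker_dir, exist_ok=True)
    marker = os.path.join(marker_dir, f"{job_name}.{slot}.started")
    try:
        # O_EXCL makes creation fail if the marker already exists, so two
        # simultaneous requests cannot both claim the same slot.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def trigger_if_needed(marker_dir, job_name, slot, command):
    """Start `command` in the background if this slot is unclaimed.

    Returns the Popen object, or None if the job was already started.
    """
    if not claim_run(marker_dir, job_name, slot):
        return None
    # Do not wait for the child; the web request can return immediately.
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: detach from the server's session
    )
```

Using the date (e.g. "2009-10-07") as the slot gives at most one run per day, no matter how many requests arrive or who sends them. Whether a detached child survives the end of the request depends on your web server and hosting setup, so test that before relying on it.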

WordPress has a fairly mature scheduling facility (WP-Cron) that you might want to look at for ideas about how to write a scheduler that is triggered by HTTP requests. It is written in PHP, of course, but studying it might be useful for ideas about handling security issues, corner cases and design details.

Best, beth


Re^2: How do I start a long process with a short visit to a URL?
by Cody Fendant (Pilgrim) on Oct 07, 2009 at 23:41 UTC
    Thanks for that. Very useful.
