Re: How do I start a long process with a short visit to a URL?
by ELISHEVA (Prior) on Oct 07, 2009 at 10:17 UTC
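Please be very careful with any URL that can trigger memory- or CPU-intensive processes. The web-based cron that I've seen comes in two flavors:
1. An external web-cron provider that requests a URL on your site at a scheduled time, and that request triggers the job.
2. An on-site cron script that runs as a side effect of ordinary visits to your site: it checks its own schedule and triggers any task that is due.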
If you go with the first solution you must be very sure that only that web-cron provider can trigger the script. Ideally this should be enforced both at the web server level (via well-configured .htaccess files) and by checks internal to your script. If you are not very careful, you can open yourself up to DoS attacks. You don't need to be a known target to be vulnerable. There are not-so-nice crawlers and script kiddies out there that will canvass random websites looking for vulnerable URLs, and when they have found them, they "play" until your site croaks.
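The in-script half of that check might look something like the sketch below. It is written in Python only for brevity; the same logic ports directly to a Perl CGI script. The network range, the secret, and the name `may_trigger` are all made up for illustration, and the real address range would have to come from your web-cron provider.

```python
import hmac
import ipaddress

# Hypothetical values: substitute your provider's published address range
# and a long random secret of your own.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
SHARED_SECRET = "change-me-to-a-long-random-string"


def may_trigger(remote_addr: str, token: str) -> bool:
    """Return True only if the caller's address is allowed AND it sent the secret."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed address: refuse
    if not any(addr in net for net in ALLOWED_NETWORKS):
        return False
    # compare_digest avoids leaking, through timing, where the strings differ.
    return hmac.compare_digest(token.encode(), SHARED_SECRET.encode())
```

The script would call `may_trigger` with the client address and a token taken from the request, and do nothing expensive unless it returns true. This complements the web server restrictions rather than replacing them.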
The on-site cron approach tends to be less risky because it knows its own schedule and won't rerun a job that has already run. The main downside of that approach is that timing is never precise. If you schedule a process for 2AM but nobody visits your site until 7:30AM, the process will run at 7:30AM, not 2AM. This is obviously a problem if you need something to run at exactly 2AM.
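To make the "knows its own schedule" part concrete, here is a rough sketch of the decision an on-site scheduler makes on each visit, again in Python for brevity. The function names are invented for this example, and it assumes a single daily job and naive local timestamps, with the last run time stored somewhere persistent such as a file or database row.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def most_recent_slot(now: datetime, run_at: time) -> datetime:
    """Latest scheduled time that is not in the future relative to `now`."""
    slot = datetime.combine(now.date(), run_at)
    if slot > now:
        slot -= timedelta(days=1)  # today's slot hasn't arrived; use yesterday's
    return slot


def is_due(now: datetime, run_at: time, last_run: Optional[datetime]) -> bool:
    """True if the most recent scheduled slot has not been served yet."""
    return last_run is None or last_run < most_recent_slot(now, run_at)
```

With a 2AM schedule, the first visitor at 7:30AM finds the job due. Once the run time is recorded, later visits that day are refused, and the job becomes due again only when the next 2AM slot has passed.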
Depending on your traffic patterns you may also experience load problems. Presumably one schedules a resource-intensive task at 2AM because it is a low-traffic period. If nobody visits in the middle of the night and you tend to get a burst of traffic in the morning, the task scheduled for 2AM may end up running at a peak-traffic period rather than the low-traffic period you intended.
Perhaps the best solution is a combination: use method 2 (a carefully secured and unpublished website-based cron script) to check schedules and trigger tasks, and use the external cron service to make a totally innocent URL request (e.g. http://example.com/index.html) at a specific time. The request just happens to trigger the cron script, which in turn triggers the expensive process if it hasn't run yet.
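One detail worth getting right in the combined setup is the "if it hasn't run yet" test, since the external service and a real visitor may hit the site at the same moment. A minimal sketch, assuming a writable state directory and one marker file per schedule slot (the names `claim_run`, `handle_request` and `start_job` are hypothetical), again in Python for brevity:

```python
import os


def claim_run(state_dir: str, slot_name: str) -> bool:
    """Atomically claim a schedule slot; only the first caller gets True."""
    marker = os.path.join(state_dir, slot_name + ".claimed")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so two requests
        # arriving together cannot both claim the same slot.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def handle_request(state_dir: str, slot_name: str, start_job) -> str:
    """Serve the page; start the job as a side effect if the slot is unclaimed."""
    if claim_run(state_dir, slot_name):
        start_job()  # should hand off to a background process and return quickly
    return "<html>ordinary page content</html>"
```

The response is the same whether or not the job was started, so an outside observer cannot tell from the request that it triggered anything. Here `start_job` stands in for whatever launches the expensive process; it needs to detach from the request so the visit itself stays short.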
WordPress has a fairly mature plug-in (WP-Cron) that you might want to look at to give you ideas about how to write a scheduler that is triggered by HTTP requests. It is written in PHP, of course, but studying it might be useful for ideas about handling security issues, corner cases and design details.