PerlMonks  


Please be very careful with any URL that can trigger memory- or CPU-intensive processes. The web-based cron setups I've seen come in two flavors:

  • an external provider that lets you submit a schedule and then requests a URL on your site at each scheduled time.
  • a script on your own website that checks a scheduling file each time someone makes a web request. If it finds any tasks scheduled before the present moment that haven't been run yet, it runs them.
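The second flavor boils down to a check that runs on every request. Below is a minimal Python sketch, not the design of any particular product. The JSON schedule format (task name mapped to a Unix timestamp), the file paths and the `jobs` table are all hypothetical, and this version deliberately ignores two requests arriving at the same moment.

```python
import json
import time
from pathlib import Path


def due_tasks(schedule, done, now):
    """Names of tasks scheduled at or before `now` that have not run yet.

    schedule: dict mapping task name -> scheduled time (Unix seconds)
    done:     set of task names that have already run
    """
    return sorted(name for name, when in schedule.items()
                  if when <= now and name not in done)


def check_schedule(schedule_path, done_path, jobs, now=None):
    """Call once per web request; runs each due task at most once.

    Not safe against simultaneous requests: both could read the same
    `done` list before either writes it back.
    """
    now = time.time() if now is None else now
    schedule = json.loads(Path(schedule_path).read_text())
    done_file = Path(done_path)
    done = set(json.loads(done_file.read_text())) if done_file.exists() else set()
    ran = []
    for name in due_tasks(schedule, done, now):
        jobs[name]()            # run the task (hypothetical job table)
        done.add(name)
        ran.append(name)
    done_file.write_text(json.dumps(sorted(done)))
    return ran
```

A task scheduled for an earlier time simply runs on the next request after it; nothing happens at the scheduled moment itself.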

If you go with the first solution, you must be very sure that only that web-cron provider can trigger the script. Ideally this should be enforced both at the web-server level (via well-configured .htaccess files) and by checks inside your script. If you are not very careful, you can open yourself up to denial-of-service (DoS) attacks. You don't need to be a known target to be vulnerable: there are not-so-nice crawlers and script kiddies out there that canvass random websites looking for vulnerable URLs, and when they find one, they "play" until your site croaks.
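For the checks inside your script, here is a minimal Python sketch under stated assumptions: the provider calls the URL with a secret `token` query parameter, it connects from a small fixed set of addresses, and you persist the time of the last start somewhere. `CRON_SECRET`, `ALLOWED_IPS`, `MIN_INTERVAL` and `may_trigger` are hypothetical names; check what your provider actually supports.

```python
import hmac
from urllib.parse import parse_qs

# Hypothetical settings: a long random secret shared only with the cron
# provider, and the provider's source addresses (documentation-range IPs here).
CRON_SECRET = "replace-with-a-long-random-string"
ALLOWED_IPS = {"203.0.113.10", "203.0.113.11"}
MIN_INTERVAL = 3600  # seconds; refuse to restart the job more often than this


def may_trigger(remote_addr, query_string, last_start, now):
    """Decide whether this request may start the expensive job.

    remote_addr:  client address (REMOTE_ADDR in CGI/WSGI)
    query_string: raw query string, expected to carry token=...
    last_start:   Unix time the job was last started; keep it in a file or
                  database, since separate CGI processes share no memory
    now:          current Unix time
    """
    if remote_addr not in ALLOWED_IPS:
        return False
    token = parse_qs(query_string).get("token", [""])[0]
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(token.encode(), CRON_SECRET.encode()):
        return False
    # Even a legitimate caller cannot make the job run back-to-back
    return now - last_start >= MIN_INTERVAL
```

If the site sits behind a reverse proxy, REMOTE_ADDR will be the proxy's address, so the IP check would need adjusting. The rate limit matters because it caps the damage even if the secret leaks.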

The on-site cron approach tends to be less risky because it knows its own schedule and won't rerun a job that has already run. Its main downside is that timing is never precise. If you schedule a process for 2AM but nobody visits your site until 7:30AM, the process will run at 7:30AM, not 2AM. This is obviously a problem if you need something to run at exactly 2AM.
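Making "won't rerun a job" hold even when two visitors arrive at the same moment needs an atomic claim. A minimal Python sketch, assuming a writable local directory for marker files; `claim`, `run_if_due` and the marker naming are hypothetical. It also shows the timing issue directly: the job starts on the first request after its slot, however late that is.

```python
import os


def claim(run_dir, run_id):
    """Atomically mark a run as taken; True for exactly one caller.

    O_CREAT | O_EXCL fails if the file already exists, so two requests
    arriving together cannot both claim the same run (on a local filesystem).
    """
    path = os.path.join(run_dir, run_id + ".claimed")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def run_if_due(run_dir, name, scheduled_at, now, job):
    """Run `job` on the first request at or after `scheduled_at`, never twice.

    scheduled_at and now are datetime objects. Returns how late the job
    started (a timedelta), or None if it did not run on this request.
    """
    if now < scheduled_at:
        return None                                    # not due yet
    run_id = f"{name}-{scheduled_at:%Y%m%dT%H%M}"      # one id per scheduled slot
    if not claim(run_dir, run_id):
        return None                                    # an earlier request ran it
    job()
    return now - scheduled_at
```

With a 2AM slot and the first visit at 7:30AM, `run_if_due` starts the job then and returns `timedelta(hours=5, minutes=30)`; every later request for that slot gets `None`. Because the slot is claimed before the job runs, a job that crashes is not retried automatically, which is an at-most-once choice you may want to revisit.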

Depending on your traffic patterns, you may also run into load problems. Presumably one schedules a resource-intensive task at 2AM because it is a low-traffic period. If nobody visits in the middle of the night and you tend to get a burst of traffic in the morning, the task scheduled for 2AM may end up running during peak traffic rather than in the quiet period you intended.
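One possible mitigation, which the post itself does not propose: have the on-request check refuse to start a heavy job outside a known quiet window, so a late job waits for the next night instead of landing in the morning rush. A minimal Python sketch; the 01:00-05:00 window and `in_quiet_window` are hypothetical. On a site with no nighttime visitors the job would then never run unless something else requests a page during the window.

```python
from datetime import time

# Hypothetical low-traffic window in server local time.
QUIET_START = time(1, 0)
QUIET_END = time(5, 0)


def in_quiet_window(now, start=QUIET_START, end=QUIET_END):
    """True if `now` (a datetime) falls in [start, end), allowing wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end   # e.g. a 23:00-03:00 window
```

A scheduler would call this before claiming a run; a request at 7:30AM gets `False` and the job stays pending.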

Perhaps the best solution is a combination: use the second flavor (a carefully secured and unpublished website-based cron script) to check schedules and trigger tasks, and use the external cron service to make a totally innocent URL request (e.g. http://example.com/index.html) at a specific time. That request just happens to trigger the cron script, which in turn starts the expensive process if it hasn't run yet.
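A minimal Python (WSGI) sketch of that combination, assuming a POSIX host: any ordinary page request runs a cheap schedule check, and if the nightly slot is due and unclaimed, the expensive job is started as a detached process so the page still returns immediately. `RUN_DIR`, `NIGHTLY_CMD`, `NIGHTLY_HOUR` and the function names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from datetime import datetime

# Hypothetical settings: where run markers live and the command for the
# expensive job (a stand-in that just sleeps here).
RUN_DIR = os.path.join(tempfile.gettempdir(), "webcron-runs")
NIGHTLY_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]
NIGHTLY_HOUR = 2  # run on the first request at or after 02:00


def start_detached(cmd):
    """Start cmd in a new session so it keeps running after the request ends (POSIX)."""
    return subprocess.Popen(cmd, stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                            start_new_session=True)


def check_schedule(now, run_dir=RUN_DIR, launch=start_detached):
    """Launch the nightly job at most once per day; return True if launched."""
    if now.hour < NIGHTLY_HOUR:
        return False
    os.makedirs(run_dir, exist_ok=True)
    marker = os.path.join(run_dir, f"nightly-{now:%Y%m%d}")
    try:
        # O_EXCL: only one request per day can create the marker
        os.close(os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
    except FileExistsError:
        return False
    launch(NIGHTLY_CMD)
    return True


def app(environ, start_response):
    """An ordinary page; the schedule check is a cheap side effect."""
    check_schedule(datetime.now())
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body><p>Hello.</p></body></html>"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve it. The job is launched rather than run inline because a request that takes hours will usually hit a server timeout, and `start_new_session=True` keeps signals aimed at the request's process group from reaching it. The external service then needs no secret at all: it just requests an ordinary page around 2AM.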

WordPress has a fairly mature HTTP-triggered scheduler (WP-Cron) that you might want to look at for ideas on writing your own. It is written in PHP, of course, but studying it could still help with security issues, corner cases, and design details.

Best, beth


In reply to Re: How do I start a long process with a short visit to a URL? by ELISHEVA
in thread How do I start a long process with a short visit to a URL? by Cody Fendant
