Database driven web content: live or tape?

by talexb (Chancellor)
on Jul 19, 2002 at 19:17 UTC

talexb has asked for the wisdom of the Perl Monks concerning the following question:

I am contemplating a Big Project and have spent the day thinking about the architecture. That's pretty cool.

My current predicament has to do with whether I should present the page information (piped through the very cool HTML::Template) live from the database or just generate the pages using a daily cron job.

The advantages of doing it live:

  • Pages are updated immediately, something that's fairly important for this application (yet another storefront).
  • No cron job to worry about: immediate feedback as to whether changes are correct or not.

The advantages of using static pages:

  • Page URLs are easy to nab -- therefore the web pages are easier to link to.
  • Way lower load on the database (not that I think that's going to be a problem).

Hmm .. after listing those pros and cons, it seems the obvious way to go is live. Does anyone else have thoughts on this topic?

--t. alex

"Mud, mud, glorious mud. Nothing quite like it for cooling the blood!"
--Michael Flanders and Donald Swann

Replies are listed 'Best First'.
Re: Database driven web content: live or tape?
by Aristotle (Chancellor) on Jul 19, 2002 at 19:35 UTC

    I've pondered this time and again. My conclusion is that the advantages of the static version can be matched dynamically:

    • use PATH_INFO rather than query parameters in your script, and your dynamically produced documents will have URLs that look as simple as the statically generated ones (see the sketch after this list)
    • use clever caching in your script and the database load will go down too; in fact, you can probably cache the output after it has been through HTML::Template, giving close to static-page performance
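
    For instance, here is a minimal sketch of the PATH_INFO approach; the script name and URL layout are invented for illustration:

        #!/usr/bin/perl -w
        # A request for /cgi-bin/page.cgi/products/widgets arrives with
        # PATH_INFO set to "/products/widgets".
        use strict;
        use CGI;

        my $q    = CGI->new;
        my $path = $q->path_info;    # e.g. "/products/widgets"
        my ($section, $page) = $path =~ m{^/(\w+)/(\w+)$}
            or die "unexpected path: $path\n";

        # Look $section/$page up in the database and render it here.
        print $q->header('text/html'),
              "<p>You asked for $page in $section.</p>\n";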

    Not hard to guess which choice I advocate.

    ____________
    Makeshifts last the longest.
Re: Database driven web content: live or tape?
by rattusillegitimus (Friar) on Jul 19, 2002 at 19:36 UTC

    You might consider a mixture of live and Memorex. ;) Use a caching mechanism like Cache::Cache, CGI::Cache, or Apache::Cache to reduce the server and database load. I don't personally have any experience with any of these yet, but I've seen them discussed in the Monastery, and I'll bet merlyn has a column or two that would help.

    As far as the ease of linking to the pages: if you are using Apache and can modify its configuration, you might look into mod_rewrite. I've used it with great success in the past to map user- and link-friendly URLs to CGI URLs with complex query strings.
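
    As a rough sketch (the URL pattern and script name here are made up), a single rule in the server configuration does the mapping:

        RewriteEngine On
        # /products/widgets.html is really served by /cgi-bin/catalog.cgi?page=widgets
        RewriteRule ^/products/([a-z0-9-]+)\.html$ /cgi-bin/catalog.cgi?page=$1 [PT,L]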

    -rattus

    __________
    He seemed like such a nice guy to his neighbors / Kept to himself and never bothered them with favors
    - Jefferson Airplane, "Assassin"

      Thanks, I think Cache::Cache may be just the ticket. I am hosted on pair Networks, so I do not have control over the Apache configuration, nor can I use mod_perl.

      --t. alex

      "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!"
      --Michael Flanders and Donald Swann

Re: Database driven web content: live or tape?
by dws (Chancellor) on Jul 19, 2002 at 19:41 UTC
    My current predicament has to do with whether I should present the page information (piped through the very cool HTML::Template) live from the database or just generate the pages using a daily cron job.

    There's a middle-ground option: generate static pages whenever the underlying data changes (or, get smarter and track dependencies so that you can regenerate just the affected subset of the static pages). You can still pretend to be fully dynamic by generating .shtml pages, which get the benefit of Server-Side Includes (SSI) at page-fetch time.
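
    For example, a nightly-generated .shtml product page can still pull in one live fragment via SSI (the script name here is hypothetical):

        <!-- Inside the pre-generated product page (.shtml): -->
        <!--#include virtual="/cgi-bin/stock_level.cgi?sku=W-1001" -->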

    Look at how Movable Type handles this; it takes exactly this middle-ground approach.

      Or, another variant on this theme: use a 404-handler to generate a "missing page" as I did in one of my columns. The first hit automatically produces and caches the new page. All subsequent hits are "static". To invalidate a page, just delete it. The next hit recreates it with the new information.
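
      A minimal sketch of the idea (not the column's actual code; the paths and the page generator are placeholders). Point Apache's ErrorDocument 404 at a CGI, which builds the page, saves it under the document root, and serves it:

          #!/usr/bin/perl -w
          # httpd.conf:  ErrorDocument 404 /cgi-bin/make_page.cgi
          use strict;

          my $docroot = '/var/www/htdocs';       # placeholder path
          my $uri = $ENV{REDIRECT_URL}           # set by Apache for ErrorDocument CGIs
              or die "not called as a 404 handler\n";

          my $html = build_page($uri);           # your database-driven generator goes here

          # Cache it: the next request for $uri is served as a plain static file.
          open my $fh, "> $docroot$uri" or die "can't cache $uri: $!";
          print $fh $html;
          close $fh;

          print "Status: 200 OK\n";              # override the 404 status
          print "Content-type: text/html\n\n", $html;

          sub build_page { my ($u) = @_; return "<html><body>page for $u</body></html>" }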

      -- Randal L. Schwartz, Perl hacker

        Vignette does this exact thing. It works pretty well, except that they never released the source to their Apache module :-(. I never really thought about doing it with the error handler and mod_perl.
        ++merlyn (like you need it) :-P

        An intellectual is someone whose mind watches itself.
        - Albert Camus
Re: Database driven web content: live or tape?
by gryphon (Abbot) on Jul 19, 2002 at 19:38 UTC

    Greetings talexb,

    Generally, I've found that the only reason anyone should consider not hosting pages "live" (a.k.a. dynamically generated) is for delivery speed to the user or system load. However, usually the load difference between live and static is minimal so long as the server isn't really old/slow.

    You mentioned my favorite templating system, HTML::Template. Used in connection with mod_perl, HTML::Template can cache templates, and mod_perl itself caches your scripts and called modules. So generally, a templating system's speed difference should be minimal.

    Go with the "live" option. I always have, and it's been a blessing many times over.

    -gryphon
    code('Perl') || die;

Re: Database driven web content: live or tape?
by vagnerr (Prior) on Jul 19, 2002 at 19:43 UTC
    You could potentially have the best of both worlds: perl.com has an article about the eToys website. They used reverse proxying, allowing them to cache pages as they were generated.

    ---If it doesn't fit use a bigger hammer
      ++

      I would definitely endorse this type of setup for a major project. There's little sense in building your own proprietary caching mechanism when HTTP already has one built into it.

      Create your application such that it builds dynamic pages, and makes use of HTTP headers to identify how long a resource should be cached (if it should be). Then funnel all of your inbound traffic through a fast caching HTTP proxy server.
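
      In a plain CGI script that might look like the sketch below; CGI.pm turns -expires into an Expires header and passes extra parameters through as literal headers:

          #!/usr/bin/perl -w
          use strict;
          use CGI;

          my $q = CGI->new;
          print $q->header(
              -type          => 'text/html',
              -expires       => '+1h',            # Expires: one hour from now
              -Cache_Control => 'max-age=3600',   # for HTTP/1.1 caches and proxies
          );
          print "<html><body>catalog page goes here</body></html>\n";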

      I have also seen approaches where the document root of the web server is actually an on-disk cache. A missing file generates a call to a 404 handler, which invokes the CGI or back-end process (perhaps a "real" web server behind another firewall) to generate the page, perhaps caching it in the document root for future use by the web server.

      I did a similar thing for a previous employer. I wrote the original (and, to my knowledge, only) reverse failover/balance patch for Squid 2.3; we used that as the reverse-proxy engine and carefully constructed our Expires headers for the various kinds of content. That made an enormous difference to our capabilities: we had a load-distribution algorithm that used each system according to its capabilities without any manual ratio setting, utterly transparent failover, in-memory caching of everything using Squid's very effective algorithms, a setup that was language- and web-server-agnostic (we were using PHP, Perl, and C in various areas), and a nice centralised place to pick up the logs to boot.

      It was a good day when I just shut down one of the web servers without warning and not a single user connection was lost.

Re: Database driven web content: live or tape?
by Hero Zzyzzx (Curate) on Jul 19, 2002 at 19:43 UTC

    Another option: partially live. Use the excellent Cache::Cache to cache your output from HTML::Template, and then feed that to your users. Something like:

        my $cache     = Cache::FileCache->new();   # options here . . .
        my $cache_key = $id;   # cache key that uniquely identifies this page
        my $output    = $cache->get($cache_key);

        if ( not defined $output ) {
            # The page isn't in the cache, or it's expired.
            # Create your page here, piping it through HTML::Template.
            $output = $template->output;   # assuming $template is your HTML::Template object
            $cache->set( $cache_key, $output, '2 hours' );
        }
        # Otherwise the cached copy in $output is used as-is.

        print $output;

    The cool thing about this is that you can delete the page directly from the cache when it's updated, so you can set your expiry times pretty long. In this example I'm caching the entire page, but you can (obviously) cache only the very expensive parts of your application, or skip caching the parts that must change on every request.

    I like Cache::Cache a LOT, but you really need to benchmark to make sure you're saving time.

    -Any sufficiently advanced technology is
    indistinguishable from doubletalk.

Re: Database driven web content: live or tape?
by gav^ (Curate) on Jul 20, 2002 at 02:29 UTC
    I work on Yahoo Store on a nearly daily basis, and the way they do things works fairly well. You have a set of templates which you can apply to items in the database. On the edit side you can click through the pages; they are generated dynamically, so it's possible to edit them and get instant feedback. You also don't need to worry about trying things out, because only you can see them.

    When you are ready to publish, it runs anything that has changed (I'm pretty sure every row in the database has a flag to let the system know if it has been modified) through the templates to generate static HTML. This reduces server load and tends to be more search-engine friendly.

    This model seems to work well, and I'm sure implementing something along these lines would give you the best of both worlds.

    gav^

Re: Database driven web content: live or tape?
by FoxtrotUniform (Prior) on Jul 19, 2002 at 20:18 UTC

    I'd imagine that generating static pages would also reduce load on the web server, make the site faster, and so on. The trick would be regenerating the pages every time the DB changes (in a significant way...), rather than once a night via cron job.

    If the site's very dynamic, live updates would be the way to go, but otherwise, I'd be afraid that they'd be a waste of resources.

    Update: There's some good discussion on high traffic dynamic+cached sites at Opening too many files?.

    --
    The hell with paco, vote for Erudil!
    :wq

      I think the solution that's presenting itself now (after more coffee and some thought) is one where I go to the cache for each page .. if there's no hit, I go to the database.

      Whenever the database gets updated, I flush the cache.

      So that's the best of both worlds .. database driven, but cached until the database gets updated.

      Thanks to all for their thoughtful replies!!

      --t. alex

      "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!"
      --Michael Flanders and Donald Swann

        If you mean invalidating the entire cache on each update, I'd strongly recommend an invalidation scheme that affects only part of the cache, if restricting it to just the one affected page is not possible. Otherwise, especially if your site contains a lot of seldom-visited and seldom-updated pages, you'll reap much less benefit from caching than you could.
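
        With Cache::Cache that's straightforward; a sketch, assuming the page id is the cache key as in the example earlier in this thread:

            use Cache::FileCache;

            # When one product changes, drop only that page from the cache;
            # every other cached page stays warm.
            my $cache = Cache::FileCache->new();
            $cache->remove($id);   # same key that was used to set() the page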

        Makeshifts last the longest.

Re: Database driven web content: live or tape?
by perrin (Chancellor) on Jul 19, 2002 at 20:35 UTC
    There is a simple test: is it too slow? If it is, pre-publish. Otherwise, don't.

    By the way, if you're using a flexible web server like Apache you can make dynamic pages have any URL you like, including /my/static/page.html. Look at mod_rewrite or mod_perl for this.

Re: Database driven web content: live or tape?
by kvale (Monsignor) on Jul 19, 2002 at 19:38 UTC
    Given that you want to update pages immediately, live seems necessary. Another advantage of static pages is that they are easier to program; the disadvantage is that easy programming is boring :)

    -Mark

Re: Database driven web content: live or tape?
by mattr (Curate) on Jul 20, 2002 at 08:20 UTC
    I am at a similar point in the planning of a project; having finished over a month of interviews and page specifications, I am about to roll up my sleeves.

    I'm planning on using HTML::Template, CGI::Application, and DBI with MySQL, making use of MySQL's query cache. Cron will generate images, and possibly PDFs or Word documents, at night or when they're modified; the main load will be the templating and searches on a product catalog by up to 2000 distributors.

    I am leaning toward mod_perl for performance, though, because of very limited time to build it, I may develop without mod_perl (but with strict) first.

    I'm wondering how much speed is gained by maintaining a persistent connection with Apache::DBI, and by having templates cached, so I can make a better decision about using mod_perl. I know it's great, and have used it, but I don't have time to wrestle with CGI::Application and restart the server a hundred times. I'm looking at Application Performance using DBI and mod_perl now.
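
    The Apache::DBI part costs almost nothing to adopt; a sketch, with the DSN and credentials invented:

        # In startup.pl, loaded by mod_perl before any script runs:
        use Apache::DBI;    # overrides DBI::connect to cache handles

        # Application code is unchanged; after the first request in each
        # Apache child, this connect() returns the cached, persistent handle.
        use DBI;
        my $dbh = DBI->connect( 'dbi:mysql:shop', 'user', 'secret',
                                { RaiseError => 1 } );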

    I also was just looking at The HTML::Template page and it mentions a 90% speed increase using Template caching in mod_perl, plus a way to save a ton of memory by sharing the template cache between the processes.
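
    Both of those are constructor options; a sketch, with the template name invented:

        use HTML::Template;

        my $tmpl = HTML::Template->new(
            filename     => 'product.tmpl',
            cache        => 1,     # keep the parsed template in memory, per process
            # shared_cache => 1,   # or share one parsed copy across all children
        );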

      This is an excellent solution, IMHO. I'm implementing the very same design for my DNS submission site, and the performance difference between mod_perl and non-mod_perl is breathtaking. The main (minimal) differences you'll find with coding for mod_perl is the care you must take with your scoping. It was this project, in fact, that helped me to finally understand lexicals.

      Abstracting the HTML via HTML::Template was a wonderful choice as well... our non-Perl coders will be able to update/modify the template pages to fit our company's overall site theme. Note, however, that HTML::Template needn't be called directly... CGI::Application provides the necessary HTML::Template plumbing (its load_tmpl method hands you the template object).

      With respect to Apache::DBI, I don't know that it's necessary to restart your server each time you make modifications to your code. Yes, it's recommended (but is it necessary?), but I've been able to continue coding and testing without being *required* to restart. The main difference... you'll notice some namespace(?) errors in your error_log until the next time you restart. As far as performance, I can't really help you there... it's my understanding that when using mod_perl, Apache will automagically use Apache::DBI if you're using the MySQL drivers (although I might be talking out of my ass).

      I'll probably play with Benchmark::Timer this week to track the difference between my mod_perl and non-mod_perl scripts. Mind you, there's no difference in the code, just the interpreter.

      -fp

      I am drooling at the thought of using mod_perl, but unfortunately I don't have that option (I am hosted on pair Networks and they don't offer mod_perl). I expect to set up and tear down a DBI connection on each uncached page hit, so after a page has been hit once, we just go to the cache for it.

      I may well set the expiry date to half an hour in the future so that the cache doesn't get too stale.
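
      With Cache::Cache that's a one-argument change; a sketch:

          use Cache::FileCache;

          # Every page set() into this cache expires 30 minutes later,
          # unless an explicit expiry is given.
          my $cache = Cache::FileCache->new(
              { default_expires_in => '30 minutes' }
          );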

      --t. alex

      "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!"
      --Michael Flanders and Donald Swann

        I wonder if it would be possible to simulate the persistent DB handle of mod_perl with an IPC connection to a separate CGI process. Presumably it could renew itself every few minutes if the provider limits processing time.
Re: Database driven web content: live or tape?
by shotgunefx (Parson) on Jul 20, 2002 at 00:31 UTC
    Well, it all depends on what type of usage you are expecting. If the pages are viewer-agnostic (no "Hi Lee! Welcome back."), I would go static and generate the page when the record is actually updated. If they are doing this from an app you are making, it should be trivial to catch changes.

    If there's no real process (just using mysqladmin or something), you could run a query every couple of minutes and compare each row's timestamp to a "published" timestamp to see if it needs updating (a sketch follows below). Much simpler and less fragile. The fewer parts there are, the fewer parts there are to break, IMHO. If you need real-time inventory or similar embedded in the pages and they change constantly, then live is certainly the way to go.
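
    A sketch of that polling query; the schema and column names are invented, and regenerate_page() is a stub for your page generator:

        use DBI;

        my $dbh = DBI->connect( 'dbi:mysql:shop', 'user', 'secret',
                                { RaiseError => 1 } );

        # Anything modified since it was last published needs regenerating.
        my $stale = $dbh->selectcol_arrayref(
            'SELECT product_code FROM products WHERE modified > published'
        );
        regenerate_page($_) for @$stale;

        sub regenerate_page { my ($code) = @_; print "would rebuild $code.html\n" }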

    As far as linkage, I would just derive the page name from the product code: $product_code =~ s/\W/-/g; $product_code = lc($product_code . '.html'); then there isn't any confusion about how to link to something. You could even "borrow" the bracketed linkage from perlmonks for internal links in the raw data, and it would be simple to check the validity of local links when updating.

    Personally, we use static pages for 95% of our work.

    -Lee

    "To be civilized is to deny one's nature."
Re: Database driven web content: live or tape?
by tmiklas (Hermit) on Jul 20, 2002 at 04:47 UTC
    Hi talexb...

    As you said, both solutions have advantages and disadvantages... You won't find the "one and only" right way to do it... I use both of them, because my key variable is the frequency of changes in the database. If it's a news board or something like that, then it's generated live. If new data arrives only daily, then it's static, and linking directly to the URLs doesn't make much sense (the URL is still valid, but the content is different).
    If you don't know which one to choose, then try making it live, because then (if it's done well) you can "generate" static HTML without a real generator (using wget or something like this; see the one-liner below).
    Choosing the "live" solution, you won't get into a dark alley... besides, with the power of Perl it's only a few keystrokes to make it statically generated using cron... Good luck!
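
    For instance, substituting your own hostname, a recursive mirror of the live site produces the static tree in one command:

        wget --mirror --convert-links http://www.example.com/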

    Greetz, Tom.
Re: Database driven web content: live or tape?
by inblosam (Monk) on Jul 20, 2002 at 07:31 UTC
    talexb: I don't know if you have heard of it, but there is a solution/platform written in Perl that has done all the hard stuff for you, yet still gives you the flexibility of using Perl throughout your website. It is called "in1" and you can find more info at inshift.com. I have used it for several of my big projects and it has turned out to be a real benefit. It not only serves your pages dynamically, but you can set pages to be published automatically based on changes to tables or records in a database (so instead of cron, publishing happens when a static page becomes stale because of new or modified data). User management, security, templates, etc. are all given to you as tools to use in projects. Very cool stuff, and it is cheap. No proprietary database either (use MySQL, MSSQL, or Access). It has cut my development time by a factor of three!

    Michael Jensen
Re: Database driven web content: live or tape?
by Anonymous Monk on Jul 20, 2002 at 19:49 UTC

    I did more or less the same thing once.

    I wrote my own templating system, because HTML::Template didn't do all I wanted, but in retrospect I wish I'd changed my wants instead.

    We ran many web servers (mod_perl, of course) and one Oracle DB server (Sun) which, once we finally got Oracle optimized, was rockin' fast under pretty high traffic.

    Anyway... if I were you, based just on what you said, I would do it live. AND, ya know what? I would think very carefully about the site logic beforehand, and then have URLs like "http://www.somestorethingy.com/products/widgets.html" or whatever, and let Apache rewrite them into things like /products.cgi?page=widgets or what have you. Assuming you want "nabbable" URLs, that's the smartest way to go.

    The *only* time in my project when live sucked was when I was asked to make a static version of the whole shebang so the database could be retired (company merger).

    Anyway, that's just one report from the trench(es). Good luck with your project.

    -- frosty
    MEDIENKUNST.COM
