http://www.perlmonks.org?node_id=371707

kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'm new to the idea of caching and I'm wondering how many ways there are to cache web pages.

I have a listing of forum topics that is generated dynamically. I understand that generating it on every request can put considerable strain on the server, especially during peak hours.

I'm thinking of saving the listing to a text file whenever a new post is made. Then, when the page with the listing is requested, the static text file that has been updated with the latest post is served, possibly via an INCLUDE_TMPL directive.

Is that caching? Is there a better way to do it?

Thanks in anticipation :)

Re: Ways of caching
by dws (Chancellor) on Jul 04, 2004 at 15:46 UTC

    Is that caching? Is there a better way to do it?

    Any time you store the result of a computation in a way that's more efficient to retrieve than it is to recompute, you're caching.

    Storing a list of stuff in a text file rather than retrieving it from a database can be an effective form of caching, with the caveat that you need to worry about concurrent access to the text file when updating it. Fortunately, flock is fairly well understood (though occasionally misused).
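    A minimal sketch of the locking described above, written in Python rather than Perl (the fcntl module is Unix-only); write_cache and read_cache are hypothetical names. Writers take an exclusive lock and readers a shared one:

```python
import fcntl
import os

def write_cache(path, text):
    """Replace the cache file's contents while holding an exclusive lock."""
    # O_CREAT without O_TRUNC: the file is emptied only after we own the lock,
    # so a reader never sees it truncated by a writer that is still waiting.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # waits until no other lock is held
        try:
            fh.seek(0)
            fh.truncate()
            fh.write(text)
            fh.flush()                   # data must reach the file before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

def read_cache(path):
    """Read the cache file under a shared lock (many readers at once)."""
    with open(path, encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)   # waits while a writer holds LOCK_EX
        try:
            return fh.read()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

    flock locks are advisory: they only coordinate processes that also call flock, so every reader and writer of the file has to take the lock. One common mistake is opening the file in a truncating mode (such as Perl's '>') before taking the lock; the sketch avoids that. Perl's built-in flock, with LOCK_SH and LOCK_EX from the Fcntl module, follows the same pattern.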

      Thanks, dws!

      I wasn't sure whether it's something people actually do, or how effective it is. With your advice, I feel reassured and will go ahead with implementing it.

      cheers

Re: Ways of caching
by dfaure (Chaplain) on Jul 04, 2004 at 15:50 UTC

    What you're doing is a form of caching: you generate a document (an HTML page, etc.), which is slow, only when needed (that is, when the data changes), and serve the stored copy, which is fast, on every request, instead of regenerating it each time.

    Is there a better way to do it?

    The principle is sound; the best way to implement it depends on your code...
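    The pattern above (slow generation when the data changes, fast serving on each request) might look like the following sketch. render_listing, on_new_post and serve_listing are hypothetical names, and instead of flock the writer uses a swapped-in technique: it writes a temporary file in the same directory and then renames it over the old copy, which is atomic on POSIX filesystems.

```python
import html
import os
import tempfile

def render_listing(topics):
    """Hypothetical renderer: turn (title, author) pairs into an HTML list."""
    items = "".join(
        f"<li>{html.escape(title)} by {html.escape(author)}</li>"
        for title, author in topics
    )
    return f"<ul>{items}</ul>\n"

def on_new_post(cache_path, latest_topics):
    """Called only when the data changes: regenerate the page (slow path)."""
    text = render_listing(latest_topics)
    # Write a temporary file next to the cache, then rename it over the old
    # copy; readers see either the old file or the new one, never half of one.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp, cache_path)

def serve_listing(cache_path, load_topics):
    """Called on every request: serve the stored copy (fast path)."""
    try:
        with open(cache_path, encoding="utf-8") as fh:
            return fh.read()
    except FileNotFoundError:
        # No cached copy yet: generate it once, then serve it.
        on_new_post(cache_path, load_topics())
        with open(cache_path, encoding="utf-8") as fh:
            return fh.read()
```

    Because the rename is atomic, readers need no lock at all; the trade-off is that two simultaneous writers can still race, in which case the last rename wins.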

    ____
    HTH, Dominique
    My two favorites:
    If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
    Do things well, and make it known...

Re: Ways of caching
by tachyon (Chancellor) on Jul 05, 2004 at 06:11 UTC

    Some issues to reflect on:

    1. How do you determine when you can serve from the cache and when you need to refresh? For example, say we were talking about this node, and you cached the node and all its replies at time X. If there is a new reply, how do you recognise that the cache entry is now invalid?
    2. How do you plan to expire entries from the cache? If you don't clean out old stuff, you will quickly build up lots of files.
    3. Related to the above: if you cache a lot of info, you will run into some ugliness with the filesystem. As you approach 10,000 files per directory, things start to grind to a halt (certainly with ext2/ext3, though perhaps not with ReiserFS). As a result, you often need a directory hierarchy, e.g. A/B/ABlog.dat.
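    The three points above could be handled roughly as follows. This is a sketch under assumptions: data_mtime stands for a hypothetical timestamp of the newest reply (kept, say, in the database), and the function names are made up for illustration.

```python
import os
import time

def is_fresh(cache_path, data_mtime):
    """Point 1: an entry is valid only if written after the data last changed."""
    try:
        return os.path.getmtime(cache_path) >= data_mtime
    except FileNotFoundError:
        return False

def expire_old(root, max_age_seconds, now=None):
    """Point 2: delete cache files older than max_age_seconds; return the count."""
    now = time.time() if now is None else now
    removed = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed += 1
    return removed

def cache_path(root, name):
    """Point 3: spread files over subdirectories, e.g. ABlog.dat -> A/B/ABlog.dat."""
    # Assumes name has at least two characters that are safe in a path.
    subdir = os.path.join(root, name[0], name[1])
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, name)
```

    Taking the first characters of the name mirrors the A/B/ABlog.dat example; if many names share a prefix, hashing the name first (e.g. with hashlib.md5) spreads the files more evenly.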

    You may find that Squid fits the bill very nicely. Look for HTTP accelerator mode in the FAQ. In essence, you set up Squid on port 80 and your httpd on, say, port 81. All incoming requests hit Squid (the de facto standard caching server). If Squid reckons a response is cacheable, it serves it from its cache; otherwise it fetches it from your web server. The beauty is that all the details are taken care of. As always YMMV, and dynamic content is often better suited to a customised approach.

    cheers

    tachyon

      Thanks, tachyon!

      I'm on a shared server, so I guess I can't reconfigure the server to get this result, but I'll read the articles you suggested.

      I'm a little lost with point #1. I'm writing to only one file. When a post or reply is made, the information on the last 15 topics, processed from the database, is written to that file. When someone clicks on the discussion board, that file is served via an INCLUDE_TMPL directive.

Re: Ways of caching
by tomhukins (Curate) on Jul 05, 2004 at 12:03 UTC

    The approaches you mention will help reduce server load by reducing the frequency with which you need to regenerate pages.

    When serving information over HTTP, you can reduce server load further by allowing Web caches to serve stored copies of unchanged content. It's possible to make pages cacheable yet ensure that caches won't serve stale information: the server sends a Last-Modified header, and a cache revalidates its copy using HTTP's If-Modified-Since request header, receiving a 304 Not Modified response if nothing has changed.

    See the Caching Tutorial for Web Authors and Webmasters for detailed information.
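    As a rough, framework-agnostic sketch of that revalidation (conditional_response is a hypothetical helper, not part of any library), the server compares the request's If-Modified-Since value with the time the page's data last changed:

```python
from email.utils import formatdate, parsedate_to_datetime

def conditional_response(last_modified, if_modified_since, body):
    """Return (status, headers, body) for a GET of a cacheable page.

    last_modified: Unix timestamp of the last change to the page's data.
    if_modified_since: the raw request header value, or None if absent.
    """
    headers = {"Last-Modified": formatdate(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None                 # unparseable header: ignore it
        # HTTP dates have one-second resolution, so compare whole seconds.
        if since is not None and int(last_modified) <= since:
            return 304, headers, b""     # Not Modified: the cache reuses its copy
    return 200, headers, body
```

    A cache that receives the 304 serves its stored copy, so the full page is regenerated and transferred only after the data actually changes.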

Re: Ways of caching
by relax99 (Monk) on Jul 06, 2004 at 12:59 UTC

    I've done something very similar in my calendar application and it seems to work really well. It's not a high-load application, so my primary motivation was to improve response time by skipping the dynamic calendar-view computations when nothing has changed. I get about a 50% improvement in response time when pages are viewed. The downside is a slowdown when pages are updated, but I figured the users of my application would accept a slight delay when they click "Save Changes" in exchange for faster-loading pages when viewing.

    When users update one of the pages, I just replace the corresponding text file in the cache. I do take concurrency into account, so I use flock() whenever a cache file is opened for update. At the same time, if a file is requested from the cache while it's being written, I detect that and return a dynamically generated page instead. Whenever a change affects more than one page, the application first updates the page currently being viewed, returns that page to the user, and then forks a separate process to finish updating the other affected pages.
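    This is not Alex's code, but a minimal sketch of the fallback described above: a reader asks for a shared lock without waiting (LOCK_NB), and if a writer currently holds the exclusive lock, it builds the page dynamically instead. serve_page and generate are hypothetical names, the sketch assumes the writer holds an exclusive flock for the whole time it rewrites the file, and the forking of background updates is not shown.

```python
import fcntl

def serve_page(cache_path, generate):
    """Serve from the cache unless a writer currently holds the lock."""
    try:
        fh = open(cache_path, encoding="utf-8")
    except FileNotFoundError:
        return generate()                # nothing cached yet: build it live
    with fh:
        try:
            # LOCK_NB: fail immediately instead of waiting for the writer.
            fcntl.flock(fh, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            return generate()            # file is being rewritten: build it live
        try:
            return fh.read()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

    The reader never blocks, so a slow update costs the visitor one dynamic page generation rather than a wait.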

    I could show you some source code if you're interested.

    Alex

      Thanks for sharing, and for offering to show your code, Alex!

      I'm waiting to see how my site responds when more people are on it at the same time before making changes. If it's too slow, I'll work on caching some of the pages.

      cheers