PerlMonks  

Opening too many files?

by Chady (Priest)
on Jan 28, 2002 at 23:17 UTC ( #142154=perlmeditation )

Fellow monks.

I was updating the way my website thinks (and finally got it to run under strict and -w), and I got concerned about something:

    Here's how the page is served:
  1. index.pl runs and sucks up strict, CGI, and HTML::TokeParser.
  2. A list file is opened and parsed to get the file locations and the current writeups listed.
  3. The file is now known, so it is opened and parsed.
  4. An HTML template file is sucked in.
  5. Another file for the forum is read to list the number of entries in the forum for the current writeup.
  6. Finally, the page is displayed.

Is this too much file opening?

Will I run myself into a wall someday because of this?

Oh, and by the way, there's a CSS style sheet file that is actually rewritten on the server to a Perl file, which reads a cookie and returns specific contents.

So let's say I use a similar technique on a high-traffic site: will this overload the server with too much to do for a single page?


He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

Chady | http://chady.net/

Re: Opening too many files?
by clintp (Curate) on Jan 28, 2002 at 23:44 UTC
    That is a lot going on, and "high traffic" and "too much for a single page" are very subjective.

    One simple technique is to cache the pages (or parts of the pages): as you're assembling the page the first time, write it to a file or some other storage, then send it out. On subsequent fetches of the same page, check how old the cached version is; if it's old enough (fudge, fudge), rebuild it, otherwise just chuck out the text in the cache.

    You can keep this pretty lightweight if you remember to check the cache *before* you drag in all of the modules (incl. CGI, HTML::*, etc.). Then your overhead is simply the fork, the exec of perl, and script compilation.
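A minimal sketch of the age-based cache described above, assuming a file-based cache. The sub names (`read_fresh_cache`, `write_cache`, `serve_page`), the cache path, and the age limit are all hypothetical, chosen only for illustration. The page builder is passed in as a code reference so that the heavy modules can be loaded inside it, on a cache miss only.

```perl
use strict;
use warnings;

# Return the cached page if it exists and is younger than $max_age seconds.
# Returns undef when there is no usable cache.
sub read_fresh_cache {
    my ($cache_file, $max_age) = @_;
    my $mtime = (stat $cache_file)[9];
    return unless defined $mtime;                  # no cache yet
    return if time() - $mtime > $max_age;          # cache is too old
    open my $fh, '<', $cache_file or return;
    local $/;                                      # slurp the whole file
    my $page = <$fh>;
    close $fh;
    return $page;
}

# Write to a temporary file and rename it into place, so a concurrent
# request never reads a half-written cache file.
sub write_cache {
    my ($cache_file, $page) = @_;
    my $tmp = "$cache_file.$$";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} $page;
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $cache_file or die "Cannot rename $tmp: $!";
    return;
}

# Serve from the cache when possible; otherwise call the builder
# (a code reference) and store its result for the next request.
sub serve_page {
    my ($cache_file, $max_age, $build) = @_;
    my $page = read_fresh_cache($cache_file, $max_age);
    unless (defined $page) {
        $page = $build->();
        write_cache($cache_file, $page);
    }
    return $page;
}
```

In a CGI script the builder passed to `serve_page` would be the place to `require CGI` and `require HTML::TokeParser`, so a cache hit costs only a `stat` and one file read on top of starting perl.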

    This is just a first approximation at scaling. In fact, you don't really have to do much at all to make this happen to existing scripts if they're written with this in mind.

        One simple technique is to cache the pages (or parts of the pages): as you're assembling the page the first time, write it to a file or some other storage, then send it out. On subsequent fetches of the same page, check how old the cached version is; if it's old enough (fudge, fudge), rebuild it, otherwise just chuck out the text in the cache.

      Now, I know SFA about web caching, but wouldn't it make more sense to check the various components' last-modified times, and rebuild the page if any of them has changed (à la make)? You'd only have to worry about "how long has it been since this page was fetched" if it's part of your cache replacement strategy.

      I was going to add something about this working best with a mostly static site, but I don't see how this would be worse than a timeout-based cache even for a fairly dynamic site. (Corrections are most welcome.)
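A minimal sketch of the make-style check suggested here, assuming the page is assembled from a known list of files. The sub name `cache_is_stale` is hypothetical, as is the idea that the list file, the writeup, the template, and the forum file are the complete set of sources.

```perl
use strict;
use warnings;

# True if the cache file is missing, or if any source file is missing
# or has been modified more recently than the cache.
sub cache_is_stale {
    my ($cache_file, @sources) = @_;
    my $cache_mtime = (stat $cache_file)[9];
    return 1 unless defined $cache_mtime;          # never built
    for my $src (@sources) {
        my $src_mtime = (stat $src)[9];
        return 1 unless defined $src_mtime;        # source missing: force a rebuild
        return 1 if $src_mtime > $cache_mtime;     # source changed since the build
    }
    return 0;
}
```

File modification times usually have one-second granularity, so a source changed in the same second the cache was written would be missed by the `>` comparison; using `>=` instead trades that risk for an occasional unnecessary rebuild.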

      --
      :wq
        Not a correction, a clarification perhaps.

        The original poster mentioned "forums" and "writeups". In a message system with threads, replies, etc. all going on at the same time, it might be almost as much trouble to find out whether an article has replies that are newer than a predetermined mark as it would be to fetch the articles themselves.

        Unless, of course, it were built that way originally. From the OP's tone I gather it wasn't.

        By contrast, assembling a page and presenting a recent but not completely dynamic view of the data wouldn't be harmful in the case of a message board.

        Using timestamps to determine if the static view should be rebuilt (or even having a background task doing it) isn't a bad strategy either if you can determine what your "timestamp" is.

Re: Opening too many files?
by random (Monk) on Jan 31, 2002 at 04:31 UTC
    As other monks have suggested, the term "high traffic" is very subjective. However, I would say there is plenty of streamlining that you can do. Without knowing more about your site, it's my inclination to say that you can probably sub-divide your site, regenerating your more frequently visited pages regularly, but not dynamically (by a background process, for example). By the same token, those pages that haven't been accessed in a while can also be shelved, simply by applying whatever templates you may have and outputting static HTML. It's a delicate process...unfortunately, every webmaster really has to judge for themselves how much is too much, and that generally requires carefully examining traffic logs. In any case, good luck.
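A rough sketch of the "apply the template and output static HTML" step, as it might be run by a background process such as a cron job. Everything named here is an assumption for illustration: the `{{name}}` placeholder syntax, the sub names `render_template` and `publish_static`, and the idea that the template is already in memory as a string.

```perl
use strict;
use warnings;

# Fill {{name}} placeholders in a template string.
# Placeholders with no matching value are replaced by an empty string.
sub render_template {
    my ($template, %vars) = @_;
    $template =~ s/\{\{(\w+)\}\}/defined $vars{$1} ? $vars{$1} : ''/ge;
    return $template;
}

# Write the finished HTML to a temporary file and rename it into place,
# so the web server never serves a half-written page.
sub publish_static {
    my ($out_file, $html) = @_;
    my $tmp = "$out_file.tmp.$$";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} $html;
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $out_file or die "Cannot rename $tmp to $out_file: $!";
    return $out_file;
}
```

The web server then hands out the generated file directly, and no Perl runs at all for those requests.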

    -Mike-
Re: Opening too many files?
by Ryszard (Priest) on Feb 01, 2002 at 05:46 UTC
    one word: mod_perl.

    If you use mod_perl and DBI you should be sweet if you have enough memory and tune your HTTP server accordingly.

    While I agree the terms 'heavy load' and 'too much traffic' are subjective or at least unqualified, you will be able to test and monitor your project to provide useful benchmarks to go by.

    In terms of dynamic content caching, perhaps you could use a cookie to record the last bit of content retrieved, then use the cookie value to determine if there is new content to get. If the content hasn't changed, send back the cached page; otherwise, replace the cached page and send it back.

    If your application has a high transaction rate, perhaps you could use a combination of timestamp/token to prevent you flushing the cache all the time (and making it useless).

    Perhaps some lessons could be gained by checking out how the chatterbox works. It is both dynamic, and has a relatively high transaction rate.
