PerlMonks  

Node cache refactoring

by tye (Cardinal)
on Aug 01, 2003 at 06:30 UTC (#279867, monkdiscuss)

It took quite a while but we finally worked out the details and I've received a grant to refactor the node cache.

I started working on it a week or two ago. I'm under deadline at work right now so I haven't gotten back to it but I wanted to get an announcement out.

The first part of the work will be adding some features to allow us to measure the effect of future changes. In looking at what we've already got, I found some traffic stats that vroom added earlier this year and added Yesterday's most-visited nodes as a quick way to access them.

I'll follow up with more details/plans next week. And I'll make a reply that I'll keep updated with the most recent changes/status...

Thanks to all who helped to make this happen.

                - tye

Re: Node cache refactoring
by greenFox (Vicar) on Aug 01, 2003 at 07:01 UTC

    Look forward to seeing the results, tye. ++ to you and all the monks that have allowed this to happen.

    As an aside, what's so interesting about blue_cowdawg's, chunlou's and ybiC's user images that yesterday they got 736, 724 and 723 hits respectively?

    --
    Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

      Probably because someone posted links to them in one of their posts ;D

Re: Node cache refactoring (DBI profiling)
by tye (Cardinal) on Aug 13, 2003 at 03:49 UTC

    Here is some work-in-progress code for profiling Everything's use of the database. This will let me track what one page load from PerlMonks requires of the database in terms of a count of SQL statements, number of records read, and time spent in database requests.

    I've currently tested it with loading some pages and found that a typical page load requires between 200 and 800 SQL statements at PerlMonks and 1 CPU second of web server time. The thread-that-must-not-be-named takes about 1500 SQL statements (a few more if not done anonymously) and 8 CPU seconds.

    The major sections also read thousands or tens of thousands of records. I'll be fixing that as soon as I get the below code into production (and saving the stats to the database) so that I can quantify how much the fix improves things.

    Note that this profiling code adds very little overhead on the web server (about 4%) and the code to record the stats into the database will only be updating a few records (per web server daemon process) every 5 minutes (on a staggered schedule) so the increase to the load on the database will be even smaller.
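    The three measurements named above (statement count, records read, time spent in database requests) can be illustrated with a minimal sketch. This is not the work-in-progress code referred to in this post: the helper name `profiled`, the `%stats` keys, and the stand-in "queries" are all hypothetical, and the database call is replaced by a code ref so the example runs without DBI.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Per-request totals (hypothetical key names).
my %stats = (statements => 0, rows => 0, seconds => 0);

# Hypothetical helper: run $query (a code ref returning an array ref of
# rows) and add its cost to %stats. Real code would call DBI inside it.
sub profiled {
    my ($query) = @_;
    my $start = time;
    my $rows  = $query->();
    $stats{seconds}    += time - $start;
    $stats{statements} += 1;
    $stats{rows}       += @$rows;    # number of records returned
    return $rows;
}

# Stand-in "queries" so the sketch runs without a database.
profiled(sub { [ [1, 'root'], [2, 'tye'] ] });
profiled(sub { [ [279867, 'Node cache refactoring'] ] });

printf "%d statements, %d rows, %.6f s\n",
    @stats{qw(statements rows seconds)};
```

    Keeping the totals in one small hash per request is what would make it cheap to flush them to the database only occasionally, as described above.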

    Note that more modern versions of DBI explicitly support subclassing so you probably don't have to play a few of the games I did (at least I assume it is just a version mismatch between module and docs that accounts for the documented method of subclassing not working -- the "games" were quite minor so I haven't bothered to investigate further).

    And note that my short-cut for inserting code into a bunch of Everything functions is rather tricky because some of these functions are exported (and you have to apply this type of trick before the exporting is done or else the exported function doesn't end up "wrapped").
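    The ordering problem with exported functions can be shown in a small, self-contained sketch. The package names (`My::NodeBase`, `My::Profiler`, `Early`, `Late`) and the function `get_node` are hypothetical stand-ins, not Everything's real modules; the wrapper only counts calls.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module that exports a function.
package My::NodeBase;
use Exporter 'import';
our @EXPORT_OK = qw(get_node);
sub get_node { my ($id) = @_; return "node $id" }

package My::Profiler;
our %CALLS;    # call counts, keyed by function name

# Replace each named sub in $pkg with a wrapper that counts the call
# and then jumps to the original sub.
sub wrap_subs {
    my ($pkg, @names) = @_;
    for my $name (@names) {
        no strict 'refs';
        no warnings 'redefine';
        my $orig = \&{"${pkg}::$name"};
        *{"${pkg}::$name"} = sub { $CALLS{$name}++; goto &$orig };
    }
}

# Imported BEFORE wrapping: this package keeps the unwrapped sub.
package Early;
My::NodeBase->import('get_node');

package main;
My::Profiler::wrap_subs('My::NodeBase', 'get_node');

# Imported AFTER wrapping: this package gets the wrapper.
package Late;
My::NodeBase->import('get_node');

package main;
Early::get_node(1);           # not counted
Late::get_node(2);            # counted
My::NodeBase::get_node(3);    # counted
print "wrapped calls: $My::Profiler::CALLS{get_node}\n";    # prints 2
```

    Exporting copies a code reference into the importing package, so a wrapper installed afterwards never reaches callers that already imported the original.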

    And note that I don't use SUPER:: at all. It makes assumptions that don't apply because of the way I give all of my methods (no matter what namespace they are in) access to the same utility functions w/o polluting the class namespaces.

                    - tye
      Newer versions of DBI actually come with a profiler as part of the distribution (DBI::Profile). If PM moves to a more recent release at some point, you could use that. It has a pretty nice feature set.

      I just had an idea: We should have a nodelet (allowed for all -- that is, in sidebar nodelets or the allowed-to-all section of allowed nodelets) that gives the current web & db server load (probably cached), and, perhaps more importantly, how much db load this page caused. (Being a nodelet means that it will miss load caused by later nodelets, but I think that's better than putting it at the very bottom, where it's too easy to not notice.)

      The best way to do this would seem to be adding $NODE->{node_id} (the node id of the node being rendered) and $USER->{title} (the username of the logged-in user) to each log entry (possibly also a unique id for the HTTP request, but I'm not sure how to generate that; it's a mod_perl, as opposed to CGI, thing).

      Then the nodelet would have to sum up the information by parsing the logfile.

      Come to think of it, that last step is probably rather difficult. A better way may be for the logging methods to also add information to $HTMLVARS (a global hashref, which is reloaded with every request).

      In fact, we could also add logic to add daily totals to $VARS (user settings), which show how much total DB load you have caused today. That may be too much overhead for the logging methods, though -- they'd have to check if the day has changed every time. Also, that would mean yet more data in each fetch and store of the user, which is generally a bad thing.
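      A rough sketch of the day-change check, to show how little logic it needs. The field names `db_load_day` and `db_load_total` and the function `add_daily_load` are hypothetical, and a plain hash stands in for $VARS; the cost of fetching and storing the user is not modelled.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Add $statements to the user's running total for the current UTC day,
# resetting the total when the stored day differs from today's.
sub add_daily_load {
    my ($vars, $statements, $now) = @_;
    my $today = strftime('%Y-%m-%d', gmtime($now));
    if (($vars->{db_load_day} // '') ne $today) {
        $vars->{db_load_day}   = $today;
        $vars->{db_load_total} = 0;
    }
    $vars->{db_load_total} += $statements;
    return $vars->{db_load_total};
}

my %vars;                                         # stand-in for $VARS
my $first  = add_daily_load(\%vars, 300, 0);      # first request of the day
my $second = add_daily_load(\%vars, 200, 3600);   # same day, one hour later
my $third  = add_daily_load(\%vars, 50, 86400);   # next day: total resets
print "$first $second $third\n";                  # prints "300 500 50"
```

      The check itself is one string comparison per call; the larger cost is the one already mentioned, the extra data in every fetch and store of the user.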

      If you want, I can start working up code.


      Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      quoth tye:

      I've currently tested it with loading some pages and found that a typical page load requires between 200 and 800 SQL statements at PerlMonks and 1 CPU second of web server time. The thread-that-must-not-be-named takes about 1500 SQL statements (a few more if not done anonymously) and 8 CPU seconds.

      On casual reflection, that seems grossly inefficient to me (not to take away from the original author(s) of this marvelous resource).

      Of course you plan to look at consolidating the queries.

      theorbtwo just pointed this out to me. I'm wondering what would be involved to do some runs of it on the test server. Could you explain that part please?

      ---
      demerphq

Node Type: monkdiscuss [id://279867]
Approved by sauoq
Front-paged by sauoq