Re: Node cache refactoring (DBI profiling)

by tye (Cardinal)
on Aug 13, 2003 at 03:49 UTC ( #283429=note )


in reply to Node cache refactoring

Here is some work-in-progress code for profiling Everything's use of the database. It lets me track what one PerlMonks page load requires of the database in terms of the number of SQL statements issued, the number of records read, and the time spent in database requests.

So far I've tested it by loading some pages, and found that a typical page load at PerlMonks requires between 200 and 800 SQL statements and about 1 CPU second of web server time. The thread-that-must-not-be-named takes about 1500 SQL statements (a few more if not done anonymously) and 8 CPU seconds.

The major sections also read thousands or tens of thousands of records. I'll be fixing that as soon as I get the code below into production (and am saving the stats to the database) so that I can quantify how much the fix improves things.

Note that this profiling code adds very little overhead on the web server (about 4%), and the code that records the stats into the database will only update a few records (per web server daemon process) every 5 minutes (on a staggered schedule), so the increase in load on the database will be even smaller.
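
One way the staggered schedule might work (a sketch only; flushStats() and the schema behind it are my assumptions, not actual Everything code):

    my $FLUSH_EVERY= 5 * 60;                        # five minutes
    my $next_flush= time() + ( $$ % $FLUSH_EVERY ); # offset by daemon pid
    sub maybeFlushStats {
        return   if time() < $next_flush;
        $next_flush= time() + $FLUSH_EVERY;
        flushStats();  # UPDATE the few per-process stats records
    }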

Note that more modern versions of DBI explicitly support subclassing, so you probably won't have to play a few of the games I did. (At least, I assume it is just a version mismatch between module and docs that accounts for the documented method of subclassing not working for me -- the "games" were quite minor, so I haven't bothered to investigate further.)
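
For reference, here is roughly what the documented route looks like in newer DBI releases (a minimal sketch; the My::DBI names are just for illustration, and RootClass is the documented hook):

    package My::DBI;
    use base 'DBI';

    package My::DBI::db;
    use base 'DBI::db';

    package My::DBI::st;
    use base 'DBI::st';

    sub execute {
        my $sth= shift @_;
        # time the call here, then hand off to the real DBI method
        return $sth->SUPER::execute( @_ );
    }

    package main;
    # RootClass makes connect() bless the handles into the subclasses
    my $dbh= DBI->connect( $dsn, $user, $pass, { RootClass => 'My::DBI' } );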

And note that my short-cut for inserting code into a bunch of Everything functions is rather tricky, because some of those functions are exported: you have to apply this type of trick before the exporting is done, or else the exported function doesn't end up "wrapped".
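
To illustrate the ordering problem with a hypothetical module Foo that exports bar():

    require Foo;    # compile Foo.pm but don't import from it yet
    my $orig= \&Foo::bar;
    *Foo::bar= sub { warn "bar() called\n"; &$orig( @_ ) };
    # Modules that import bar() from here on get the wrapper.  Anything
    # that already did "use Foo;" holds an alias to the original coderef,
    # so calls through its imported &bar bypass the wrapper entirely.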

And note that I don't use SUPER:: at all. It makes assumptions that don't apply because of the way I give all of my methods (no matter what namespace they are in) access to the same utility functions w/o polluting the class namespaces.
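
Concretely: SUPER:: resolves against the package a method was compiled in (here, Everything::DBI::_impl, whose @ISA is empty), not against the invocant's class, so the wrappers below call the parent methods fully qualified instead:

    my $spec= "DBI::db::" . $meth;
    my $sth= $dbh->$spec( @_ );    # explicit, instead of SUPER::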

                - tye
### Set up inheritance: ###

use strict;

########################################
package Everything::DBI;
########################################

use vars qw( $VERSION );
BEGIN { $VERSION= 1.000; }

use vars qw( @ISA );
BEGIN { push @ISA, 'DBI'; }

########################################
package Everything::DBI::db;
########################################

use vars qw( @ISA );
BEGIN { push @ISA, 'DBI::db'; }

########################################
package Everything::DBI::st;
########################################

use vars qw( @ISA );
BEGIN { push @ISA, 'DBI::st'; }

### The implementation: ###

########################################
package Everything::DBI::_impl;
########################################

use Time::HiRes qw( gettimeofday );

sub IntervalUS {
    my( $a0, $a1, $b0, $b1 )= @_;
    ( $b0, $b1 )= gettimeofday()   unless defined($b1);
    return $b1 - $a1 + ( $b0 - $a0 )*1_000_000;
}

sub TotalUS()    { "us" }
sub OpCount()    { "cnt" }
sub OpDist()     { "dist" }
sub CalledFrom() { "from" }

sub Everything::DBI::connect {
    my $pkg= shift @_;
    my $dbh= $pkg->DBI::connect( @_ );
    return $dbh   if ! $dbh;
    die "Everything::DBI: ref(DBI::connect(...)) eq '",
        ref($dbh), "' not 'DBI::db' !\n"
        unless 'DBI::db' eq ref($dbh);
    bless $dbh, 'Everything::DBI::db';
    return $dbh;
}

# Wrap the common Everything methods used to query
# so we can track the source of each query:
for my $meth ( qw(
    Everything.pm
    Everything::getNode
    Everything::getNodeById
    Everything::getType
    Everything::selectLinks
    Everything::setVars
    Everything::updateHits
    Everything/HTML.pm
    Everything::HTML::genContainer
    Everything::HTML::htmlcode
    Everything/NodeBase.pm
    Everything::NodeBase::getNode
    Everything::NodeBase::getNodeById
    Everything::NodeBase::getNodeWhere
    Everything::NodeBase::getRef
    Everything::NodeBase::getType
    Everything::NodeBase::selectNodeWhere
    Everything::NodeBase::sqlSelect
    Everything::NodeBase::sqlSelectHashref
    Everything::NodeBase::sqlSelectMany
    Everything::NodeCache::new
) ) {
    if( $meth =~ /\.pm$/ ) {
        require $meth;
        next;
    }
    my $orig= \&$meth;
    my $new= sub {
        my $set= setCaller($meth);
        my @ret;
        if( wantarray ) {
            @ret= &$orig( @_ );
        } elsif( ! defined wantarray ) {
            &$orig( @_ );
        } else {
            $ret[0]= &$orig( @_ );
        }
        clearCaller()   if $set;
        return wantarray ? @ret : $ret[0];
    };
    no strict 'refs';
    *{$meth}= $new;
}

BEGIN {
    open LOG, ">db.log"
        or die "Can't write to db.log: $!\n";
    my $caller= "";
    my %objs;

    sub setCaller {
        return   if $caller;
        my( $meth )= shift @_;
        my( $pkg, $file, $line )= caller(1);
        my( $sub )= ( caller(2) )[3];
        $sub ||= "n/a";
        $caller= "$sub > $meth\n line $line of $file";
        return 1;
    }

    sub clearCaller {
        $caller= "";
    }

    sub logQuery {
        my( $sth, $us, $op, $sql, $res )= @_;
        my $stack= "";
        #if( ! $caller ) {
        #    my $depth= 3;
        #    while( my $sub= (caller(++$depth))[3] ) {
        #        $stack .= " < $sub";
        #    }
        #}
        if( ! $sth  ||  ! ref($sth) ) {
            printf LOG "%.3fms %s%s;%s %s:\n\t%s\n",
                $us/1000, $op, $res, $stack, $caller, $sql;
        } elsif( "DESTROY" eq $op ) {
            my $obj= delete $objs{0+$sth};
            my $h= $obj->{OpDist()};
            my $ops= "";
            for my $op ( keys %$h ) {
                $ops .= " $h->{$op}:$op";
            }
            printf LOG "%.3fms %s; %s:\n\t%s\n",
                $obj->{TotalUS()}/1000, $ops, $obj->{CalledFrom()}, $sql;
        } else {
            my $obj= $objs{0+$sth};
            if( ! $obj ) {
                $objs{0+$sth}= $obj= {};
                $obj->{CalledFrom()}= $caller.$stack;
            }
            $obj->{TotalUS()} += $us;
            $obj->{OpCount()}++;
            $obj->{OpDist()}{$op}++;
        }
    }
}

### Database methods wrapper: ###

sub prepare {
    my $meth= shift @_;
    my $dbh= shift @_;
    my( $sql )= @_;
    my @t0= gettimeofday();
    my $spec= "DBI::db::" . $meth;
    my $sth= $dbh->$spec( @_ );
    my $us= IntervalUS( @t0 );
    logQuery( $sth, $us, $meth, $sql );
    return $sth   if ! $sth  ||  ! ref($sth);
    if( 'Everything::DBI::st' ne ref($sth) ) {
        die "ref(DBI::db::$meth(...)) eq ", ref($sth), " not DBI::st!"
            unless 'DBI::st' eq ref($sth);
        bless $sth, 'Everything::DBI::st';
    }
    return $sth;
}

for my $meth ( qw(
    prepare  prepare_cached  do
    selectall_arrayref  selectall_hashref  selectcol_arrayref
    selectrow_array  selectrow_arrayref  selectrow_hashref
) ) {
    eval "sub Everything::DBI::db::$meth { prepare( '$meth', \@_ ); } 1"
        or die "$@\n";
}

### Statement methods wrapper: ###

sub execute {
    my $meth= shift @_;
    my $sth= shift @_;
    my @t0= gettimeofday();
    my $spec= "DBI::st::" . $meth;
    my $res= $sth->$spec( @_ );
    my $us= IntervalUS( @t0 );
    my( $sql )= $sth->{Statement};
    logQuery( $sth, $us, $meth, $sql, $res );
    return $res;
}

for my $meth ( qw(
    execute  execute_array
    fetchrow_array  fetchrow_arrayref  fetchrow_hashref
    fetchall_arrayref  fetchall_hashref
    DESTROY
) ) {
    eval "sub Everything::DBI::st::$meth { execute( '$meth', \@_ ); } 1"
        or die "$@\n";
}

1;
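
Presumably this gets wired in wherever Everything creates its database handle, something like (my guess, not part of the code above):

    use Everything::DBI;
    my $dbh= Everything::DBI->connect( $dsn, $user, $password )
        or die $DBI::errstr;
    # from here on, every prepare/execute/fetch is timed and logged
    # to db.log by the wrappers above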


Re: Re: Node cache refactoring (DBI profiling)
by perrin (Chancellor) on Aug 28, 2003 at 19:20 UTC
    Newer versions of DBI actually come with a profiler as part of the distribution (DBI::Profile). If PM moves to a more recent release at some point, you could use that. It has a pretty nice feature set.
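
    For example (assuming DBI 1.24 or later, which is where DBI::Profile
    showed up; the path spec below groups timings by SQL statement):

        use DBI;
        my $dbh= DBI->connect( $dsn, $user, $pass );
        $dbh->{Profile}= '!Statement';  # aggregate timings per statement
        # ...run the page load...
        # a summary is written to STDERR when the handle is destroyed

    The same thing can be switched on with no code changes at all via the
    DBI_PROFILE environment variable.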
Re: Re: Node cache refactoring (DBI profiling)
by theorbtwo (Prior) on Oct 07, 2003 at 06:08 UTC

    I just had an idea: We should have a nodelet (allowed for all -- that is, in sidebar nodelets or the allowed-to-all section of allowed nodelets) that gives the current web & db server load (probably cached), and, perhaps more importantly, how much db load this page caused. (Being a nodelet means that it will miss load caused by later nodelets, but I think that's better than putting it at the very bottom, where it's too easy not to notice.)

    The best way to do this would seem to be adding $NODE->{node_id} (the node id of the node being rendered) and $USER->{title} (the username of the logged-in user) to each log entry (possibly also a unique id for the HTTP request, but I'm not sure how to generate that -- it's a mod_perl, as opposed to CGI, thing).

    Then the nodelet would have to sum up the information by parsing the logfile.

    Come to think of it, that last step is probably rather difficult. A better way may be for the logging methods to also add information to $HTMLVARS (a global hashref, which is reloaded with every request).
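
    Something along these lines inside the logging methods, say (just a
    sketch; the dbStats key is made up):

        $HTMLVARS->{dbStats}{cnt}++;      # queries so far this request
        $HTMLVARS->{dbStats}{us} += $us;  # microseconds so far this request

    A nodelet could then render $HTMLVARS->{dbStats} directly instead of
    re-parsing db.log.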

    In fact, we could also add logic to add daily totals to $VARS (user settings), showing how much total DB load you have caused today. That may be too much overhead for the logging methods, though -- they'd have to check whether the day has changed every time. Also, that would mean yet more data in each fetch and store of the user, which is generally a bad thing.

    If you want, I can start working up code.


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Re: Node cache refactoring (DBI profiling)
by cleverett (Friar) on Nov 18, 2003 at 20:25 UTC
    quoth tye:

    So far I've tested it by loading some pages, and found that a typical page load at PerlMonks requires between 200 and 800 SQL statements and about 1 CPU second of web server time. The thread-that-must-not-be-named takes about 1500 SQL statements (a few more if not done anonymously) and 8 CPU seconds.

    On casual reflection, that seems grossly inefficient to me (not to take away from the original author(s) of this marvelous resource).

    Of course you plan to look at consolidating the queries.
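
    For instance, N single-node lookups could collapse into one round trip
    (a sketch only; I'm assuming the node table keyed by node_id):

        my $in= join ",", map { int $_ } @node_ids;
        my $rows= $dbh->selectall_arrayref(
            "SELECT * FROM node WHERE node_id IN ($in)",
            { Slice => {} },    # return an array of hashrefs
        );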

Re^2: Node cache refactoring (DBI profiling)
by demerphq (Chancellor) on Nov 11, 2004 at 19:52 UTC

    theorbtwo just pointed this out to me. I'm wondering what would be involved in doing some runs of it on the test server. Could you explain that part, please?

    ---
    demerphq
