Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine

Hardware to run heavy dbix::class stuff?

by uG (Scribe)
on Sep 19, 2011 at 16:16 UTC ( #926774=perlquestion: print w/replies, xml ) Need Help??
uG has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a script to analyze American football play by play data. The first step:
# Use this for calculating running stats my $all_current_plays = $game->plays->search({ GameIdentifier => $play +->GameIdentifier, id => { '<=', $play->id } })->search(undef, { cache + => 1 } ); my $all_current_home_plays = $all_current_plays->search({ b_home_posse +ssion => 1 })->search(undef, { cache => 1 }); my $all_current_away_plays = $all_current_plays->search({ b_home_posse +ssion => 0 })->search(undef, { cache => 1 });
This is pretty obvious, it grabs the resultset for all the plays (home, away, both). So far, so good.
if(my $b = $all_current_home_plays->search({ b_pass => 1 })) { my $count = $b->count; if($count >0) { #CALCULATIONS NEEDING A PASS PLAY TO HAVE OCCURRED my $completes = $count - $b->search({ b_incomplete => +1 })->count; my $sum = $b->get_column('yards')->sum; my $ints = $b->search({ b_intercepted => 1 })->count; if(my $c = $b->search({ down => 1 })) { my $acount = $c->count; if($acount > 0) { $stats{HTO_1st_YPP} = $b->search({ down => 1 } +)->get_column('yards')->sum / $acount; my $totals; while(my $d = $c->next) { $totals += $d->_get_pass_score; } $stats{HTO_1st_Pass_Score_Avg} = $totals / $ac +ount; } }
The above is an example of where my bottle necks start happening. For this section, we are analyzing all the plays (that are passing on 1st down) up to the current play. The while loop is the real killer though, as on play 200 it has to loop through the (up to) 200 previous plays and call _get_pass_score (which is just a $row->get_column('yards')->sum / $row->count Now I have multiple sections like the one above (rushing, passing, the various downs, special teams, etc). Overall a game runs about 13,000 queries, but its taking 300 seconds. I attribute this to dbix::class object creation (my perl process stays at 100% CPU while mysql sits at 5% or lower). I really don't want to generate a summary table or keep track of all this stuff in globals (I did this before, and it ran lightning fast but the code was too hard to maintain), so i'm wondering what kind of hardware (if any) will drastically improve my times (this is all on an atom processor and 4 gigs of ram, so CPU power is lacking). 300 seconds per

Replies are listed 'Best First'.
Re: Hardware to run heavy dbix::class stuff?
by stonecolddevin (Vicar) on Sep 19, 2011 at 16:44 UTC

    1. Use Devel::NYTProf to generate actual benchmarks for your code.
      • If you feel your SQL needs profiling, either copy and paste what is generated (you can see if you set the environmental variable DBIC_TRACE to 1) and run that against your DBMS, or use DBIx::Class::QueryProfiler
    2. Format your node text to be readable in a sane manner.

    I'm guessing you're not optimizing what you have right now currently.

    Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past

Re: Hardware to run heavy dbix::class stuff?
by chrestomanci (Priest) on Sep 19, 2011 at 19:47 UTC

    I think you are taking the wrong approach. If you do thing right then most of the heavy lifting should take place inside your relational database engine (MySQL in this case).

    As you said, there is a large performance penalty when DBIx::Class creates objects from rows in the database. This is a known bottleneck with any Object relational mapper. It exists because by choosing to use DBIC, you have traded a little performance for convenience. It is hitting you hard because you have created lots of objects.

    What I think you should be doing is designing your database schema, and writing better queries so that DBIC only inflates the objects you really need. For example could you pre-calculate some stuff and store it as extra rows in your database so that calculations happen faster? Could you indexes or views to your database to further speed things up?

    Once you have succeeded in shifting the load to MySQL, then your problem will become one of optimising the performance of that database, which is not something we are expert in, though I suspect that by the time you reach that stage, you will have gained acceptable performance as you will have eliminated the DBIC object inflation penalty.

Re: Hardware to run heavy dbix::class stuff?
by CountZero (Bishop) on Sep 19, 2011 at 20:12 UTC
    That while-loop will indeed be the real killer, but you should be able to replace it by some aggregating SQL query.


    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://926774]
Front-paged by keszler
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (9)
As of 2018-04-26 12:46 GMT
Find Nodes?
    Voting Booth?