Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
Background

This is rather long, but, I hope helpful.
As part of developing a (DBI/MySQL) database to store info about modules (more @ Module Dependencies), I needed to repeatedly check if a certain row already exists in the Database in order to avoid putting another identical row in.

Each row is a tuple showing a "this module uses that module" relationship.

I wanted to do some performance testing to find out the fastest way of discovering if there is a matching row already in the DB.

Aside: I can't find the equivalent of an efficient "INSERT ... IF NOT EXISTS" in MySQL.

The code below shows the different ways I found to get the result.

Conclusions

The fastest way to do it is :

my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module' AND uses='DBI'"; my $rows = $dbh->do($sql);
This may not seem surprising !
  1. using WHERE A='X' AND B='Y' is faster than WHERE A='X'.
  2. using COUNT(*) is very slow
  3. do() is more than twice as fast as prepare()/execute()
  4. Doesn't matter which column SELECT returns to do counting
  5. SELECT col1 is slightly faster than SELECT *
I found the first 3 of these surprising. In particular, I thought that do() was just a wrapper round prepare/execute, so the time difference is significant.

Other optimisations and comments

  1. Store IDs, rather than names in the DB. (Not tried - Would prob. be quicker, but at the expense of more lookup code and complication.)
  2. Cache results in PERL. (Not tried - Likely to be quicker, but less "pure" and some complexity in forward references.)
  3. Store "uses" relationship as a joined string in the module table and use PERLto manage. (Tried - was quicker, but again, not DB pure)
  4. Not tried on other DBD drivers
  5. The DB was small enough to fit in memory, and machine had no load, no no disk accesses during tests.
  6. I don't know if there are any internal optimisations happening (i.e. reusing cached results)
  7. I was less interested in the fact that some of the tests returned the 'wrong' result than the time taken, clearly in a full solution you need the correct answer :-)
  8. Not sure whether the do() is portable. The DBI code seems to suggest that some drivers return -1 untill all rows are fetched. Not sure if this would propagate back through the do() result.

Test Code

#! /usr/local/bin/perl -w #use strict; # because of the sub references use File::Find; use DBI(); use Depend::Module; use Date::Manip; use Benchmark; ### Table created with ## CREATE TABLE depend (\ ## usedby VARCHAR(80) NOT NULL ,\ ## uses VARCHAR(80) NOT NULL \ ## ) my $dbh = DBI->connect("DBI:mysql:database=modules;host=localhost", 'root', 'blah', { 'RaiseError' => 1 }); for ('Way1A','Way2A','Way2B','Way3A','Way4A','Way4B','Way5A','Way5B',' +Way6A') { my $ret = &$_; print "$_ returns [$ret]\n"; } my $res = timethese(10000, { 'Way1A'=>\&Way1A, 'Way2A'=>\&Way2A, 'Way3A'=>\&Way3A, 'Way4A'=>\&Way4A, 'Way4B'=>\&Way4B, 'Way5A'=>\&Way5A, 'Way6A'=>\&Way6A, 'Way2B'=>\&Way2B, 'Way5B'=>\&Way5B, }); Benchmark::cmpthese( $res) ; exit; ######## Subs sub Way1A { # Incorrect result, Only check one field # Doesn't count second field in this routine # uses prepare and execute my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module'"; my $sth = $dbh->prepare($sql); my $ret = $sth->execute; return $sth->rows; } sub Way2A { # Incorrect result, Only check one field # Doesn't count second field in this routine # uses do my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module'"; return $dbh->do($sql); } sub Way2B { # Incorrect result, Only check one field # Doesn't count second field in this routine # uses do # SELECT returns * == both columns my $sql = "SELECT * FROM depend WHERE usedby='Depend::Module'"; return $dbh->do($sql); } sub Way3A { # Generates correct result # uses prepare and execute my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module' AN +D uses='DBI'"; my $sth = $dbh->prepare($sql); my $ret = $sth->execute; return $sth->rows; } sub Way4A { # Generates correct result # uses do my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module' AN +D uses='DBI'"; my $ret = $dbh->do($sql); return $ret; } sub Way4B { # Generates correct result # uses do # counts other column my $sql = "SELECT usedby FROM depend WHERE usedby='Depend::Module' +AND uses='DBI'"; my $ret = $dbh->do($sql); return $ret; } sub Way5A { # Fetch each row manually and do our own count # Incorrect result, only checks first field my $count = 0; my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module'"; my $sth = $dbh->prepare($sql); my $ret = $sth->execute; while ($ret = $sth->fetch()) { $count++}; return $count; } sub Way5B { # Fetch each row manually and do our own count # Correct result my $count = 0; my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module' AN +D uses='DBI'"; my $sth = $dbh->prepare($sql); my $ret = $sth->execute; while ($ret = $sth->fetch()) { $count++}; return $count; } sub Way6A { # use COUNT(*) to do counting # Correct result my $sql = "SELECT COUNT(*) FROM depend WHERE usedby='Depend::Module +' AND uses='DBI'"; my $sth = $dbh->prepare($sql); my $rst = $sth->execute; my @count = $sth->fetchrow_array(); return $count[0]; } __DATA__ Results using 10,000 iterations Rate Way5A Way6A Way1A Way5B Way3A Way2B Way2A Way4B Way4A Way5A 1085/s -- -9% -10% -13% -13% -52% -53% -65% -65% Way6A 1190/s 10% -- -1% -5% -5% -48% -48% -61% -61% Way1A 1208/s 11% 1% -- -3% -4% -47% -48% -61% -61% Way5B 1250/s 15% 5% 4% -- -0% -45% -46% -59% -59% Way3A 1252/s 15% 5% 4% 0% -- -45% -46% -59% -59% Way2B 2268/s 109% 90% 88% 81% 81% -- -2% -26% -26% Way2A 2304/s 112% 94% 91% 84% 84% 2% -- -25% -25% Way4B 3077/s 184% 158% 155% 146% 146% 36% 34% -- 0% Way4A 3077/s 184% 158% 155% 146% 146% 36% 34% 0% --

--
Brovnik

In reply to How to count rows using DBI & mysql - Performance results by Brovnik

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others chilling in the Monastery: (7)
    As of 2014-11-23 14:48 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My preferred Perl binaries come from:














      Results (132 votes), past polls