<?xml version="1.0" encoding="windows-1252"?>
<node id="87439" title="How to count rows using DBI &amp; mysql - Performance results" created="2001-06-11 10:33:04" updated="2005-08-11 06:32:25">
<type id="120">
perlmeditation</type>
<author id="80482">
Brovnik</author>
<data>
<field name="doctext">
&lt;b&gt;Background&lt;/b&gt;
&lt;p&gt;
This is rather long, but, I hope helpful.
&lt;br&gt;
As part of developing a (DBI/MySQL) database to store info
about modules (more @ [id://85848]), I needed to
repeatedly check if a certain row already exists in the Database
in order to avoid putting another identical row in.
&lt;p&gt;Each row is a tuple showing a "this module uses that module" relationship.
&lt;p&gt;I wanted to do some performance testing to find out the fastest way of discovering if there is a matching
row already in the DB.
&lt;p&gt;Aside: I can't find the equivalent of an efficient "INSERT ... IF NOT EXISTS" in MySQL.
&lt;p&gt;The code below shows the different ways I found to get the result.
&lt;p&gt;
&lt;b&gt;Conclusions&lt;/b&gt;
&lt;p&gt;
The fastest way to do it is :
&lt;code&gt;
   my $sql = "SELECT uses FROM depend WHERE 
      usedby='Depend::Module' AND
      uses='DBI'";
   my $rows = $dbh-&gt;do($sql);
&lt;/code&gt;

&lt;readmore&gt;
This may not seem surprising !
&lt;ol&gt;
&lt;li&gt;using &lt;code&gt;WHERE A='X' AND B='Y'&lt;/code&gt; is faster than &lt;code&gt;WHERE A='X'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;using COUNT(*) is very slow&lt;/li&gt;
&lt;li&gt;do() is more than twice as fast as prepare()/execute()&lt;/li&gt;
&lt;li&gt;Doesn't matter which column SELECT returns to do counting&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SELECT col1&lt;/code&gt; is slightly faster than &lt;code&gt;SELECT *&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
I found the first 3 of these surprising. In particular, I thought that
do() was just a wrapper round prepare/execute, so the time difference is
significant.
&lt;p&gt;
&lt;b&gt;Other optimisations and comments&lt;/b&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Store IDs, rather than names in the DB. (Not tried - Would prob. be quicker, but at the expense of more lookup code and complication.)&lt;/li&gt;
&lt;li&gt;Cache results in PERL. (Not tried - Likely to be quicker, but less "pure" and some complexity in forward references.)&lt;/li&gt;
&lt;li&gt;Store "uses" relationship as a joined string in the module table and use PERLto manage. (Tried - was quicker, but again, not DB pure)&lt;/li&gt;
&lt;li&gt;Not tried on other DBD drivers&lt;/li&gt;
&lt;li&gt;The DB was small enough to fit in memory, and machine had no load, no no disk accesses during tests.&lt;/li&gt;
&lt;li&gt;I don't know if there are any internal optimisations happening (i.e. reusing cached results)&lt;/li&gt;
&lt;li&gt;I was less interested in the fact that some of the tests returned the 'wrong' result than the time taken, clearly in a
full solution you need the correct answer :-)&lt;/li&gt;
&lt;li&gt;Not sure whether the do() is portable. The DBI code seems
to suggest that some drivers return -1 untill all rows are fetched. Not sure if
this would propagate back through the do() result.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
&lt;b&gt;Test Code&lt;/b&gt;
&lt;p&gt;
&lt;code&gt;
#! /usr/local/bin/perl -w

#use strict; # because of the sub references
use File::Find;
use DBI();
use Depend::Module;
use Date::Manip;
use Benchmark;

### Table created with
##  CREATE TABLE depend (\
##     usedby VARCHAR(80) NOT NULL ,\
##     uses   VARCHAR(80) NOT NULL \
##  )

my $dbh = DBI-&gt;connect("DBI:mysql:database=modules;host=localhost",
      'root', 'blah', { 'RaiseError' =&gt; 1 });

for ('Way1A','Way2A','Way2B','Way3A','Way4A','Way4B','Way5A','Way5B','Way6A')
{
   my $ret = &amp;$_;
   print "$_ returns [$ret]\n";
}

my $res = timethese(10000,
   {
   'Way1A'=&gt;\&amp;Way1A,
   'Way2A'=&gt;\&amp;Way2A,
   'Way3A'=&gt;\&amp;Way3A,
   'Way4A'=&gt;\&amp;Way4A,
   'Way4B'=&gt;\&amp;Way4B,
   'Way5A'=&gt;\&amp;Way5A,
   'Way6A'=&gt;\&amp;Way6A,
   'Way2B'=&gt;\&amp;Way2B,
   'Way5B'=&gt;\&amp;Way5B,
   });

Benchmark::cmpthese( $res) ;

exit;

######## Subs
sub Way1A
{
   # Incorrect result, Only check one field
   # Doesn't count second field in this routine
   # uses prepare and execute

   my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module'";
   my $sth = $dbh-&gt;prepare($sql);
   my $ret = $sth-&gt;execute;
   return $sth-&gt;rows;

}

sub Way2A
{
   # Incorrect result, Only check one field
   # Doesn't count second field in this routine
   # uses do

   my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module'";
   return $dbh-&gt;do($sql);
}

sub Way2B
{
   # Incorrect result, Only check one field
   # Doesn't count second field in this routine
   # uses do
   # SELECT returns * == both columns

   my $sql = "SELECT * FROM depend WHERE usedby='Depend::Module'";
   return $dbh-&gt;do($sql);
}

sub Way3A
{
   # Generates correct result
   # uses prepare and execute

   my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module' AND uses='DBI'";
   my $sth = $dbh-&gt;prepare($sql);
   my $ret = $sth-&gt;execute;
   return $sth-&gt;rows;
}
sub Way4A
{
   # Generates correct result
   # uses do

   my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module' AND uses='DBI'";
   my $ret = $dbh-&gt;do($sql);
   return $ret;
}

sub Way4B
{
   # Generates correct result
   # uses do
   # counts other column

   my $sql = "SELECT usedby FROM depend WHERE usedby='Depend::Module' AND uses='DBI'";
   my $ret = $dbh-&gt;do($sql);
   return $ret;
}

sub Way5A
{
   # Fetch each row manually and do our own count
   # Incorrect result, only checks first field

   my $count = 0;
   my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module'";
   my $sth = $dbh-&gt;prepare($sql);
   my $ret = $sth-&gt;execute;
   while ($ret = $sth-&gt;fetch()) { $count++};
   return $count;
}

sub Way5B
{
   # Fetch each row manually and do our own count
   # Correct result

   my $count = 0;
   my $sql = "SELECT uses FROM depend WHERE usedby='Depend::Module' AND uses='DBI'";
   my $sth = $dbh-&gt;prepare($sql);
   my $ret = $sth-&gt;execute;
   while ($ret = $sth-&gt;fetch()) { $count++};
   return $count;
}
sub Way6A
{
   # use COUNT(*) to do counting
   # Correct result

   my $sql = "SELECT COUNT(*) FROM depend WHERE usedby='Depend::Module' AND uses='DBI'";
   my $sth = $dbh-&gt;prepare($sql);
   my $rst = $sth-&gt;execute;
   my @count = $sth-&gt;fetchrow_array();
   return $count[0];
}

__DATA__

Results using 10,000 iterations

        Rate Way5A Way6A Way1A Way5B Way3A Way2B Way2A Way4B Way4A
Way5A 1085/s    --   -9%  -10%  -13%  -13%  -52%  -53%  -65%  -65%
Way6A 1190/s   10%    --   -1%   -5%   -5%  -48%  -48%  -61%  -61%
Way1A 1208/s   11%    1%    --   -3%   -4%  -47%  -48%  -61%  -61%
Way5B 1250/s   15%    5%    4%    --   -0%  -45%  -46%  -59%  -59%
Way3A 1252/s   15%    5%    4%    0%    --  -45%  -46%  -59%  -59%
Way2B 2268/s  109%   90%   88%   81%   81%    --   -2%  -26%  -26%
Way2A 2304/s  112%   94%   91%   84%   84%    2%    --  -25%  -25%
Way4B 3077/s  184%  158%  155%  146%  146%   36%   34%    --    0%
Way4A 3077/s  184%  158%  155%  146%  146%   36%   34%    0%    --
&lt;/code&gt;

&lt;br&gt;--&lt;br&gt;&lt;A HREF="/index.pl?node_id=80482&amp;lastnode_id=1072"&gt;Brovnik&lt;/A&gt;</field>
</data>
</node>
