http://www.perlmonks.org?node_id=960655

alainb has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a question and have tried to find the answer without success.

When someone uses DBI with ODBC and you do a select:
$request = "select var1,var2,var3 from table";
$Sth = $Dbh->prepare($request);
$Sth->execute();
($var1, $var2, $var3) = $Sth->fetchrow();

what exactly happens internally?

Does $Sth->execute() read the entire table and put it into an array in memory?

Does $Sth->fetchrow() read one row from that array in memory?

Or does it return some kind of pointer, so that fetchrow reads one record at a time from the table on disk?

Re: Fetchrow question
by Marshall (Canon) on Mar 20, 2012 at 21:38 UTC
    I think that there are some implementation specific issues with this question.

    Summary:
    -Does $Sth->execute() read the entire table and put it into an array in memory? Normally, yes.

    -Does $Sth->fetchrow() read one row from the array in memory? Yes, it makes a copy of what the DBI already has. The DBI's array is also normally completely memory resident, so this is a memory-to-memory copy, which, albeit fast, can take significant time when working with large data structures. No matter how fast something is, if you do it millions of times, it adds up!

    =====
    The scalar value of $Sth->execute() is the number of rows. Note that this can be the value "0E0", which is Perl's way of returning a "true, but numerically zero" result (meaning that the execute statement worked). In order to know the number of results, the DB has already produced the complete result set (or at least knows how many rows there are), and this would normally be memory resident.
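
    For illustration, a minimal sketch of how that "0E0" return value behaves (the handle names are hypothetical, and for SELECTs the actual count is driver-dependent):

        my $rv = $sth->execute();
        die $sth->errstr unless defined $rv;   # undef means failure
        # "0E0" is true in boolean context but numerically zero, so
        # "worked, but zero rows" is distinguishable from failure
        print "execute succeeded; rows: ", $rv + 0, "\n";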

    Update: I think the caveat here is "if it can do so". I think there are situations where the DB will have to repeat work, or become less efficient, if not enough memory is available to it. For example, in SQLite it is possible to dynamically increase the DB's cache size, which can help enormously when, say, indexing a large table. I am not sure there is a "one size fits all" answer to your question. But, for example, I think the default cache size for SQLite is 20MB, so a table of a couple of MB is certainly memory resident after the execute.

    If you are interested in the highest performance, use $sth->fetchall_arrayref(). This gives you a reference to the DBI's memory holding the result set. You can modify a single row and send that single row back with an SQL update. This avoids making a local copy of what the DBI already has in memory, which can be significant when you are working with large result sets.
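
    A minimal sketch of that pattern, using the question's hypothetical table and column names:

        my $sth = $dbh->prepare("select var1, var2, var3 from table");
        $sth->execute();
        my $rows = $sth->fetchall_arrayref();  # ref to the DBI's own result rows
        # each element is an array ref holding one row; no per-row copy is made
        my ($v1, $v2, $v3) = @{ $rows->[0] };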

    fetchrow() gets you a copy of what the DBI already has, and there are perfectly valid reasons for doing it this way. Use the more complicated mechanisms when performance matters.

    If I assign and keep the ref returned by fetchall_arrayref() in scope, I can make a new query and get a new array ref - or at least that is how some of my SQLite DB code works. I compare these result sets together in ways that the DB can't easily do, and I can modify, say, row 325 of $result_ref1 and execute another SQL command to update it in the DB without ever making a copy of it outside of the DBI's memory.
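
    As a hedged sketch of that update-in-place idea (assuming var1 is a key column; the names are the hypothetical ones from the question):

        my $row = $rows->[324];     # row 325 of the result set
        $row->[1] = 'new value';    # change var2 in the row the DBI holds
        $dbh->do("update table set var2 = ? where var1 = ?",
                 undef, $row->[1], $row->[0]);   # push just that row back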

    You are asking this question for some kind of reason. If you explain more about what you are doing, I'm sure some of the Monk DBI gurus will have some ideas about how to do it at least differently, if not better.

Re: Fetchrow question
by jandrew (Chaplain) on Mar 20, 2012 at 21:26 UTC

    Generally DBI has the database itself do as much of the heavy lifting (read: holding data in memory) as possible.

    See the documentation on fetchrow_array and fetchall_arrayref for some additional details and distinctions.

Re: Fetchrow question
by JavaFan (Canon) on Mar 21, 2012 at 07:30 UTC
    You have questions about memory, but which memory are you talking about? First, there's the memory on the database server. Then there's the memory used on the client, which can be split into the memory used by the client libraries and the memory used on the Perl side.

    As for select var1,var2,var3 from table, yes, that will cause the database server to load the entire table into memory (although if var1, var2 and var3 form a key, it might just load the index), but that is because you asked for all rows. It would be a bad database that loaded the entire table when you have a where clause that causes only a fraction of the rows to be returned (and you have an index to avoid a table scan).
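
    For example, a keyed lookup like this hypothetical variation of the question's query lets the server use an index on var1 rather than materializing the whole table:

        my $sth = $dbh->prepare(
            "select var1, var2, var3 from table where var1 = ?");
        $sth->execute($key);   # only the matching rows are produced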

    Does $Sth->fetchrow() read one row from the array in memory? Or does it return some pointer, so that fetchrow reads one record at a time from the table on disk?
    That isn't up to the DBI to decide; it's up to your DBD (database driver) and the installed client libraries.
Re: Fetchrow question
by mje (Curate) on Mar 21, 2012 at 09:04 UTC

    I see others have answered your question, but I'm not sure I agree with them all. When you say "Does $Sth->execute() read the entire table and put it into an array in memory?", if you are talking about what happens in Perl or in the XS code DBD::ODBC uses, the answer is no. In ODBC terms, what happens (in simplified pseudo-code) is this:

    you call $dbh->prepare:
        SQLPrepare(select whatever from whatever)
    you call $sth->execute:
        SQLDescribeParam (called per parameter; parameters bound with SQLBindParameter)
            - does not happen in your case, as you haven't got any parameters
        SQLExecute
        SQLNumResultCols (to find if there is a result-set and how many columns)
        various calls to SQLDescribeCol and SQLColAttributes to find out about the columns
    you call $sth->fetchrow:
        if the columns are not already bound: SQLBindCol for each column
        SQLFetch (one row requested from ODBC driver)
    you call $sth->fetchrow:
        SQLFetch (one row requested from ODBC driver)

    What happens in the database, and between the database and the ODBC driver, is difficult to answer, as it depends on the driver and database. For instance, with MS SQL Server using the TDS protocol, unless you've changed the default cursor, the database gathers all the rows and sends them down the socket, and the driver reads whatever it decides (could be 1 row, could be multiple rows).

    When you use fetchall_*, DBI calls fetch repeatedly in the DBD and stores the rows.
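
    If memory on the Perl side is a concern, here is a sketch of fetching in bounded batches with fetchall_arrayref's $max_rows argument (process() is a hypothetical handler):

        while (my $batch = $sth->fetchall_arrayref(undef, 10_000)) {
            last unless @$batch;       # empty batch: result set exhausted
            process($_) for @$batch;   # at most 10,000 rows held at once
        }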

      Hi, in fact the main reason for this question is that we are experiencing performance issues, not only in speed but also with the SQL database, which frequently runs at 90-100% for long periods.

      We searched the web, and there were people saying that using Win32::OLE would improve performance.

      So I did a test on a test database on another server

      The database is Microsoft SQL Server.

      The script starts by using dbi:ODBC. Here is what it does:

      select var1,var2,var3 from table

      then it loops until var1 is empty:

      ($var1, $var2, $var3) = $Sth1->fetchrow();

      For 100,000 records it took less than 2 seconds.

      I did the same test but using Win32::OLE->new('ADODB.Connection');

      I used the same select
      select var1,var2,var3 from table
      then I did the same loop and used:

      while (!$RS->EOF) {
          $var1 = $RS->Fields(0)->value;
          $var2 = $RS->Fields(1)->value;
          $var3 = $RS->Fields(2)->value;
          $RS->MoveNext;
      }

      For the same 100,000 records it took 16 seconds this time.

      So I suspected that something was related to excessive memory usage, so I ran, in the same script:

      test1 using dbi:ODBC:test
      test2 using Win32::OLE->new('ADODB.Connection');
      test3 using dbi:ODBC:test

      and the results were the same: both test1 and test3 took less than 2 seconds, and test2 still took 16 seconds.

      So I wondered: if everything is in memory, why such a big difference?

        alainb:

        I generally don't find much difference in speed between using the ODBC drivers and the native Oracle and MS SQL drivers, so I generally use DBD::ODBC.

        What strikes me when reading your node is that you're reading some 100,000 records. Is that just for a performance test, or are you typically doing such large reads? I'll read entire tables when I'm creating data dumps for third-party vendors, but generally my queries only send/retrieve a couple hundred rows at most. If you're frequently pulling thousands of records from the database, you might be doing too much work in Perl and not enough in SQL. Doing work in SQL can be a little tricky, but you can save quite a bit of I/O that way, and possibly solve any performance bottlenecks.

        Update: If you give a bit more detail about the task you're performing, and the types of queries you're doing, we might be able to give more specific advice.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        I would expect manually iterating through a Win32::OLE object to be slower than having a C library do it. Each method call and each property access to a Win32::OLE object has to cross process boundaries.

        If you can use ADODB directly, it may be faster than ODBC, but I expect database queries to be limited mainly by the speed of the network and the database, not by the local database driver, at least if it has a way of bulk-fetching result rows. Maybe the GetRows() method of the ADODB recordset helps you a bit, but I would do this over native DBI instead of trying ADODB.
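
        A hedged, untested sketch of the GetRows() idea: ADO documents the first dimension of the returned array as the field index, and Win32::OLE converts the SAFEARRAY into nested Perl array refs:

            my $data  = $RS->GetRows();      # all remaining rows in one call
            my $nrows = @{ $data->[0] };     # entries per field column
            for my $r (0 .. $nrows - 1) {
                my ($var1, $var2, $var3) =
                    ($data->[0][$r], $data->[1][$r], $data->[2][$r]);
            }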

        While I cannot comment on either ODBC or ADODB, I heartily recommend the Advanced Perl DBI talk, which discusses squeezing the most performance out on the Perl side, and a lot more.

        (Hint: fetchrow_array() copies the values practically for naught. You are better off using fetchrow_arrayref(), especially with the magic added by bind_columns(); see slides 22-23.)
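
        A minimal sketch of that bound-columns idiom, using the question's column names:

            my $sth = $dbh->prepare("select var1, var2, var3 from table");
            $sth->execute();
            my ($var1, $var2, $var3);
            $sth->bind_columns(\$var1, \$var2, \$var3);
            while ($sth->fetch) {   # fetch is an alias for fetchrow_arrayref
                # the bound variables are refreshed in place on every row,
                # with no per-row list copy
            }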