Re^2: Fetchrow question

by alainb (Initiate)
on Mar 21, 2012 at 12:17 UTC ( [id://960768] )


in reply to Re: Fetchrow question
in thread Fetchrow question

Hi, in fact the main reason for this question is that we are experiencing performance issues, and not only in speed: the SQL database also frequently runs at 90-100% CPU, and stays there for a long time.

We searched the web, and some people were saying that using Win32::OLE would improve performance.

So I did a test against a test database on another server.

The database is Microsoft SQL Server.

The script starts by using dbi:ODBC. Here is what it does:

    my $Sth1 = $dbh->prepare('select var1,var2,var3 from table');
    $Sth1->execute;
    while ( my ($var1, $var2, $var3) = $Sth1->fetchrow() ) {
        # process one row
    }

For 100,000 records it took less than 2 seconds.

I did the same test, but using Win32::OLE->new('ADODB.Connection').

I used the same select:

    select var1,var2,var3 from table

then I did the same loop, using:

    until ( $RS->EOF ) {
        $var1 = $RS->Fields(0)->value;
        $var2 = $RS->Fields(1)->value;
        $var3 = $RS->Fields(2)->value;
        $RS->MoveNext;
    }

For the same 100,000 records, this time it took 16 seconds.

So I suspected that something related to memory usage was involved, so in the same script I ran:

test1 using dbi:ODBC:test
test2 using Win32::OLE->new('ADODB.Connection')
test3 using dbi:ODBC:test

The results were unchanged: test1 and test3 each took less than 2 seconds, and test2 still took 16 seconds.

So I wondered: if everything is in memory, why such a big difference?
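
For reference, a minimal way to time all three runs in one script (a sketch; run_dbi_test and run_ole_test are hypothetical subs wrapping the two loops above):

    use Time::HiRes qw(time);

    for my $test ( [ 'test1 (dbi:ODBC)' => \&run_dbi_test ],
                   [ 'test2 (ADODB)'    => \&run_ole_test ],
                   [ 'test3 (dbi:ODBC)' => \&run_dbi_test ] ) {
        my ($name, $code) = @$test;
        my $t0 = time;
        $code->();
        printf "%s: %.2f seconds\n", $name, time() - $t0;
    }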

Replies are listed 'Best First'.
Re^3: Fetchrow question
by roboticus (Chancellor) on Mar 21, 2012 at 13:40 UTC

    alainb:

    I generally don't find much difference in speed between using the ODBC drivers and the native Oracle and MS SQL drivers, so I usually just use DBD::ODBC.

    What strikes me when reading your node is that you're reading some 100,000 records. Is that just for a performance test, or do you typically do such large reads? I'll read entire tables when I'm creating data dumps for third-party vendors, but generally my queries send/retrieve a couple hundred rows at most. If you're frequently pulling many thousands of records from the database, you may be doing too much work in Perl and not enough in SQL. Doing the work in SQL can be a little tricky, but you can save quite a bit of I/O that way, and possibly solve your performance bottlenecks.
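
    For example, a generic sketch (the orders table and $dbh here are hypothetical, not from the thread): instead of fetching every row and totaling in Perl, let the database aggregate and send back a single row:

        # work done in Perl: every matching row crosses the wire
        my ($total, $count) = (0, 0);
        my $sth = $dbh->prepare('select amount from orders where region = ?');
        $sth->execute('east');
        while ( my ($amount) = $sth->fetchrow_array ) {
            $total += $amount;
            $count++;
        }

        # same work done in SQL: only one row comes back
        my ($sum, $n) = $dbh->selectrow_array(
            'select sum(amount), count(*) from orders where region = ?',
            undef, 'east');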

    Update: If you give a bit more detail about the task you're performing, and the types of queries you're doing, we might be able to give more specific advice.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re^3: Fetchrow question
by Corion (Patriarch) on Mar 21, 2012 at 12:27 UTC

    I would expect manually iterating through a Win32::OLE object to be slower than having a C library do it. Each method call and each property access to a Win32::OLE object has to cross process boundaries.

    If you can use ADODB directly, maybe it is faster than ODBC, but I expect database queries to be mainly limited by the speed of the network and the database, not by the local database driver, at least if it has a way of bulk fetching result rows. Maybe the GetRows() method of the ADODB recordset helps you a bit, but I would do this over native DBI instead of trying ADODB.
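
    For instance, a minimal DBI sketch of fetching in bulk (assuming an already-connected $dbh; the batch size is arbitrary):

        my $sth = $dbh->prepare('select var1,var2,var3 from table');
        $sth->execute;
        while ( my $batch = $sth->fetchall_arrayref(undef, 10_000) ) {
            last unless @$batch;      # an empty batch means no rows are left
            for my $row (@$batch) {
                my ($var1, $var2, $var3) = @$row;
                # process one row
            }
        }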

Re^3: Fetchrow question
by Anonymous Monk on Mar 21, 2012 at 13:43 UTC

    While I cannot comment on either ODBC or ADODB, I heartily recommend the Advanced Perl DBI talk, which discusses squeezing the most performance out of the Perl side, and a lot more.

    (Hint: fetchrow_array() copies the values practically for naught. You are better off using fetchrow_arrayref(), especially with the magic added by bind_columns() -- see slides 22-23.)
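
    The idiom from those slides looks roughly like this (a sketch, assuming an existing $dbh):

        my $sth = $dbh->prepare('select var1,var2,var3 from table');
        $sth->execute;
        my ($var1, $var2, $var3);
        $sth->bind_columns(\$var1, \$var2, \$var3);
        while ( $sth->fetch ) {    # fetch() avoids copying: the bound variables are updated in place
            # use $var1, $var2, $var3 here
        }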

      Hi, yes, the 100,000-record read was only for the test, to be able to measure the difference in time between one method and the other.

      The real application doesn't read that many records, but the schema is so crazy that we have to do a lot of work.

      First, there is a keyid that is kept in a table.
      There is also a table for the header and another table for the details.

      To add a header I have to read the keyid table, get the keyid, then update the table with keyid+1.

      Then I can add the header with the new keyid.

      To add a detail I have to do the same:
      read the keyid table, get the keyid, then update the table with keyid+1,
      then add the detail with the new keyid.

      I know this probably looks weird, but it's a package that we cannot modify, so we have to work with the database as they defined it.

      So now, to gain performance, since I know how many detail lines I have to add, instead of getting the keyid and updating it by 1 each time, I update it once by the number of detail lines + 1 (for the header).

      i.e.: if I have 10 detail lines + 1 header,
      I read the keyid and add 12 to it, then update the table.

      I add the header with keyid-11
      I add detail line 1 with keyid-10
      ...
      I add detail line 10 with keyid
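
      In DBI, reserving the whole block in one transaction might look like this (a sketch with hypothetical table and column names; real code would also need to lock the row, e.g. WITH (UPDLOCK) on MS SQL Server, to serialize concurrent writers):

          # @details holds the detail rows to insert (hypothetical)
          my $n = @details + 1;    # detail lines plus one header
          $dbh->begin_work;
          my ($keyid) = $dbh->selectrow_array('select keyid from keytable');
          $dbh->do('update keytable set keyid = ?', undef, $keyid + $n);
          $dbh->commit;
          # ids $keyid+1 .. $keyid+$n are now reserved for the header and details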

        alainb:

        It looks like you should be able to reduce the number of database calls. The method will likely differ based on the database, but you should be able to reduce the number of operations for each set, possibly to a single transaction. Examples:

        MySQL

        MySQL allows you to insert multiple records with a single statement, like so:

        insert into FooBar (ID, col1, col2, col3)
        values ((select max(ID)+1 from FooBar), 'a', 1, 'flort'),
               ((select max(ID)+2 from FooBar), 'b', 2, 'zap'),
               ((select max(ID)+3 from FooBar), 'c', 3, 'kazoo')

        This would let you use a single communication for each table type.
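
        In DBI, one way to send such a multi-row insert from Perl (a sketch; it leaves out the max(ID) bookkeeping and assumes the rows are already collected in @rows):

            my @rows = ( ['a', 1, 'flort'], ['b', 2, 'zap'], ['c', 3, 'kazoo'] );
            my $values = join ', ', ('(?, ?, ?)') x @rows;
            $dbh->do(
                "insert into FooBar (col1, col2, col3) values $values",
                undef,
                map { @$_ } @rows,
            );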

        MS SQL Server / Sybase / Oracle / etc.

        insert into FooBar (ID, col1, col2, col3)
        select ID, col1, col2, col3 from (
            select (select max(ID)+1 from FooBar) ID, 'a' col1, 1 col2, 'flort' col3
            union
            select (select max(ID)+2 from FooBar), 'b', 2, 'zap'
            union
            select (select max(ID)+3 from FooBar), 'c', 3, 'kazoo'
        ) x

        Oracle

        Some database drivers, such as Oracle, let you send a block of SQL to execute, so you can build blocks and do all your inserts with one swell foop:

        declare
            v_ID number;
        begin
            -- First table
            select max(ID) into v_ID from FooBar;
            insert into FooBar (ID, col1, col2, col3) values (v_ID+1, 'a', 1, 'flort');
            insert into FooBar (ID, col1, col2, col3) values (v_ID+2, 'b', 2, 'zap');
            insert into FooBar (ID, col1, col2, col3) values (v_ID+3, 'c', 3, 'kazoo');
            -- Second table
            select max(ID) into v_ID from BarBaz;
            insert into BarBaz (ID, col1) values (v_ID+1, 'etcetera');
        end;

        Using a work table

        Another option that I frequently find useful is to create some temporary work tables and bulk-load the data into them. The bulk loader should be about the fastest way to get data into the database. Then create a stored procedure that integrates the data from the work tables into the real tables. For adding large sets of data to a database, this is pretty hard to beat.
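
        A DBI-flavoured sketch of that pattern (work_foobar and sp_merge_foobar are hypothetical names; for really big volumes a dedicated bulk loader such as bcp would replace the insert loop):

            # stage the rows into the work table
            my $sth = $dbh->prepare(
                'insert into work_foobar (col1, col2, col3) values (?, ?, ?)');
            $dbh->begin_work;
            $sth->execute(@$_) for @rows;    # @rows: arrayrefs of column values
            $dbh->commit;

            # let the stored procedure integrate the staged rows into the real tables
            $dbh->do('exec sp_merge_foobar');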

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.
