http://www.perlmonks.org?node_id=345463

tito has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
How can I improve performance when using DBD::Oracle? We have an urgent performance problem: inserting data into an Oracle 9i database using DBI/DBD::Oracle is at least five times slower than straight SQL using sqlplus. Any suggestion is appreciated. Thanks,
Tim

Replies are listed 'Best First'.
Re: performance problem with oracle dbd
by dws (Chancellor) on Apr 15, 2004 at 17:35 UTC

    We have an urgent performance problem where inserting data into an Oracle 9i database using DBI/DBD::Oracle is at least five times slower than straight SQL using sqlplus.

    Were this to drop in my lap, the first thing I'd do is make sure that I'm doing an "apples to apples" comparison. I.e., am I really issuing the same stream of requests to Oracle across the wire when using DBI that I am when using SQL*Plus?
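    One way to check that is to turn on DBI tracing and compare what actually goes to the server. An untested sketch (the trace level, log path, and connection details are just placeholders):

        use DBI;

        # Level 2 logs each prepare/execute; output goes to the named file.
        DBI->trace(2, '/tmp/dbi_trace.log');

        my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pass', { RaiseError => 1 });
        # ... run the inserts as usual, then inspect the trace log ...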

    In the meantime, can you show us a representative chunk of code that is inserting data?

Re: performance problem with oracle dbd
by waswas-fng (Curate) on Apr 15, 2004 at 17:32 UTC
    Take a look at your inserts: are you using AutoCommit? If so, every insert is causing a commit. Are you executing the same SQL as sqlplus is executing? Are the DBD settings similar to sqlplus's settings for buffer size and maximum window? Are you using placeholders?


    -Waswas
      Thanks for the reply. The performance problem is worse with the actual app. The five-times slowdown was verified by running two stripped-down tests with identical insert statements and commit frequencies. One test was coded using a PL/SQL looping construct and the other using Perl. I will go and take a look at the documentation and see how one can set the buffer and window sizes for DBD, since I've not set those parameters.
        The PL/SQL code doesn't have any network overhead, so naturally it will be faster. PL/SQL is not a general-purpose language, so it can't do the same things Perl can, but if it meets your needs and speed is extremely important for this project, I'd say use it.
        PL/SQL is faster than DBD because it's all executed on the database side, rather than making a separate round trip for each statement. There is nothing wrong with throwing a chunk of PL/SQL into your Perl, if that's appropriate for what you want to do. It works fine.
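        As an untested sketch of that idea (the table and column names are illustrative, and :n is a named bind variable), you can push the whole loop into an anonymous PL/SQL block so only one round trip is made:

            my $sth = $dbh->prepare(q{
                BEGIN
                    FOR i IN 1 .. :n LOOP
                        INSERT INTO temp (ACCT_STATUS_TYPE, NAS_USER_NAME)
                        VALUES ('Start', 'joe');
                    END LOOP;
                    COMMIT;   -- commit once at the end instead of per row
                END;
            });
            $sth->bind_param(':n', 1500);
            $sth->execute;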

        The PerlMonk tr/// Advocate
Re: performance problem with oracle dbd
by perrin (Chancellor) on Apr 15, 2004 at 17:56 UTC
    You need to show us the code. You are probably not doing things in the most efficient way, e.g. not using prepare_cached. There is also a way to adjust how many rows get fetched at once, "RowCacheSize", which can make a big difference on large queries. I don't think it will affect inserts at all though.
      #!/usr/bin/perl -w
      #
      # $Id: test.pl
      #
      # To test the insert rate of DBD::Oracle.
      #
      #*****************************************************************************

      use DBI;
      use strict;
      use diagnostics;

      if (@ARGV < 3) {
          print "\n";
          print " test.pl <database name> <database account> <database account password>\n";
          print "\n";
          exit;
      }

      my $database_name = $ARGV[0];
      my $db_user_name  = $ARGV[1];
      my $db_password   = $ARGV[2];

      $database_name = "app"      unless defined $database_name;
      $db_user_name  = "userid"   unless defined $db_user_name;
      $db_password   = "password" unless defined $db_password;

      my $data_source = "dbi:Oracle:$database_name";

      #
      # Connect to the database
      #
      my $dbh = DBI->connect($data_source, $db_user_name, $db_password,
                             { RaiseError => 1, AutoCommit => 1, ora_session_mode => 0 })
          || die "Could not connect to database '$database_name'\n";

      $dbh->{RowCacheSize} = 100;

      my $func = $dbh->prepare(q{
          INSERT INTO temp (
              ACCT_STATUS_TYPE, NAS_USER_NAME, NAS_IDENTIFIER, NAS_IP_ADDRESS,
              ACCT_SESSION_ID, FRAMED_IP_ADDRESS, SERVICE_TYPE, FRAMED_PROTOCOL, MEDIUM_TYPE,
              NAS_PORT_TYPE, NAS_PORT, NAS_PORT_ID, NAS_REAL_PORT,
              ACCT_DELAY_TIME, ACCT_SESSION_TIME, ACCT_TIME_STAMP,
              RATE_LIMIT_RATE, RATE_LIMIT_BURST, POLICE_RATE, POLICE_BURST, FILTER_ID,
              FORWARD_POLICY, HTTP_REDIRECT_PROFILE_NAME, CONTEXT_NAME, SESSION_TIMEOUT, IDLE_TIMEOUT,
              RB_DHCP_MAX_LEASES, MULTICAST_SEND, MULTICAST_RECEIVE, MULTICAST_MAX_GROUPS, IGMP_PROFILE_NAME,
              QOS_POLICING_POLICY_NAME, QOS_METERING_POLICY_NAME, QOS_PQ_POLICY_NAME,
              ATM_PROFILE_NAME, PPPOE_URL, PPPOE_MOTM, PPPOE_IP_ROUTE,
              PPP_DNS_PRIMARY, PPP_DNS_SECONDARY, PPP_NBNS_PRIMARY, PPP_NBNS_SECONDARY,
              ACCT_TERMINATE_CAUSE, SESSION_ERROR_MSG, ACCT_REASON,
              ACCT_INPUT_OCTETS_64, ACCT_OUTPUT_OCTETS_64,
              ACCT_INPUT_PACKETS_64, ACCT_OUTPUT_PACKETS_64,
              ACCT_MCAST_INPUT_OCTETS_64, ACCT_MCAST_OUTPUT_OCTETS_64,
              ACCT_MCAST_INPUT_PACKETS_64, ACCT_MCAST_OUTPUT_PACKETS_64
          ) VALUES (
              'Start', 'joe', 'ser-1', '127.0.0.1', '0B00FFFF90000010-40647961', NULL,
              'Outbound-User', NULL, 'DSL', 'Virtual', 201392128, '12/1 clips 20013', NULL, NULL, NULL,
              99999999, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'BASIC', NULL, NULL, 1, NULL, NULL, NULL,
              NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
              NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
          )
      }, { ora_check_sql => 0 });

      my $start_time = time;
      print "Start of insert test, time:", $start_time, "\n";

      foreach (1..1500) {
          $func->execute();
      }

      my $end_time = time;
      print "End of insert test, time:", $end_time, "\n";

      my $difference = $end_time - $start_time;
      print "Time taken: ", $difference, "\n";

      #
      # to be absolutely sure that the export selection criteria and the purge criteria
      # sync up, we pick a time in the recent past by subtracting 6 hours from the max timestamp
      #

      print "end \n";

      $dbh->disconnect;

      1;

        This is untested since I don't have an Oracle box to hand to play with.

        Were you doing a commit after every insert with the SQL you sent to sqlplus? If not, I suspect that changing to AutoCommit => 0 and doing an explicit commit every few hundred inserts will speed things up considerably.

        Oracle in particular does not do well with hard parses.

        Convert your query to use placeholders (i.e. pass in ? for each parameter, and then pass the data in the execute call) and performance should improve greatly. More importantly, if load matters, hard parses use a type of internal lock known as a latch, and latch contention will cause Oracle to scale very poorly.

        Of course prepare_cached with placeholders will perform even better, but just switching to placeholders will make Oracle perform much better.
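        An untested sketch of what that looks like against the test table above (the column list is shortened, and @rows is assumed to hold array refs of values in the same order):

            my $sth = $dbh->prepare_cached(q{
                INSERT INTO temp (ACCT_STATUS_TYPE, NAS_USER_NAME, NAS_IP_ADDRESS, ACCT_SESSION_ID)
                VALUES (?, ?, ?, ?)
            });
            foreach my $row (@rows) {
                $sth->execute(@$row);    # the data travels in execute, not in the SQL text
            }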

        To understand Oracle's performance characteristics, I can highly recommend virtually anything by Tom Kyte. He has strong opinions on how to use Oracle (some of which I disagree with) but does a very good job of explaining how things work and giving you the tools to explore further.

        UPDATE: As noticed by jplindstrom and etcshadow, I misread the code and the hard parses issue that I talked about is indeed a red herring. However if you are benchmarking, be aware that if the real code does not use placeholders, then hard parses will be a problem for the database, and you really want to avoid them.

        One thing I've noticed is the extreme amount of time that Oracle RAW data types take when DBD::Oracle thinks they are strings. You might consider binding the ACCT_SESSION_ID listed in the code, if that value is a RAW, using something like the following:

        $sth->bind_param(1, pack('H*', '0B00FFFF90000010-40647961'), { ora_type => ORA_RAW });

        This will save you lots of churning on the database side, as Oracle will otherwise need to convert it to a raw value itself. We noticed a substantial performance difference when doing this with all of our RAW columns. Also, as noted by others, { AutoCommit => 0 } will make a big difference in speed.
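        For completeness, an untested sketch in context: the ORA_RAW constant comes from DBD::Oracle's :ora_types export (the column choice and the hex-only value below are illustrative).

            use DBD::Oracle qw(:ora_types);

            my $sth = $dbh->prepare(q{INSERT INTO temp (ACCT_SESSION_ID) VALUES (?)});
            # pack the hex string into raw bytes so Oracle does not have to convert it
            $sth->bind_param(1, pack('H*', '0B00FFFF90000010'), { ora_type => ORA_RAW });
            $sth->execute;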


        Hope that's helpful.
Re: performance problem with oracle dbd
by EdwardG (Vicar) on Apr 15, 2004 at 17:59 UTC

    Recently I had the need to insert 3.16 million rows into SQL Server (bear with me).

    I first tried using DBI, and quickly concluded that this was the WWTDI (Wrong Way To Do It).

    The right way turned out to be bcp.

    Apparently, the Oracle equivalent is SQL*Loader.

      Good thought. A bulk insert for Oracle should use SQL*Loader, as this bypasses the rollback segments (less overhead).
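      As a rough, untested illustration (the file names, connect string, table and column list are invented), a minimal control file and sqlldr invocation driven from Perl might look like:

          # Write a minimal SQL*Loader control file, then shell out to sqlldr.
          open my $ctl, '>', 'temp.ctl' or die "Cannot write control file: $!";
          print $ctl join("\n",
              q{LOAD DATA},
              q{INFILE 'temp.csv'},
              q{APPEND INTO TABLE temp},
              q{FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'},
              q{(ACCT_STATUS_TYPE, NAS_USER_NAME, NAS_IP_ADDRESS, ACCT_SESSION_ID)},
          ), "\n";
          close $ctl;

          # direct=true requests a direct path load, which avoids most per-row overhead
          system('sqlldr', 'userid=user/pass@db', 'control=temp.ctl', 'direct=true') == 0
              or die "sqlldr failed: $?";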

      Use of bind variables in SQL statements can also be a big help, as SQL statements can then be parsed (in Oracle) only once. See the DBD::Oracle doc for use of bind variables; some of my example code is below. The example code is for loading binary objects into a DB, but does use bind variables. Also, there's probably a more efficient way to write this, but it worked for me.

      # load modules
      use strict;
      use DBI;
      use DBD::Oracle qw(:ora_types);

      # connect to the db
      my $dbh = DBI->connect("dbi:Oracle:tlin", "USERNAME", "PASS")
          or die "Cant connect to db! $DBI::errstr\n";

      my $LONG_RAW_TYPE = ORA_LONGRAW;   # oracle type id for long raw/blob (could also be 24)
      my $bufsize = 400000;              # putting this here, since we use it everywhere

      $dbh->{LongReadLen} = $bufsize;    # dont set this too high; performance suffers
      $dbh->{LongTruncOk} = 0;

      # find all the files in a dir and load them to the db
      while ( defined(my $filename = glob("c:\\play\\tmp\\*")) ) {

          # skip files bigger than the buffer rather than silently truncating them
          if ( -s $filename > $bufsize ) {
              print STDERR "File $filename too large to load. Skipping.\n";
              next;
          }

          open(BLOB, "<$filename") or die "Cant open $filename: $!\n";
          binmode(BLOB);
          my $buf;
          my $bytes = read(BLOB, $buf, $bufsize);
          close(BLOB);

          # fix the file name; one undef for each part of the path which is not a file name.
          my ( undef, undef, undef, $storname ) = split /\\/, $filename;

          my $sth = $dbh->prepare("
              insert into blobtest ( id, fname, fobj )
              values ( blobid.nextval, '$storname', :blob )
          ") or die "Cant prep: $DBI::errstr\n";

          $sth->bind_param(":blob", $buf, { ora_type => $LONG_RAW_TYPE });
          $sth->execute or die "Cant exec: $DBI::errstr\n";

          print "File: $filename loaded\n";
      }

      More generally, take a look at overall Oracle parameters / tuning. Usually the defaults on DB creation are insanely low and need to be adjusted much higher, but not so high that swapping / paging ensues.

      Thanks for the suggestion, but the real app actually has a mix of different SQL statements in it, and after porting it from MySQL (using DBI/DBD::mysql) the performance dropped to an unacceptable level, so I created these simple test scripts to see where the problem is. The selects are fine, but commits for inserts/updates are REALLY bad.
Re: performance problem with oracle dbd
by jeremyh (Beadle) on Apr 15, 2004 at 18:28 UTC
    Try AutoCommit => 0 in the $dbh = DBI->connect( ...

    Then you'll need to explicitly do a $dbh->commit; at least once after all the inserts (or periodically, every so many inserts, etc.)
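    An untested sketch of that pattern, reusing the connection variables from the test script above (the single-column insert and @names are just for illustration):

        my $dbh = DBI->connect($data_source, $db_user_name, $db_password,
                               { RaiseError => 1, AutoCommit => 0 });
        my $sth = $dbh->prepare('INSERT INTO temp (NAS_USER_NAME) VALUES (?)');
        my $count = 0;
        foreach my $name (@names) {
            $sth->execute($name);
            $dbh->commit unless ++$count % 500;    # commit every 500 inserts
        }
        $dbh->commit;                              # pick up any remainder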

      Thanks for the suggestion, but I'm mimicking what the real app is doing, and my PL/SQL test code is doing exactly the same thing (committing after every insert). Maybe what I can do is commit after the loop ends for both Perl and sqlplus and see if the slowdown is comparable...
        In that case, is the PL/SQL code a stored procedure? If so, that gives it a performance advantage over DBD, which must prepare the statement at run time. And the simple solution might be to just create a stored procedure that does your insert and then call it from Perl:

        $sth_do_insert = $dbh->prepare( q{
        begin
        p_do_insert;
        end;
        } );

        $sth_do_insert->execute;

        if you need to pass the stored procedure some parameters, look at perldoc DBI and it should tell you how.
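        As an untested sketch (p_do_insert is the procedure named above; its two arguments and their values are invented for illustration):

            my $sth_do_insert = $dbh->prepare(q{
                begin
                    p_do_insert(:user_name, :ip_address);
                end;
            });
            $sth_do_insert->bind_param(':user_name',  'joe');
            $sth_do_insert->bind_param(':ip_address', '127.0.0.1');
            $sth_do_insert->execute;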

        ---------

        (Below are some thoughts if you can't do the stored procedure idea)

        Also, a stored procedure will run in the schema of its owner, which may have a different database environment than your Perl script's DBD login (things like the default tablespace).

        Are you running the script on the machine that the DB is on? If not, SQL*Net communication may slow things down quite a bit, although I wouldn't think an insert would be affected by that (I've seen a huge difference in query performance).

        Try to trace both sessions in oracle and see what the differences are:

        SQLPLUS> exec sys.dbms_system.set_sql_trace_in_session('&sid', '&serial', TRUE);

        where sid is the session ID and serial is the serial# (the & makes sqlplus prompt you for them, it is not part of the value); find these in v$session.

        run the same command with FALSE as the 3rd param to turn off the trace.

        Look for the trace files in your trace dump destination set in init.ora.

        Can you please show us the sqlplus code you are using...thanks.
Re: performance problem with oracle dbd
by pizza_milkshake (Monk) on Apr 15, 2004 at 23:43 UTC
    i do not have a lot of experience with oracle, but i've been dealing with insert issues in other databases lately. here are some things you can try:
    • see if oracle comes with a tool that will import csv/tab-sep data. this is often many times faster than putting it into sql and inserting it.
    • check if oracle supports multi-insert syntax, where you can specify more than one record's worth of data per statement. this will greatly speed up your inserts (see the sketch after this list).
    • turn autocommit off. this is an easy one. you may also have to wrap the inserts in a transaction.
    • disable indexes/keys if oracle supports it. then enable them at the end of the script.
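    An untested sketch of the multi-row idea above: DBI's execute_array sends many rows per execute call, and DBD::Oracle implements it with array binding. The column choice and data below are invented.

        my $sth = $dbh->prepare('INSERT INTO temp (NAS_USER_NAME, NAS_IP_ADDRESS) VALUES (?, ?)');
        my @names = ('joe',       'anne',      'raj');
        my @addrs = ('127.0.0.1', '127.0.0.2', '127.0.0.3');
        my @tuple_status;
        # one call, many rows: each array ref holds the values for one placeholder
        $sth->execute_array({ ArrayTupleStatus => \@tuple_status }, \@names, \@addrs)
            or die "execute_array failed: " . $sth->errstr;
        $dbh->commit;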

    perl -e'$_="nwdd\x7F^n\x7Flm{{llql0}qs\x14";s/./chr(ord$&^30)/ge;print'

Re: performance problem with oracle dbd
by etcshadow (Priest) on Apr 16, 2004 at 01:20 UTC
    From what info you've provided, the problem could be a network latency issue. If you are doing something like:
    for (1..1000) { $sth->execute(); }
    then you are communicating with the server, over the network, 1000 times. Each communication could be composed of several packets. Even if you assume just one packet round trip per communication (although it is possibly more), and a pretty good round-trip time of 10ms, then that's 10 seconds worth of network time.

    Compare that to doing the same with pl/sql: you only have one such communication with the server (to execute the pl/sql function, with the loop inside). Basically zero network time.

    Additionally, with the way that oracle handles inserts, internally, there is not necessarily a whole heck of a lot of overhead (assuming that the table is pre-extended, and particularly if you're not inserting data into indexed columns). Thus, network back-and-forth time could be a considerable overhead. However, a 5x difference is a bit much, unless the network connecting the DBI machine and the oracle server has some considerable latency. One thing to check is simply the ping time between them.

    ------------ :Wq Not an editor command: Wq
Re: performance problem with oracle dbd
by Ryszard (Priest) on Apr 18, 2004 at 17:08 UTC
    I've done a bit of work with multi thousand row updates/inserts into oracle and have found the process of:
    1. spool out the data
    2. parse it with perl
    3. sql load the data
    to be by far the best way to do it.

    If you're moving from mysql to oracle, i reckon this could be the best way.
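    An untested sketch of step 2 above (the file names and field layout are invented): turn a tab-separated spool file into the CSV that SQL*Loader expects.

        open my $in,  '<', 'spool.tsv' or die "Cannot read spool file: $!";
        open my $out, '>', 'load.csv'  or die "Cannot write CSV: $!";
        while (my $line = <$in>) {
            chomp $line;
            my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
            # quote every field so embedded commas don't break the load
            print $out join(',', map { my $f = $_; $f =~ s/"/""/g; qq{"$f"} } @fields), "\n";
        }
        close $in;
        close $out;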

    One tip (i found out the hard way) is not wrapping everything up into a single-click operation. Separate all the tasks into discrete scripts, and number them like 1.export_mysql.sh, 2.parse_output.pl, 3.add_to_oracle.sh.

    The advantage of unwrapping everything is that if one part fails, it doesn't automatically roll on to the next one. No matter how good your checking is, there may always be one thing you've not taken into account.

    When i use the above method, i find my outages are very well organised, and in the event i'm sick, someone else can easily perform the same outage with very little ramp up time.

    Of course, most of this is OT, but it may help nonetheless.