Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Much slower DBI on RHEL6

by MPM (Novice)
on Feb 05, 2014 at 21:19 UTC ( #1073609=perlquestion: print w/ replies, xml ) Need Help??
MPM has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I've "inherited" a problem with an application. It makes use of perl-DBI and MySQL. On RHEL 4 and 5, the code executes at a sufficient speed. However, on RHEL6, it takes much longer. RHEL 4 & 5 < 10 sec; RHEL 6 > 45 seconds. In regards to hardware, the RHEL6 machine should be faster. Any help would be greatly appreciated. Here is part of the code... FYI, the MySQl db is located on the same machine so the connections are being made to localhost

my $dbh = $me->{DBHANDLE}; my %labels; # Get all the different ids my $sql = 'SELECT DISTINCT id FROM list_index '; my $sth = $dbh->prepare($sql); $sth->execute; my $prev; while( my $hashref = $sth->fetchrow_hashref) { #keep only the latest description #more than one description will exist, the hash will only keep one + key but it will be the oldest one # Get the last occurance's description my $sql2 = 'select id,description from list_index where id = "' +. $hashref->{id} . '" order by rpt_key desc limit 1'; my $sth2 = $dbh->prepare($sql2); $sth2->execute; while( my $hashref2 = $sth2->fetchrow_hashref) { $labels{$hashref2->{id}} = $hashref2->{id} . ' - ' . $hashref +2->{description}; } $sth2->finish; } $sth->finish; return \%labels;

Comment on Much slower DBI on RHEL6
Download Code
Re: Much slower DBI on RHEL6
by karlgoethebier (Curate) on Feb 05, 2014 at 21:46 UTC

    Really same my.cnf?

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Yes, they are identical.

        mySQL, DBI, DBD and Perl as well?

        Regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

Re: Much slower DBI on RHEL6
by mbethke (Hermit) on Feb 05, 2014 at 22:38 UTC

    Try running the query with the mysql commandline client to see whether you get a significant runtime difference. That way you could isolate the problem to mysql if possible.

    In any case I don't see how the code could possibly work the way the comments indicate was intended. If 'id' is indeed a record ID as is customary in relational DBs, the DISTINCT does nothing. If not, it's indeed useful but then the results will be read one id at a time and nothing like "the hash will only keep one key but it will be the oldest one" happens. Then the same id is read again together with a single description. It looks like a

    SELECT DISTINCT id,description from list_index
    would do the same job. Or the whole script as
    my $sql = 'SELECT DISTINCT id,description FROM list_index'; foreach(@{ $dbh->selectall_arrayref($sql, {Slice => {}}) }) { $labels{$_->{id}} = "$_->{id} - $_->{description}"; }

    If that doesn't help, try and check the DDL, i.e. the CREATE TABLE, especially the indices on both tables.

      perhaps some sample data would make things clearer

      +---------+--------------------------------+--------------------+ | id | description | rpt_key | +---------+--------------------------------+--------------------+ | ABC | Organizational Detail Newer | ABC.2013_10_18 | | ABC | Organizational Detail Older | ABC.2012_10_15 | | XYZ | XYZ Status | XYZ.2012_08_05 | | XYZ | XYZ Status | XYZ.2012_08_12 | | XYZ | XYZ Status | XYZ.2013_08_14 | +---------+--------------------------------+--------------------+

      So the values I would want are:

      ABC - Organizational Detail Newer XYZ - XYZ Status

      Mmh, may be. But how this cohere with the RHEL 5 vs RHEL 6 issue?

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        sorry, not sure what you are asking here. I put sample db data and target result in response to show that the suggested query by mbethke wouldn't get the needed results... I think I was unclear on what the code should be doing

Re: Much slower DBI on RHEL6
by erix (Vicar) on Feb 05, 2014 at 23:22 UTC

    Are the database tables the same size?

    Do the database tables have the exact same data?

    Do the database tables have the same indexes?

    Perhaps one machine (wisely) chooses to scan the table because the search conditions are not specific enough for an index (which they were on the other machine).

    In short, there isn't really enough information...

    update: In general, it's a shame that database questions tend to talk of "meh, slow...", "fast!", "yippee, much faster!!" but omit to post timings and/or pertinent database plans (EXPLAIN PLAN variants). Also, or even especially, when the case is solved.

      It is the exact same data... same table structure, etc... the data was dumped on the RHEL 4 and imported onto the RHEL 6 machine.

      MySQL seems to return the data just as fast as it always has, it's the fetching done by perl-DBI that seems much slower on RHEL 6 ( perl 5.10 )

        Perl 5.10 IIRC was one of the less optimised Perl versions. I think some benchmarks put it 20-50% slower than 5.8. 5.12 and 5.14 improved the speed much. (Sorry, no sources to back this up at this time.)

        Hi,

        this thread is very interesting. How have you measured that mysql is gathering the informations at the same speed? Can you present the mysql version numbers on the different distributions? Have you checked whether the same query plan is used by mysql?

        Best regards
        McA

        P.S.: You use the same hardware for RHEL4 and RHEL6 and the same Linux config?

Re: Much slower DBI on RHEL6
by Anonymous Monk on Feb 06, 2014 at 10:15 UTC
    I don't know what's wrong with it -- you should try some place specialising in MySQL. It seems unlikely that Perl/DBI are at fault unless RHEL has done something wonky with the relevant packages.

    Anyway, you seem to be doing a death by a thousand queries there and that means _many_ round-trips. I'm sure the script can be rewritten to perform a single query. This is how I'd do it with Postgres-specific syntax:

    SELECT DISTINCT ON (id) id, description FROM list_index li ORDER BY id, rpt_key DESC;
    Writing it with a correlated subquery, we get a query that is terribly slow (one second on my couple-of-thousand-rows dataset):
    SELECT DISTINCT id, (SELECT description from list_index li_i WHERE li_i.id = li_o.id ORDER BY created_at DESC limit 1) AS description FROM list_index li_o;
    But moving the distinctness to a subquery makes it fast (~7 milliseconds):
    SELECT id, (SELECT description from list_index li_i WHERE li_i.id = li_o.id ORDER BY created_at DESC limit 1) AS description FROM (SELECT DISTINCT id FROM list_index) li_o;
    The SQL should work on every dialect, but of course, these speed measurements apply only to the PostgreSQL query planner. MySQL's is different. Try and see.

    ...My brains aren't working well enough right now to produce a way to do it without a dependent subquery.

      Bah. s/created_at/rpt_key/
      And now that my brains are working again, a third option that is quicker than all of those on Postgres, but YMMV:
      select li_i.id, description from list_index li_i join ( select id, max(rpt_key) as rpt_key from list_index group by id ) li_o on li_i.id = li_o.id and li_i.rpt_key = li_o.rpt_key;
      I assume here that the pair (id, rpt_key) is unique, but it should not hurt even if it isn't.

        Even though I have things working in a sufficient manner, I'll give this a shot as well. I agree MySQL should be able to be responsible for more of the processing. I'll report back with my findings. THanks.

Re: Much slower DBI on RHEL6
by McA (Curate) on Feb 06, 2014 at 11:23 UTC

    Hi,

    not an answer to your main question. But a kind of annotation looking at your code:

    IMHO you're not using the possibility of prepared statements and bind variables. Look at this code:

    my $dbh = $me->{DBHANDLE}; my %labels; # Get all the different ids my $sql = 'SELECT DISTINCT id FROM list_index '; my $sth = $dbh->prepare($sql); $sth->execute; my $sql2 = 'select id, description from list_index where id = ? order + by rpt_key desc limit 1'; my $sth2 = $dbh->prepare($sql2); while(my $hashref = $sth->fetchrow_hashref) { $sth2->execute($hashref->{id}); while(my $hashref2 = $sth2->fetchrow_hashref) { $labels{$hashref2->{id}} = $hashref2->{id} . ' - ' . $hashref +2->{description}; } } $sth2->finish; $sth->finish; return \%labels;

    With this code you gain two things: a) You don't prepare a new sql statement per loop iteration. b) You use bind variables which is almost always better than inserting values like you did it (sql injection). c) It should be measurable faster.

    Best regards
    McA

      Nice start, but fetching hashrefs is the slowest possible access method. Binding return values might improve a lot

      my $dbh = $me->{DBHANDLE}; my %labels; # Get all the different ids my $sth = $dbh->prepare ("select distinct id from list_index"); $sth->execute; $sth->bind_columns (\my $id); my $sth2 = $dbh->prepare (qq; select description from list_index where id = ? order by rpt_key desc; # -- I don't know if limit 1 is needed ); $sth2->execute (0); $sth2->bind_columns (\my $desc); while ($sth->fetch) { $sth2->execute ($id); while ($sth2->fetch) { $labels{$id} = "$id - $desc"; } } $_->finish for $sth, $sth2; return \%labels;

      Enjoy, Have FUN! H.Merijn

        Hi Tux,

        the focus in my proposal was on using bind variables and using a prepared statement more than once. But you're right, when someone is hunting the milliseconds, your approach should be faster.

        Thank you and ++.

        Best regards
        McA

        thank you EVERYBODY for the responses. I didn't mention it before but I had tried optimizing things by using different types of fetches, binding, etc. but nothing seemed to help that much. However, I do have things working much faster now( even faster than it was on RHEL 4)!. I don't know why, but here is what I found thanks to tux's post. I do need the "limit 1" because otherwise, I don't get the most recent description, I get the oldest. However, for whatever reason, when I use desc AND limit 1 things are slloooww. Without "limit 1" things are fast... So, it is much faster for me to iterate through all of the rows of ids just to get the "latest" than it is for me to sort in descending order and get just 1. Hope that makes sense. Thanks again everyone. My code is much more optimized and I'm sure throughout the application there are things can be optimized and more secure so I am going to work on that. So, in short, still don't know why the same exact Perl code, with the same data, is much slower on RHEL 6 but my issue is resolved. So this is the code I ended up using and it is faster and seems to give me the correct results

        my $dbh = $me->{DBHANDLE}; my %labels; # Get all the different ids my $sth = $dbh->prepare ("select distinct id from list_index"); $sth->execute; $sth->bind_columns (\my $id); my $sth2 = $dbh->prepare (qq; select description from list_index where id = ? order by rpt_key; ); $sth2->execute (0); $sth2->bind_columns (\my $desc); while ($sth->fetch) { $sth2->execute ($id); while ($sth2->fetch) { $labels{$id} = "$id - $desc"; } } $_->finish for $sth, $sth2; return \%labels;
Re: Much slower DBI on RHEL6
by Tux (Monsignor) on Feb 06, 2014 at 16:39 UTC

    Triggered by some other issues that came up in this thread, I reworked my speed test for access methods. The conclusion might be that when you run your database locally, as in no network delays for databases running on another host, the speed difference between the access methods is huge. fetchrow_hashref should not be used when speed really matters and the number of records visited is relatively big (for one record you won't notice a difference, as the overhead of setting up the handle will make the real fetch work disappear in noise).

    Assuming you already prepared your statement handle in $sth, DBI's access-methods are:

    HR while (my $ref = $sth->fetchrow_hashref) { # use $ref->{c_base} and $ref->{base} A while (my ($c_base, $base) = $sth->fetchrow_array) { # use $c_base and $base AR while (my $ref = $sth->fetchrow_arrayref) { # use $ref->[0] and $ref->[1] DAR while (my $ref = DBI::st::fetchrow_arrayref ($sth)) { # use $ref->[0] and $ref->[1] BC $sth->bind_columns (\my ($c_base, \$base)); while ($sth->fetch) { # use $c_base and $base DBC $sth->bind_columns (\my ($c_base, \$base)); while (DBI::st::fetchrow_arrayref ($sth)) { # use $c_base and $base

    When measured against relatively actual installations of the databases, the results are (relative speed, higher is better):

    perl-5.18.2-64int-ld perl-5.16.2-64all-ld DBI-1.631 DBI-1.631 DBD::Oracle-1.68 DBD::Oracle-1.68 Oracle 12.1.0.1.0 Oracle 11.2.0.3.0 -> 11.2.0.3.0 -> 11.2.0.3.0 HR 639 HR 387 DBC 804 A 1000 AR 913 AR 1142 BC 956 DBC 1186 DAR 967 DAR 1201 A 999 BC 1250 DBD::SQLite-1.40 DBD::SQLite-1.40 HR 255 HR 250 A 1000 A 1000 AR 1041 AR 1172 DAR 1166 DAR 1244 BC 1333 BC 1250 DBC 1455 DBC 1313 DBD::Pg-3.0.0 DBD::Pg-3.0.0 PostgreSQL 9.3.2 PostgreSQL 9.2.4 HR 198 HR 179 A 1000 A 1000 AR 1143 AR 1279 DAR 1186 DAR 1281 DBC 1326 BC 1417 BC 1332 DBC 1485 DBD::<span style="background:gold">CSV</span>-0.41 DBD::<spa +n style="background:gold">CSV</span>-0.41 HR 683 HR 791 A 1000 A 1000 AR 1164 BC 1153 DAR 1171 AR 1169 BC 1181 DAR 1176 DBC 1185 DBC 1186 DBD::mysql-4.025 DBD::mysql-4.026 MariaDB-5.5.33 MariaDB-5.5.33 HR 102 HR 95 A 999 A 1000 AR 1126 BC 1245 DAR 1271 AR 1438 BC 1287 DAR 1509 DBC 1328 DBC 1612 DBD::Firebird-1.16 firebird-2.5.2.26539 HR 565 DAR 991 AR 997 A 1000 DBC 1005 BC 1179

    As you can see, BC wins over HR by a factor way over 10 on MySQL. YMMV.


    Enjoy, Have FUN! H.Merijn
    code
      Does Perl's OO method call really produce that much of a slowdown? (BC vs DBC is 3,3 vs 3,9 in SQLite's case.) How does that happen?
      while ($sth->fetch) { # bound columns # most of the time a negligible difference # but this is sometimes faster? while (DBI::st::fetchrow_arrayref ($sth) { # 'DBC'

      Oh, and any chance of having DBD::Firebird tested, too?

        DBD::Firebird added to the list I posted before. I can't say I was positively surprised in installation and startup. OpenSUSE had the packages readily available, so installation went smooth. service started immediate, but then the shit hits the fan: I find the quick start guide way too Windows-minded and none of the commands is intuitive. Furthermore, the DBD installation is a hell: it does not find the needed libraries of header files in what I thought were pretty default locations. The firebird (and firebird-devel) packages installs the libfbclient.so.2 but no (symbolic) link to libfbclient.so. Creating of a new database has no command-line-tool. All and all I am not very charmed yet.


        Enjoy, Have FUN! H.Merijn
Re: Much slower DBI on RHEL6
by ruzam (Curate) on Feb 06, 2014 at 20:35 UTC

    Well, just to commiserate, not that long ago I migrated an application from a CentOS 5 server to a CentOS 6 server. The CentOS 6 server was faster hardware by every measure, and you could feel it.

    But my MySQL table data load times (insert statements through Perl DBI) nearly tripled. Spent too much time searching for a cause and solution without success. Despite the stupid slow load times, the application itself 'did' run faster overall for day to day use (or at least fast enough) and I left it at that. Probably an averaging out of code execution vs query execution

    There 'is' something wrong with the Perl/DBI/MySQL performance in CentOS 6. I continue with the assumption that somebody will eventually find the reason and a patch will fix it. In the mean time I accept that the table loads will be slower (a not too often initialization step) and the daily operation is fast enough.

      How are you handling commits? I think "autocommit" is turned on by default (commit after every execution of an insert or update), and that can be outrageously slow when slogging through a long list.

      I wrote a general-purpose command-line tool for doing inserts/updates using data from stdin or a file, and the best thing about it is the option that lets me set the commit interval. If I forget to use that option on an input of many thousands of rows, I really regret it, but when I set it to commit every 500 or 1000 rows, it flies, and I love it.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1073609]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (9)
As of 2014-09-01 12:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite cookbook is:










    Results (6 votes), past polls