Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

same query, different execution, different performance

by punkish (Priest)
on Feb 13, 2012 at 22:57 UTC ( [id://953562]=perlquestion: print w/replies, xml ) Need Help??

punkish has asked for the wisdom of the Perl Monks concerning the following question:

I have a Postgres table with more than 8 million rows. Given the following two ways of doing the same query, I get wildly different results.

$q .= '%'; ## query 1 my $sql = qq{ SELECT a, b, c FROM t WHERE Lower( a ) LIKE '$q' }; my $sth1 = $dbh->prepare($sql); $sth1->execute(); ## query 2 my $sth2 = $dbh->prepare(qq{ SELECT a, b, c FROM t WHERE Lower( a ) LIKE ? }); $sth2->execute($q);

query 2 is at least an order of magnitude slower than query 1... seems like it is not using the indexes, while query 1 is using the index.

Would love hear why.



when small people start casting long shadows, it is time to go to bed

Replies are listed 'Best First'.
Re: same query, different execution, different performance (server-side--)
by tye (Sage) on Feb 14, 2012 at 01:30 UTC

    The one with a placeholder is "more efficient" in that it pre-plans the execution before knowing what value will be stuffed into the placeholder. Unfortunately, without that information, the resulting plan sucks. [By pre-planning at prepare() time, it doesn't have to waste time re-planning every time you execute() the same prepared query.]

    I have yet to run into a situation where I had and actually kept "server-side prepare" enabled, despite lots of people wanting to crow about how prepare + placeholders can be such a performance "win".

    Luckily, it is easy to disable server-side prepare so that you can still use place-holders while still getting "late" query planning every time you execute(). See pg_server_prepare since we are talking Postgresql.

    - tye        

Re: same query, different execution, different performance
by Marshall (Canon) on Feb 14, 2012 at 01:36 UTC
    The 2 queries are quite different. You don't mention what $q starts out being, but let's say $q='abc'; Then you add a wild card to it, $q="abc%";

    Prepare can be a very expensive critter and the DB might do a lot of thinking about what indicies, etc to use. In the first case, the DB knows that the string is going to start with "abc" at the time that the prepare method is called. That is a huge hint that it can take advantage of.

    In the second case, the DB doesn't know that in key bit of info advance, so it is a lot more "stupid" about how it goes about the "execute" part. So the basic difference is that "more thinking" went into the first prepare vs the second prepare because the DB knew more about what you were going to do.

      Thanks Marshall. Makes a lot of sense.


      when small people start casting long shadows, it is time to go to bed
        Another idea occurred to me - don't know if it would work or not - but an experiment should be easy for you...

        If you are always adding this % wildcard at the end, what happens if you put that wildcard into the version 2 prepare? i.e. LIKE ?% instead of LIKE ? In this case you would just put in the raw $q "abc" instead of "abc%". In my simple "untested code" brain, that might trigger the DB to use the index, knowing that it will start out the search with a constant term that you provide.

        I personally do not know the answer. But on the surface it sounds plausible. Anyway I think it would be fun the hear the results of that test. This is likely to be DB specific, but never-the-less interesting. I think the chance of success is low, but this critter may be smarter than we think.

        I am working on some complicated approximate matches in SQLite. This may not apply in your case, but consider that if case is not an issue: ABC% must all fall between x >="ABC" and x<"ABD" in a string comparison sense depending upon how the DB is indexed and other confounding factors...

      The 2 queries are quite different.

      No, they're identical

        Gee, at the moment I don't know how to explain it better - sorry if my English is confusing. Look again and see what is known when the prepare method is called. Read tye's post and look at the link which gives exactly the parameter needed to cause more to happen at execute() time rather than using the pre-prepared not so good default strategy - when preparing the statement, the DB just does a lot better if it knows "like something starting with abc" rather than "like something". The default is to use the info and strategy decided upon when the prepare statement was executed, but that behavior can be over-ridden.

Re: same query, different execution, different performance
by runrig (Abbot) on Feb 14, 2012 at 02:12 UTC

    Update: The OP is cross-posted at StackOverflow (Additional update: And on the DBI-users mailing list)

    The db can only use the index if the parameter does not begin with a wildcard. The db can not know at prepare time whether or not your parameter will start with a wildcard. So when it makes the query plan at prepare time, it doesn't know whether or not it can use the index.

    This is one of those times when it's best to not prepare the statement with bind parameters. But use $dbh->quote(...) on your parameter and just inline it into the SQL.

      This is one of those times when it's best to not prepare the statement with bind parameters. But use $dbh->quote(...) on your parameter and just inline it into the SQL.

      Don't even think about $dbh->quote(), use SUBSTR instead of LIKE whenever you need to test the start of a string against a LIKE-pattern.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Don't even think about $dbh->quote(), use SUBSTR instead of LIKE whenever you need to test the start of a string against a LIKE-pattern

        First, sometimes you should think about using quote(). Second, if you use SUBSTR(), again, the database won't use the index on the column, unless your database supports function based indexes (and I assume Postgres does and that there's an index on Lower(a)), and you have a function based index on the column, etc.

Re: same query, different execution, different performance
by InfiniteSilence (Curate) on Feb 14, 2012 at 01:04 UTC

    I recommend turning on full logging in PostgreSQL to see if there is a difference in the actual SQL being passed to the server and then run EXPLAIN on the statements if they are different.

    Celebrate Intellectual Diversity

      I have the exact same result being returned from both queries. Do I really need to worry that the queries might be different? I have asked this question on the DBI list. Perhaps they might know what is going on in the internals of DBI which is causing something to go awry with the second query.


      when small people start casting long shadows, it is time to go to bed

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://953562]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2024-03-19 07:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found