Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE

by etcshadow (Priest)
on Nov 14, 2003 at 21:33 UTC ( #307219=note: print w/ replies, xml ) Need Help??


in reply to Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE

Yes, but it can drastically degrade performance (in very rare circumstances which I'll describe below).

Don't get me wrong, the default position should always be to use binds. Not only for the reasons that have been mentioned by others, but also because it can improve database performance in other, less direct ways. For example (speaking from experience with Oracle, in particular here, but I imagine that other RDBMSs have similar issues): even if you do not cache your statement handles, or use prepare_cached (and there are very good reasons not to do so, sadly), the database server, itself, caches various information based on the sql text. When it reencounters the same (textually) sql that it has seen before, it just looks up things like the parse-tree and the execution path, instead of recomputing them. Thus, if you bind your sql, and even if you throw away the statement handle... the database server doesn't, so not all of the prepare step is repeated. (Of course, some of the work is repeated, so it's still better to reuse your statement handles if you can.)

Anyway, I promised an example of drastically degraded performance with bound variables, and here it is: cost-based optimization. Say you've got a table ("thing") with a column called "type", and a million rows. Of those million rows, 50 are "type='special'" and all of the rest are "type='general'. You also have an index on thing.type. Now, run the query:

select * from thing where type=?
What is the proper execution plan? Should you use the index or not? Who knows? It depends on the value of the bind-parameter. If its value is 'general' then, hell no, you should not use the index. If its value is 'special', then hell yes, you should use the index. In fact, if it is anything other than 'general' you should. A CBO with histogram statistics would know the proper plan, if the value weren't bound.

Anyway, that's an overly simplistic example, but the thing is: this can and does happen. Just this week, I was optimizing a query that executed in about a minute with all of its values unbound, but took essentially an infinite amount of time when the values were bound. Same exact query.

Truth is, yes, start from the position of binding all of your literal values, but serious DB tuners can, and do, find situations where bind values can bite you in the ass.


------------
:Wq
Not an editor command: Wq


Comment on Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE
Download Code
Re: Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE
by etcshadow (Priest) on Nov 14, 2003 at 22:04 UTC
    Oh, and I forgot to mention, under the "other, less direct ways" that binds improve performance...

    Because of the fact that the RDBMS maintains a cache by sql-text... the more distinct sql that you pipe into your database, the bigger this cache has to be. (In oracle, you can investigate this cache by querying the v$sqlarea system view.) As this cache grows bigger, its hash buckets spill over, it consumes valuable shared pool memory, and various other Bad Things occur.

    Many people new to database development (and not using bound values), eventually hit a wall where their database seems to fall apart because of these problems. This is perhaps one of the least well understood causes of RDBMSs failing to scale.

    Anyway, just one more piece of fuel for the "use binds" fire.


    ------------
    :Wq
    Not an editor command: Wq
Re: Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE
by mpeppler (Vicar) on Nov 14, 2003 at 22:19 UTC
    There are usually ways to trick the RDBMS query optimizer into doing the "right" thing, even with border case queries like the one you show. I don't know about Oracle, but for Sybase the optimizer will use a generic weighting when it doesn't know the SARGS (search arguments) or when the SARGS are the result of functions or local variables (as is the case for bound variables via placeholders). If the index histogram is reasonable then the generated query plan is probably going to be quite reasonable - after all it's the same thing that happens when using stored procedures, and they work really well (with Sybase).

    Michael

    PS - we're drifting quite a long way from perl here...

      ...for Sybase the optimizer will use a generic weighting when it doesn't know the SARGS...

      Well, yes, and that's exactly the problem. Generic weighting simply tells you that you have a highly-unselective index, and, thus, the index will never be used.

      This is actually a common case with columns like "status" on task-tracking or ticket-tracking systems. If you've got a database with a million tasks, and nearly all of them are "COMPLETED", but the ones that you are typically interested in are "OPEN" (a small minority of the total population), then, bang, it's the exact issue I showed. You RDBMS will say "I don't know what value they want... but I know that the average value occupies half of the table. Accessing half of a table by index is a bad idea, so I'll full-table-scan."

      And, yeah, we are getting away from perl somewhat, but the original post was really about database performance tuning, so I don't feel so bad answering in kind. Also, there are a lot of "grey areas" around perl that folks on this site deal with a lot: CGI, HTML, shell/command-line, file-system, and databases. If perl is commonly used as a glue between these pieces... the line starts to blur a little bit with respect to what is fair game for discussion.


      ------------
      :Wq
      Not an editor command: Wq
        This is actually a common case with columns like "status" on task-tracking or ticket-tracking systems.
        I wanted to see how Sybase handles this case, so I did the following little test with the text_events table, which is used by Sybase to track what rows in source tables need to be updated in a Verity Full-Text index.
        select count(*), event_status from text_events group by event_status; event_status ----------- ------------ 26 0 2352103 2 81921 3
        Rows with event_status == 0 have to be processed, and there's an index on event_status.

        I first ran a SQL snippet like this:

        declare @status int select @status = 0 select * from text_events where status = @status
        And this did indeed do a table scan.

        Then I tried a short DBI script, using placeholders:

        my $sth = $dbh->prepare("select * from payment_db..text_events where e +vent_status = ?"); $sth->execute(1); fetch_all($sth); $sth->execute(0); fetch_all($sth);
        In this case the index is used every time. Then I wrote an ad-hoc stored procedure to do the same thing, and I get a table scan. Which just goes to show that this sort of thing is a bit of a black art :-)

        Michael

      In Oracle
      analyze table <table name> compute statistics for all indexed columns +size 254;
      with generate a histogram of weightings for each range of values that are in the indexed columns. Parse trees can be generated on a case by case basis. In the latest versions you can "fake" the stats to get the optimizer to behave properly.

      rdfield

        With Sybase you'd use the optdiag tool to get information about index statistics and index distribution, and to load simulated index statistics.

        To see query plans you use set showplan on, and dbcc traceflags 302 and 310 are used to get a full explanation of the choices made by the optimizer (very useful when the optimizer decides to do things that seem very counter-intuitive to you).

        Michael

        Yes, I understand how histogram statistics work... my point was that if you bind a value which the CBO would be interested in... it is only going to come up with one execution plan for the query, regardless of which value it is executed with.

        Or am I just misunderstanding what you are saying? Is there some hint I don't know of yet... something along the lines of: /*+ force_reparse_after_parameter_substitution */? If so, I'd love to hear it... because it would be really useful for me in a few situations. Situations where an application only allows for a query to be parameterized by bind values... and I want the query to be parametrizable, but not to suffer the hamstringing of the CBO that bind parameters are causing.


        ------------
        :Wq
        Not an editor command: Wq
Re: Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE
by runrig (Abbot) on Nov 14, 2003 at 22:37 UTC
    Since the query plan is done at prepare time, queries like this can cause a problem:
    select * from foo where bar = ? and baz like ?
    If there are two indexes, one on bar, and one on baz, which should be used? If baz is '%' (nothing but a wildcard), then it would be the wrong index to use, but if it's 'some long prefix%' then it's probably a good index to use, but there's no way to tell at prepare time. Trick: if you know baz is always the wrong index in this case but the database picks that index anyway (and your database does not support hints), you can fool it into not using the index (and hopefully using the correct one) by doing something like this (example also includes how to not use an index on a numeric column):
    select * from foo where bar = ? and baz || '' like ? and some_number + 0 = ?
      Yeah, there are lots of good examples... I was just trying to come up with the very simplest query that would demonstrate the problem.

      Honestly, single-table queries aren't likely to be all that problematic, no matter what. You do a full table-scan on a million-row table... well, that's 5 seconds you wish you hadn't spent, but it's not gonna totally kill you, unless it is being hit with great frequency. It's really when you get into large, complicated plans involving lots of joins and various other nasties, that tuning gets to be really critical (except, of course, for very frequently accessed queries).

      Somebody could write a whole book on database tuning... oh, wait... several people already have :-P


      ------------
      :Wq
      Not an editor command: Wq

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://307219]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (14)
As of 2014-07-22 18:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (123 votes), past polls