
Re: DBI question with placeholders and IN

by VSarkiss (Monsignor)
on Sep 30, 2002 at 22:11 UTC


in reply to DBI question with placeholders and IN

Well, if you're using the prepared statement only once, it's not clear how much benefit you're getting from it. Nevertheless, if you want to do it that way, you need as many placeholders as values:

my $sql = qq{SELECT COUNT(*) FROM some_table WHERE some_column IN (}
        . join(',', ('?') x @values)
        . qq{)};
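For completeness, here is roughly how that generated statement would be used, as a minimal sketch assuming $dbh is an already-connected handle and @values holds the IN-list values:

my $sth = $dbh->prepare($sql);
$sth->execute(@values);              # one bind value per generated placeholder
my ($count) = $sth->fetchrow_array;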

To me the "better" way would be to prepare the statement once and execute in a loop:

my $sql = <<END_SQL;
SELECT COUNT(*)
FROM some_table
WHERE some_column = ?
END_SQL
my $sth = $dbh->prepare_cached($sql) || die;
foreach my $value (@values) {
    $sth->execute($value);
    while ($sth->fetch) {
        # etc.
    }
}
But it sounds like you've already thought of that.

HTH

Replies are listed 'Best First'.
Re: Re: DBI question with placeholders and IN
by diotalevi (Canon) on Sep 30, 2002 at 22:34 UTC

    I'll don my database hat here for a moment. I'm no expert on databases but you gain a lot by giving the database all the values at once since that query might result in a table scan. Obviously one scan is better than multiples - you're forcing multiples by putting the loop in perl. This is what things like PL/SQL are made for (except this doesn't even require that).

    I'd file your perl loop under Inadvisable Ways to do Database Programming. The multi-placeholder with an array in ->execute() is likely to get you the best performance since it gives the database engine everything it needs all at once.

    Update: Never mind this advice. Apparently I have fewer clues than I thought.

      Your statements are so misleading I have to respond. Making absolute statements about performance without knowing the database schema and target platform is unwise, because you're just guessing without facts. But beyond that, even your generalizations are wrong.

      The point behind a prepared statement is that the optimizer can plan the query, so subsequent invocations can run much faster, without having to re-compile and re-plan it. In the best case, you have a point query (e.g., one where you're using a unique index) so that re-executing the prepared statement only involves searching the index tree again; if the query spans the index, there's a good chance the index will stay in the server's cache, and subsequent executions won't even go to the disk. An IN clause on most platforms is planned as a series of OR clauses, which, depending on the number of targets and the engine, may result in the optimizer just giving up and scanning the table instead of using an index.

      Obviously, one scan is better than multiples.
      That's not comparing good to bad, that's comparing bad to worse. In general, table scans, except for very small tables, are the slowest way to retrieve data from a database.

      I think what you're trying to say with your statement about "forcing multiples by putting the loop in perl" is that you should let the engine do as much server-side processing as possible. That's correct; the problem arises when a host program retrieves a result set, then uses it as parameters to the next query. In such cases, yes, what you want to do is leave the result set in the database (in a temporary table if you have to) rather than bring it to the client only for the purpose of sending it back to the server. However, I don't know the context of the original query, nor where the values for it are coming from.
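      To illustrate the server-side alternative, here is a minimal sketch; the orders and customers tables and the region filter are made up purely for illustration, since the original query isn't shown:

      # Round trip (what to avoid): fetch IDs to the client, then send them
      # straight back to the server as an IN list.
      # Server-side: let the engine do the filtering without leaving the database.
      my ($count) = $dbh->selectrow_array(q{
          SELECT COUNT(*)
          FROM   orders
          WHERE  customer_id IN (SELECT id FROM customers WHERE region = 'EU')
      });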

      If you want to file a prepare-execute loop under "Inadvisable Ways to do Database Programming", write the DBI maintainers and tell them to change the DBI documentation. This is the standard way to retrieve rows with DBI.

      Finally, this statement

      likely to get you the best performance since it gives the database engine everything it needs all at once
      is just nonsense. There are so many other factors that are more important. Does the database support prepare_cached at all? Is there an index on that column? Is the table partitioned? Are you going across a WAN to reach the database? And so on.

      I'll don my database hat here for a moment. I'm no expert on databases...
      People turn to the Monastery for advice, and you could easily lead someone astray. Please don't guess when you know you could be guessing wrong.

      Update

      • Deleted incorrect Graham Barr reference.
      • rdfield and htoug bring up excellent points below, the main one being what I was trying to say: You can't make absolute statements about performance without knowing specifics.

        In general, table scans, except for very small tables, are the slowest way to retrieve data from a database.

        Not necessarily. Table scans are the slowest way to retrieve a single value (or at most a small percentage of all values) from a large table. Table scans are the most efficient way to retrieve a medium to large percentage of the rows of a large table, which in Oracle terms is defined to be 5 blocks (or whatever you've set the appropriate init.ora parameters to). Why is this the case? Because for every index lookup the database needs to make a number of IO calls depending on the number of levels in the B-Tree index (or whatever structure you've decided to base the index on) and then make another IO call to retrieve the table block that the appropriate row is in. On average that will be 4 IO calls to retrieve a single row (hence the 5 block default on large table in Oracle). However, when retrieving many rows via a table scan there are many rows per table block (and Oracle uses a more efficient algorithm to retrieve multiple blocks at a time) resulting in fewer IOs per row (but more rows read obviously).
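        To put rough numbers on that reasoning, here is a back-of-the-envelope sketch; the index depth, row counts, and rows-per-block figures are assumed values for illustration, not measurements from any particular database:

        # Illustrative arithmetic only; real costs depend on caching,
        # multiblock reads, and the storage engine.
        my $btree_levels   = 3;          # assumed depth of the B-Tree index
        my $rows_per_block = 100;        # assumed rows stored per table block
        my $table_rows     = 1_000_000;
        my $rows_wanted    = 50_000;     # 5% of the table

        # Index path: roughly (index levels + 1 table block) IOs per row fetched.
        my $index_ios = $rows_wanted * ($btree_levels + 1);

        # Full table scan: read every block once (ignoring multiblock reads,
        # which would lower this further).
        my $scan_ios = $table_rows / $rows_per_block;

        printf "index lookups: %d IOs, table scan: %d IOs\n", $index_ios, $scan_ios;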

        Management summary: table scans aren't necessarily a bad thing.

        rdfield

        I would avoid your prepare_cached-execute loop.

        The performance of a single prepare with repeated executes, compared to a repeated prepare-execute, is highly driver, database engine and platform dependent. You should carefully test and benchmark before choosing.
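        One way to run such a test is with the core Benchmark module; the DSN, table, column and value list below are placeholders you would swap for your own:

        use strict;
        use warnings;
        use DBI;
        use Benchmark qw(cmpthese);

        # Placeholder connection details; substitute your own DSN and credentials.
        my $dbh = DBI->connect('dbi:Pg:dbname=test', 'user', 'pass',
                               { RaiseError => 1 });
        my @values = (1 .. 500);

        cmpthese(-5, {
            'one IN query' => sub {
                my $in  = join ',', ('?') x @values;
                my $sth = $dbh->prepare(
                    "SELECT COUNT(*) FROM some_table WHERE some_column IN ($in)");
                $sth->execute(@values);
                $sth->fetchall_arrayref;
            },
            'execute loop' => sub {
                my $sth = $dbh->prepare_cached(
                    'SELECT COUNT(*) FROM some_table WHERE some_column = ?');
                for my $value (@values) {
                    $sth->execute($value);
                    $sth->fetchall_arrayref;
                }
            },
        });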

        An optimizer that falls back to a table scan for an IN clause should be regarded as underpowered. A useful optimizer should be able to work out, based on the query, when it is beneficial to do a table scan and when to do a repeated series of index/key lookups. If you are unfortunate enough to be hampered by an optimizer of this kind, then you have to carefully craft your queries to aid the optimizer (and yes, it might be useful to just fetch one value per execute).

        This is one of the main reasons why it is impossible to make general statements about the performance of queries.

        One of the things that irks me most when reading about DBI in the Monastery is that people assume that preparing a statement actually speeds things up. This is not always true. The Ingres optimizer does its work based on the values in the query, and therefore cannot do its work before the values are actually given (in the execute call), so a prepare does not help very much there!

        On the other hand, I am given to understand that Oracle and others do some optimization without knowing the data values.

        In short: YMMV. But if you discover that the prepare-execute loop is faster than the SELECT with IN, then you should probably consider changing database vendor ;-) It should be faster to issue one large query rather than a series of small queries, give or take a bit.

        BTW: Don't bug gbarr about the DBI documentation. The author is Tim Bunce.
