Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

DBI question with placeholders and IN

by dragonchild (Archbishop)
on Sep 30, 2002 at 21:52 UTC ( [id://201863]=perlquestion: print w/replies, xml ) Need Help??

dragonchild has asked for the wisdom of the Perl Monks concerning the following question:

I'm using DBI with Oracle (don't know which version) and Perl 5.004 (not my decision) for some rather complex queries and I've found that I want to be able to do the following:
my $values = join ",", @values; # This is 1 to N values my $sql = <<END_SQL; SELECT COUNT(*) FROM some_table WHERE some_column IN (?) END_SQL my $sth = $dbh->prepare_cached($sql) || die; $sth->execute($values) || die; while ($sth->fetch) { # Do stuff } $sth->finish || die;
But, I cannot. The only solution I've found is to put $values directly into the prepare, which negates the prepare_cached() benefits. Does anyone have a better way?

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Replies are listed 'Best First'.
Re: DBI question with placeholders and IN
by VSarkiss (Monsignor) on Sep 30, 2002 at 22:11 UTC

    Well, if you're using the prepared statement only once, it's not clear how much benefit you're getting from it. Nevertheless, if you want to do it that way, you need as many placeholders as values:

    my $sql = qq{ SELECT COUNT(*) FROM some_table WHERE some_column IN (} . join(',', (('?') x @values)) . ")";

    To me the "better" way would be to prepare the statement once and execute in a loop:

    my $sql = <<END_SQL; SELECT COUNT(*) FROM some_table WHERE some_column = ? END_SQL my $sth = $dbh->prepare_cached($sql) || die; foreach my $value (@values) { $sth->execute($value); while ($sth->fetch) { # etc.
    But it sounds like you've already thought of that.

    HTH

      I'll don my database hat here for a moment. I'm no expert on databases but you gain a lot by giving the database all the values at once since that query might result in a table scan. Obviously one scan is better than multiples - you're forcing multiples by putting the loop in perl. This is what things like PL/SQL are made for (except this doesn't even require that).

      I'd file your perl loop under Inadvisable Ways to do Database Programming. The multi-placeholder with an array in ->execute() is likely to get you the best performance since it gives the database engine everything it needs all at once.

      Update: Nevermind this advice. Apparently I have less clues than I thought.

        Your statements are so misleading I have to respond. Making absolute statements about performance without knowing the database schema and target platform is unwise, because you're just guessing without facts. But beyond that, even your generalizations are wrong.

        The point behind a prepared statement is that the optimizer can plan the query, so subsequent invocations can run much faster, without having to re-compile and re-plan it. In the best case, you have a point query (e.g., one where you're using a unique index) so that re-executing the prepared statement only involves searching the index tree again; if the query spans the index, there's a good chance the index will stay in the server's cache, and subsequent executions won't even go to the disk. An IN clause on most platforms is planned as a series of OR clauses, which, depending on the number of targets and the engine, may result in the optimizer just giving up and scanning the table instead of using an index.

        Obviously, one scan is better than multiples.
        That's not comparing good to bad, that's comparing bad to worse. In general, table scans, except for very small tables, are the slowest way to retrieve data from a database.

        I think what you're trying to say with your statement about "forcing multiples by putting the loop in perl" is that you should let the engine do as much server-side processing as possible. That's correct; the problem is when a host program retrieves a result set, then uses it as parameters to the next query. In such cases, yes, what you want to do is leave the result set in the database (a temporary table if you have to) rather than bring it to the client only for the purpose of sending it back to the server. However, I don't know the context of the original quey, nor where the values for the it are coming from.

        To file a prepare-execute loop under "Inadvisable Ways to do Database Programming", write gbarr DBI maintainers and tell them to change the DBI documentation. This is the standard way to retrieve rows with DBI.

        Finally, this statement

        likely to get you the best performance since it gives the database engine everything it needs all at once
        is just nonsense. There are so many other factors that are more important. Does the database support prepare_cached at all? Is there an index on that column? Is the table partitioned? Are you going across a WAN to reach the database? And so on.

        I'll don my database hat here for a moment. I'm no expert on databases...
        People turn to the Monastery for advice, and you could easily lead someone astray. Please don't guess when you know you could be guessing wrong.

        Update

        • Deleted incorrect Graham Barr reference.
        • rdfield and htoug bring up excellent points below, the main one being what I was trying to say: You can't make absolute statements about performance without knowing specifics.

Re: DBI question with placeholders and IN
by Zaxo (Archbishop) on Sep 30, 2002 at 22:13 UTC

    You didn't say what errors you get, but from your code it looks like you glue several values together, and treat that as a single argument. That will lead the db to look for the whole string in each row. You need to have a placeholder for each value of argument to IN(), then feed your array as-is to the execute method. You could make an $sth for too many placeholders, and pad the arguments with undef. That would be suitable for a non-null column.

    my $plch = join ', ', ('?') x @values; my $sql = <<END_SQL; SELECT COUNT(*) FROM some_table WHERE some_column IN ( $pclh ) END_SQL my $sth = $dbh->prepare_cached($sql) || die; $sth->execute(@values) || die; while ($sth->fetch) { # Do stuff } $sth->finish || die;

    After Compline,
    Zaxo

Re: DBI question with placeholders and IN
by dws (Chancellor) on Sep 30, 2002 at 22:15 UTC
    You need as many bind variables as you have values. Try the following (untested):
    my $bindvars = join(",", ("?") x scalar @values); my $sql = <<END_SQL; SELECT COUNT(*) FROM some_table WHERE some_column IN ($bindvars) END_SQL my $sth = $dbh->prepare_cached($sql) || die; $sth->execute(@values) || die; ...
Re: DBI question with placeholders and IN
by runrig (Abbot) on Sep 30, 2002 at 22:13 UTC
    my $values = join ",", @values; # This is 1 to N values

    Unless you have a fixed (or a small) 'N', it may not be worth using prepare_cached. You would have to have a '?' for every value, and use an array of values in the 'execute' like this:

    ...where some column in (" . join(',',('?') x @values) . ")" ... $sth->execute(@values) # one value for every '?' above
    For a large 'N' (or if this code is only executed once) don't use prepare_cached, and just 'prepare' every time.

      Ok. Cool. I know about the array of ? now.

      However, I am curious why you would just prepare and not prepare_cached. I do have a large N (in some cases, up to 999), but I will probably re-issue this query a whole bunch of times. (It's in a web application and this query forms the basis for a report that's generated many, many times with different values.) If I'm working with near-limitless cache, why shouldn't I prepare_cached everything?

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

        I am curious why you would just prepare and not prepare_cached. I do have a large N (in some cases, up to 999), but I will probably re-issue this query a whole bunch of times. (It's in a web application...

        Is it a CGI app, or some persistent environment (mod_perl, etc.)? If its CGI and this query is only executed once per hit, then there is no benefit to prepare_cached. And even if you do execute this 1000 times per hit (or if this is mod_perl), with N being anywhere from 1 to 999, you will see little to no (and probably negative) benefit with (IMHO) too much cost.

        Every cursor you prepare and don't release consumes resources (memory, etc.), and you may run out. (With Oracle and some other DB's) there is a limit on the number of active statement handles you can have per session and I'm not sure if 999 is anywhere near that, but I think its a bad idea to use prepare_cached on a statement that dynamic. See this node for someone who ran out of memory (its not quite the same since he wasn't even using placeholders, but its the most relevant node I can find).

        Update: Also, its not so much the size of N, but the number of different 'N's you'll be preparing. If you know, e.g., that N is always 996 <= N <= 999, and you'll be executing this statement 1000 times on every invocation of the script, then prepare_cached might be a good thing in that case since it only has to cache up to 4 different statements...

Re: DBI question with placeholders and IN
by the_Don (Scribe) on Sep 30, 2002 at 22:08 UTC
    The $values variable will be inserted into the statement where the '?' is located. More than likely your join command is not creating a valid Oracle statement within the IN command. Actually, I do not see 'IN' as a valid construct within a SELECT statement.

    Here is the cite I use to get my Oracle information (because the Oracle site really ticks me off).

    the_Don
    ...making offers others can't rufuse.

Re: DBI question with placeholders and IN
by Anonymous Monk on Oct 01, 2002 at 20:16 UTC

    Snippet from perldoc DBI:

    Also, placeholders can only represent single scalar
    values.  For example, the following statement won't
    work as expected for more than one value:
    
    "SELECT name, age FROM people WHERE name IN (?)"    # wrong

    If you only need a count, you can always do

    my %uniq = ();
    ++$uniq{$_} for @values;
    $count = 0 + keys %uniq;

    But you need to be absolutely sure all the @values exist in your table (and to be 100% sure, you must check)

    There's little you can do. I'd use a direct prepare() and an execute() each time. If you manage to pass multiple scalars with a single ?, it wouldn't be portable anyway.

    --
    my $twocents;
Re: DBI question with placeholders and IN
by zengargoyle (Deacon) on Oct 02, 2002 at 09:27 UTC

    i once wanted to to this:

    SELECT events.b_g FROM events WHERE events.b_g NOT IN ( SELECT otf.b_g FROM otf ) AND events.b_g NOT IN ( SELECT requests.b_g FROM requests)

    and this...

    ... WHERE events.b_g IN ( SELECT otf.b_g FROM otf ) OR events.b_g IN ( SELECT requests.b_g FROM requests ) ...

    instead i had to do this:

    SELECT events.b_g, events.vector, SUM(tries) AS 'attempts', COUNT(DISTINCT events.observer) AS 'reporters' FROM events LEFT JOIN otf ON events.b_g = otf.b_g LEFT JOIN requests ON events.b_g = requests.b_g WHERE requests.b_g IS NULL AND otf.b_g IS NULL GROUP BY events.b_g, events.vector ORDER BY events.b_g, events.vector, 'attempts'

    and...

    ... WHERE requests.b_g = events.b_g OR otf.b_g = events.b_g ...

    this might or might not help.

      If you were using Oracle, then the code you wanted to run should have worked as-is. You could also use UNION in a sub-select like this:

      SELECT events.b_g FROM events WHERE events.b_g NOT IN ((SELECT otf.b_g FROM otf) UNION (SELECT requests.b_g FROM requests))
      That shouldn't be any faster, but the logic is cleaner IMHO.

      I agree with you that there are alternatives to using IN with literals (or placeholders). All of those alternatives require the values you want to test to be in the database though.

      If you ever find code using lots of similar IN queries, sometimes it's worth creating a temporary table to cache values for your IN clauses. It works like this:

      # insert all of the values you want to test with IN INSERT INTO temp (id, value) VALUES (?, ?); # now replace the placeholder IN with a sub-select SELECT ... WHERE column IN (SELECT value FROM temp WHERE id=?); # after you're completely done, remove the values -- you # might just rollback instead of deleting if the inserts # were never commited. DELETE FROM temp where id=?;
      I use code very similar to this when a user logs into our application. Lots of multi-valued security and preference checks can be flattened out into a temporary table. The IN statements become much simpler and more robust (you don't have to worry about the query blowing up if you have too many placeholders).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://201863]
Approved by ignatz
Front-paged by htoug
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (5)
As of 2024-04-19 14:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found