http://www.perlmonks.org?node_id=1008081

menth0l has asked for the wisdom of the Perl Monks concerning the following question:

I have an SQLite table with two integer columns, col_1 and col_2. Now I want to select the records where col_1 takes a value from one large set (hundreds, maybe a few thousand integers) and col_2 takes a value from another large set:
SELECT * FROM my_table WHERE col_1 IN (<large set of integers>) AND col_2 IN (<another large set of integers>)

This fails with the following error:
DBD::SQLite::db prepare_cached failed: too many SQL variables


My attempt to work around the problem is the following code:
use List::MoreUtils qw/natatime/;

sub select_in_chunks {
    my ($self, $set_1, $set_2) = @_;
    my @ret;
    # 450 + 450 = 900 placeholders per query, below SQLite's default limit of 999
    my $chunk_size = 450;
    my $it1 = natatime $chunk_size, @$set_1;
    while (my @a = $it1->()) {
        my $it2 = natatime $chunk_size, @$set_2;
        while (my @b = $it2->()) {
            # previous query, run on the two subsets
            my $records = $self->query(\@a, \@b);
            push @ret, @$records;
        }
    }
    return \@ret;
}

This actually works, but I wonder if there is a more convenient (and faster!) way to do SELECTs with large IN lists. I can't use the BETWEEN operator since the values aren't contiguous.

Any ideas?
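A note on the chunk size in the loop above: SQLite caps the number of bound parameters per statement (SQLITE_MAX_VARIABLE_NUMBER, 999 by default in releases before 3.32.0), and each query binds both chunks, so the two chunk sizes must add up to at most that limit. Below is a core-Perl sketch of the same chunk pairing, assuming a 999-variable limit; `chunks` and `chunk_pairs` are hypothetical helper names, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into consecutive chunks of at most $size elements.
sub chunks {
    my ($size, @list) = @_;
    my @out;
    push @out, [ splice @list, 0, $size ] while @list;
    return @out;
}

# Hypothetical helper: every (chunk of set 1, chunk of set 2) pair, sized so that
# one query binds at most $max_vars placeholders in total.
sub chunk_pairs {
    my ($max_vars, $set_1, $set_2) = @_;
    my $size = int($max_vars / 2);    # split the budget evenly between the two IN lists
    my @b_chunks = chunks($size, @$set_2);
    my @pairs;
    for my $a_chunk (chunks($size, @$set_1)) {
        push @pairs, [ $a_chunk, $_ ] for @b_chunks;
    }
    return @pairs;
}

# Usage: 1000 values in each set with an assumed limit of 999 variables.
my @pairs = chunk_pairs(999, [ 1 .. 1000 ], [ 1 .. 1000 ]);
printf "%d queries\n", scalar @pairs;    # prints "9 queries"
```

The number of queries is the product of the two chunk counts (3 × 3 = 9 here), so this approach grows quadratically as both sets grow.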

Re: SQLite and large number of parameters
by bart (Canon) on Dec 10, 2012 at 11:53 UTC
    Just where do you get those values from? My guess is you don't make them up, and the user didn't enter them manually, so they probably come from somewhere in the database. In that case you can probably get the whole list with a reasonably simple query.

    If at all possible, why not use a subquery, like:

    SELECT * FROM my_table WHERE col_1 IN (select id from something) AND col_2 IN (select id from something_else)
    where something and something_else represent the query you used to get at that list.

    If it's not that simple, at worst you can first create a temporary table holding the values you're looking for in one column.
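A minimal sketch of the temporary-table approach with DBI, assuming DBD::SQLite is installed. An in-memory database and generated value sets stand in for the real table and the real search results; `want_1` and `want_2` are hypothetical table names. Each INSERT binds a single parameter, so the variable limit never comes into play, and one transaction keeps the bulk load fast.

```perl
use strict;
use warnings;
use DBI;

# In-memory database standing in for the real one (assumes DBD::SQLite is installed).
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Toy data standing in for my_table: rows (i, 2i).
$dbh->do('CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER)');
my $ins = $dbh->prepare('INSERT INTO my_table (col_1, col_2) VALUES (?, ?)');
$ins->execute($_, $_ * 2) for 1 .. 5000;

# Hypothetical search results: thousands of values in each set.
my @set_1 = grep { $_ % 3 == 0 } 1 .. 5000;
my @set_2 = grep { $_ % 4 == 0 } 1 .. 10000;

# One temp table per value set; INTEGER PRIMARY KEY indexes the values.
$dbh->do('CREATE TEMP TABLE want_1 (v INTEGER PRIMARY KEY)');
$dbh->do('CREATE TEMP TABLE want_2 (v INTEGER PRIMARY KEY)');

# Bulk-load inside one transaction; INSERT OR IGNORE drops duplicate values.
$dbh->begin_work;
my $ins_1 = $dbh->prepare('INSERT OR IGNORE INTO want_1 (v) VALUES (?)');
my $ins_2 = $dbh->prepare('INSERT OR IGNORE INTO want_2 (v) VALUES (?)');
$ins_1->execute($_) for @set_1;
$ins_2->execute($_) for @set_2;
$dbh->commit;

# The two IN lists become two joins against the temp tables.
my $rows = $dbh->selectall_arrayref(q{
    SELECT t.col_1, t.col_2
    FROM my_table t
    JOIN want_1 a ON a.v = t.col_1
    JOIN want_2 b ON b.v = t.col_2
});
printf "%d matching rows\n", scalar @$rows;
```

Because the primary key keeps each value unique, the join returns every matching my_table row once, just as the IN lists would.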

    P.S. An inner join, even on a subselect, may be faster. Just test it.

    SELECT * FROM my_table INNER JOIN (select col_1 from something) A USING (col_1) INNER JOIN (select col_2 from something_else) B USING (col_2)

    (N.B. "USING (col_1, col_2)" is like "ON A.col_1 = B.col_1 AND A.col_2 = B.col_2", except that "*" picks up the shared column name(s) only once.)

      I'm afraid it's more complicated than that. These values come from a BK-tree (I search it for similar strings), and their number varies from a few to a couple of thousand.

        I agree with bart. I don't think it's as complicated a problem as you're making it. You have a relational database, and you have a problem that a relational database solves trivially. Just INSERT the values INTO temporary tables and use either WHERE EXISTS (SQLite supports it) or an INNER JOIN on those tables instead. Don't knock yourself out trying to work around the limitations of the WHERE … IN clause. It simply doesn't scale to your requirements.

        Jim
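For comparison, a sketch of the WHERE EXISTS form mentioned above. It again assumes DBD::SQLite, uses a tiny in-memory table, and `want_1`/`want_2` are hypothetical temporary table names.

```perl
use strict;
use warnings;
use DBI;

# Assumes DBD::SQLite is installed; an in-memory DB stands in for the real one.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER)');
$dbh->do('INSERT INTO my_table VALUES (?, ?)', undef, @$_) for [1, 10], [2, 20], [3, 30];

# Temporary tables holding the two hypothetical value sets.
$dbh->do("CREATE TEMP TABLE $_ (v INTEGER PRIMARY KEY)") for qw(want_1 want_2);
$dbh->do('INSERT INTO want_1 VALUES (?)', undef, $_) for 1, 3;
$dbh->do('INSERT INTO want_2 VALUES (?)', undef, $_) for 10, 20, 30;

# WHERE EXISTS form: each row of my_table is returned at most once.
my $rows = $dbh->selectall_arrayref(q{
    SELECT * FROM my_table t
    WHERE EXISTS (SELECT 1 FROM want_1 a WHERE a.v = t.col_1)
      AND EXISTS (SELECT 1 FROM want_2 b WHERE b.v = t.col_2)
    ORDER BY t.col_1
});
print join(',', map { "@$_" } @$rows), "\n";    # prints "1 10,3 30"
```

Unlike a plain join, EXISTS never multiplies rows even if a temporary table holds duplicate values, so here the primary key mainly serves as an index for fast lookups.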

        Then you are going to have to develop some kind of algorithm. Perhaps you could put those "couple thousand values" into a temporary table and then run an INNER JOIN against it. Like it or not, you'll have to take a different approach to your problem.
Re: SQLite and large number of parameters
by moritz (Cardinal) on Dec 10, 2012 at 11:53 UTC

    Not sure if it's faster, but you could always try not using placeholders:

    my $sql = sprintf q[ SELECT * FROM my_table WHERE col_1 IN (%s) AND col_2 IN (%s) ], join(', ', @$set_1), join(', ', @$set_2);

    If you can't be sure that they are all integers, be sure to map them through $dbh->quote first.
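A sketch of the no-placeholder approach for integer-only input. Instead of $dbh->quote, it swaps in a stricter check that rejects anything that isn't a plain integer, so no quoting is needed; `in_list` is a hypothetical helper name. The statement text still has to stay under SQLite's maximum SQL length (SQLITE_MAX_SQL_LENGTH, 1,000,000 bytes by default), which a few thousand integers fit comfortably.

```perl
use strict;
use warnings;

# Hypothetical helper: inline an integer list into the SQL instead of using placeholders.
# Anything that is not a plain (optionally negative) integer is rejected outright.
sub in_list {
    my @vals = @_;
    for (@vals) {
        die "not an integer: $_\n" unless /\A-?[0-9]+\z/;
    }
    return join ', ', @vals;
}

my @set_1 = (3, 17, 42);
my @set_2 = (5, 8);
my $sql = sprintf 'SELECT * FROM my_table WHERE col_1 IN (%s) AND col_2 IN (%s)',
    in_list(@set_1), in_list(@set_2);
print "$sql\n";
```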

      Thanks for the suggestion, I will try that.