PerlMonks  

SQLite and large number of parameters

by menth0l (Monk)
on Dec 10, 2012 at 10:14 UTC (#1008081=perlquestion)
menth0l has asked for the wisdom of the Perl Monks concerning the following question:

I have an SQLite table with two integer columns, col_1 and col_2. I want to select the records where col_1 takes a value from one large set (hundreds, maybe a few thousand values) and col_2 takes a value from another large set:
SELECT * FROM my_table WHERE col_1 IN (<large set of integers>) AND col_2 IN (<another large set of integers>)

This fails with the following error:
DBD::SQLite::db prepare_cached failed: too many SQL variables


To work around the problem, I wrote this:

use List::MoreUtils qw/natatime/;

# Query every combination of a chunk of $set_1 with a chunk of $set_2,
# so that no single statement exceeds SQLite's placeholder limit.
sub select_in_chunks {
    my ($self, $set_1, $set_2) = @_;
    my @ret;
    my $chunk_size = 450;
    my $it1 = natatime $chunk_size, @$set_1;
    while (my @a = $it1->()) {
        my $it2 = natatime $chunk_size, @$set_2;    # restart set_2 for each chunk of set_1
        while (my @b = $it2->()) {
            # previous query with subsets
            my $records = $self->query(\@a, \@b);
            push @ret, @$records;
        }
    }
    return \@ret;
}

This works, but I wonder whether there is a more convenient (and faster!) way to run SELECTs with large IN lists. I can't use the BETWEEN operator because the values aren't contiguous.

Any ideas?
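For reference, here is a self-contained sketch of the chunking approach from the question. It replaces the undefined `$self->query` with a concrete DBI call and uses core `splice` instead of `natatime`. It assumes DBD::SQLite and a throwaway in-memory database; the sample rows and the `chunks` helper are made up for illustration. The chunk size of 450 matters because both IN lists share one statement: 450 + 450 = 900 placeholders, below the 999-variable default that SQLite used before version 3.32.0 (newer versions default to 32766).

```perl
use strict;
use warnings;
use DBI;

# Assumption: an in-memory database with the table from the question.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER)');
my $ins = $dbh->prepare('INSERT INTO my_table (col_1, col_2) VALUES (?, ?)');
$dbh->begin_work;
$ins->execute($_, $_ % 10) for 1 .. 5000;    # sample data: col_2 = col_1 mod 10
$dbh->commit;

# Hypothetical helper: split a list into array refs of at most $size elements.
sub chunks {
    my ($size, @list) = @_;
    my @out;
    push @out, [ splice @list, 0, $size ] while @list;
    return @out;
}

# One query per (chunk of set_1, chunk of set_2) pair. As long as neither
# input set contains duplicate values, no row is returned twice.
sub select_in_chunks {
    my ($dbh, $set_1, $set_2, $size) = @_;
    $size ||= 450;    # 2 x 450 = 900 placeholders per statement
    my @ret;
    for my $c1 (chunks($size, @$set_1)) {
        for my $c2 (chunks($size, @$set_2)) {
            my $sql = sprintf
                'SELECT col_1, col_2 FROM my_table WHERE col_1 IN (%s) AND col_2 IN (%s)',
                join(', ', ('?') x @$c1), join(', ', ('?') x @$c2);
            push @ret, @{ $dbh->selectall_arrayref($sql, undef, @$c1, @$c2) };
        }
    }
    return \@ret;
}

my $rows = select_in_chunks($dbh, [ 1 .. 2000 ], [ 3, 4 ]);
printf "%d rows\n", scalar @$rows;    # prints "400 rows"
```

The number of statements is the product of the two chunk counts (2000 values against 2000 values means 5 x 5 = 25 queries), which is one reason the temporary-table approaches suggested in the replies tend to scale better.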

Re: SQLite and large number of parameters
by moritz (Cardinal) on Dec 10, 2012 at 11:53 UTC

    Not sure if it's faster, but you could always try not using placeholders:

    my $sql = sprintf q[ SELECT * FROM my_table WHERE col_1 IN (%s) AND col_2 IN (%s) ], join(', ', @$set_1), join(', ', @$set_2);

    If you can't be sure that they are all integers, be sure to map them through $dbh->quote first.
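A minimal sketch of this no-placeholder variant, assuming DBD::SQLite and an in-memory sample table. The `sql_list` helper is hypothetical: values that look like plain integers are inlined, and anything else goes through `$dbh->quote`. Without placeholders the variable limit no longer applies, but the statement text is still bounded by SQLite's maximum SQL length (1,000,000 bytes by default), which a few thousand integers stay well below.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory sample data where col_2 = col_1 mod 10.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER)');
my $ins = $dbh->prepare('INSERT INTO my_table (col_1, col_2) VALUES (?, ?)');
$dbh->begin_work;
$ins->execute($_, $_ % 10) for 1 .. 5000;
$dbh->commit;

# Hypothetical helper: render values as an SQL literal list. Plain integers are
# inlined verbatim; anything else is quoted so it cannot alter the statement.
sub sql_list {
    my ($dbh, @values) = @_;
    return join ', ', map { /\A-?[0-9]+\z/ ? $_ : $dbh->quote($_) } @values;
}

my @set_1 = (1 .. 2000);
my @set_2 = (3, 4);
my $sql = sprintf 'SELECT col_1, col_2 FROM my_table WHERE col_1 IN (%s) AND col_2 IN (%s)',
    sql_list($dbh, @set_1), sql_list($dbh, @set_2);
my $rows = $dbh->selectall_arrayref($sql);    # one statement, no bind values
printf "%d rows\n", scalar @$rows;
```

Note that each distinct list produces a distinct statement, so prepare_cached gains nothing here; the statement is compiled on every call.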

      Thanks for the suggestion, I will try that.
Re: SQLite and large number of parameters
by bart (Canon) on Dec 10, 2012 at 11:53 UTC
    Just where do you get those values from? My guess is that you don't make them up and the user didn't enter them manually, so they probably come from somewhere in the database. In that case you can probably get the whole list with a reasonably simple query.

    If at all possible, why not use a subquery, like:

    SELECT * FROM my_table WHERE col_1 IN (select id from something) AND col_2 IN (select id from something_else)
    where something and something_else stand for the queries you used to get those lists.

    If it's not that simple, at worst you can first create a temporary table with the values you're looking for in one column.
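A sketch of the temporary-table route combined with the subquery form above, assuming DBD::SQLite. The table names `wanted_1`/`wanted_2` and the `load_ids` helper are hypothetical. Loading costs one INSERT per value, but inside a single transaction that is cheap, and the final SELECT has no bind values at all, so the variable limit never comes into play.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory sample data where col_2 = col_1 mod 10.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER)');
my $ins = $dbh->prepare('INSERT INTO my_table (col_1, col_2) VALUES (?, ?)');
$dbh->begin_work;
$ins->execute($_, $_ % 10) for 1 .. 5000;
$dbh->commit;

# Hypothetical helper: (re)create a one-column temporary table holding @ids.
# The table name is interpolated, so it must come from the program, never from input.
sub load_ids {
    my ($dbh, $table, @ids) = @_;
    $dbh->do("DROP TABLE IF EXISTS temp.$table");
    $dbh->do("CREATE TEMP TABLE $table (id INTEGER PRIMARY KEY)");
    my $sth = $dbh->prepare("INSERT OR IGNORE INTO temp.$table (id) VALUES (?)");
    $dbh->begin_work;    # one transaction for all inserts
    $sth->execute($_) for @ids;
    $dbh->commit;
}

load_ids($dbh, 'wanted_1', 1 .. 2000);    # e.g. the ids found in the BK-tree
load_ids($dbh, 'wanted_2', 3, 4);

my $rows = $dbh->selectall_arrayref(q{
    SELECT col_1, col_2 FROM my_table
    WHERE col_1 IN (SELECT id FROM wanted_1)
      AND col_2 IN (SELECT id FROM wanted_2)
});
printf "%d rows\n", scalar @$rows;
```

Temporary tables are private to the connection and disappear when it closes, so concurrent processes do not see each other's lists.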

    P.S. An inner join, even on a subselect, may well be faster. Just test it.

    SELECT * FROM my_table INNER JOIN (select col_1 from something) A USING (col_1) INNER JOIN (select col_2 from something_else) B USING (col_2)

    (N.B. "USING (col_1, col_2)" is like "ON A.col_1 = B.col_1 AND A.col_2 = B.col_2", except that "*" picks up the shared column name(s) only once.)
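A small demonstration of the join form and the USING note, assuming DBD::SQLite. `something` and `something_else` are hypothetical stand-ins for wherever the values really come from. The subselects add DISTINCT because a join, unlike IN, emits one row per match, so a duplicate source value would duplicate results (the sample loads 4 twice on purpose). SQLite omits the right-hand copy of each USING column from `*`, which the field count shows.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory sample data; col_2 = col_1 mod 10 in my_table.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do($_) for
    'CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER)',
    'CREATE TABLE something (col_1 INTEGER)',
    'CREATE TABLE something_else (col_2 INTEGER)';
$dbh->begin_work;
$dbh->do('INSERT INTO my_table (col_1, col_2) VALUES (?, ?)', undef, $_, $_ % 10) for 1 .. 5000;
$dbh->do('INSERT INTO something (col_1) VALUES (?)', undef, $_) for 1 .. 2000;
$dbh->do('INSERT INTO something_else (col_2) VALUES (?)', undef, $_) for 3, 4, 4;    # 4 twice
$dbh->commit;

# The join form from the reply, with DISTINCT guarding against duplicate values.
my $sth = $dbh->prepare(q{
    SELECT * FROM my_table
    INNER JOIN (SELECT DISTINCT col_1 FROM something) A USING (col_1)
    INNER JOIN (SELECT DISTINCT col_2 FROM something_else) B USING (col_2)
});
$sth->execute;
my $ncols = $sth->{NUM_OF_FIELDS};    # USING columns appear only once in "*"
my $rows  = $sth->fetchall_arrayref;
printf "%d columns, %d rows\n", $ncols, scalar @$rows;
```

Dropping DISTINCT from the second subselect would return the col_2 = 4 rows twice, which is easy to miss when timing the join against the IN version.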

      I'm afraid it's more complicated than that. These values come from a BK-tree (I search it for similar strings), and their number varies from a few to a couple of thousand.
        Then you are going to have to develop some kind of algorithm. Perhaps you could stuff those "couple thousand values" into a temporary table and then execute an INNER JOIN against it. Like it or not, you are forced to construct a different approach to your problem.

        I agree with bart. I don't think it's as complicated a problem as you're making it. You have a relational database. You have a problem that is trivially solved using a relational database. Just INSERT the values INTO temporary tables and either use WHERE EXISTS (if SQLite supports it) or INNER JOIN on the tables instead. Don't knock yourself out trying to work around the limitations of the WHERE … IN clause. It simply doesn't scale to your requirements.

        Jim
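SQLite does support WHERE EXISTS. A sketch of that form against temporary tables, assuming DBD::SQLite; the table and index names are hypothetical. Like IN, a correlated EXISTS only asks whether a match is present, so duplicate values in the temporary tables cannot multiply result rows (the sample loads 4 twice on purpose). The index on the larger temporary table keeps each probe cheap.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory sample data where col_2 = col_1 mod 10.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do($_) for
    'CREATE TABLE my_table (col_1 INTEGER, col_2 INTEGER)',
    'CREATE TEMP TABLE wanted_1 (id INTEGER)',
    'CREATE TEMP TABLE wanted_2 (id INTEGER)',
    'CREATE INDEX temp.wanted_1_id ON wanted_1 (id)';

my $ins  = $dbh->prepare('INSERT INTO my_table (col_1, col_2) VALUES (?, ?)');
my $ins1 = $dbh->prepare('INSERT INTO wanted_1 (id) VALUES (?)');
my $ins2 = $dbh->prepare('INSERT INTO wanted_2 (id) VALUES (?)');
$dbh->begin_work;
$ins->execute($_, $_ % 10) for 1 .. 5000;
$ins1->execute($_) for 1 .. 2000;
$ins2->execute($_) for 3, 4, 4;    # duplicate on purpose
$dbh->commit;

# Each EXISTS is a correlated probe into a temp table; no bind values needed.
my $rows = $dbh->selectall_arrayref(q{
    SELECT t.col_1, t.col_2 FROM my_table AS t
    WHERE EXISTS (SELECT 1 FROM wanted_1 AS w WHERE w.id = t.col_1)
      AND EXISTS (SELECT 1 FROM wanted_2 AS w WHERE w.id = t.col_2)
});
printf "%d rows\n", scalar @$rows;
```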

Front-paged by Arunbear