Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re^5: same query, different execution, different performance (substr)

by runrig (Abbot)
on Feb 14, 2012 at 18:38 UTC ( #953742=note: print w/ replies, xml ) Need Help??


in reply to Re^4: same query, different execution, different performance (substr)
in thread same query, different execution, different performance

Some databases have query optimizers that know how to use an index when told "LIKE 'blah%'".

I think most query optimizers will know to use an index on "LIKE 'blah%'", but not when the query plan is determined at prepare time (for those sorts of databases), and the query optimizer is told "LIKE ?" and only later is given the argument "blah%".

Some database have query optimizers that know how to use an index when told the equivalent thing using SUBSTR().

A quick test with Oracle (update: and Sybase, and from what I recall, Informix) seems to show that it doesn't know to use an index with SUBSTR, and some quick googling on Postgres seems to imply that a function based index would be required there also. I'm not saying there's no database smart enough to use a regular index on a column for a substring search, I just haven't seen it yet.


Comment on Re^5: same query, different execution, different performance (substr)
Re^6: same query, different execution, different performance (Pg's text_pattern_ops)
by erix (Vicar) on Feb 14, 2012 at 21:00 UTC

    (Straying a bit from the OP, but this is fun, no? And anyway, I rather expect this may be interesting/useful for him too).

    An alternative (in postgresql) would be to use regex-index, which can be used when the submitted search-string or regex is anchored:

    select count(*) from azjunk6; -- 1 million rows random data: count --------- 1000000 (1 row) -- without index: select * from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 1147.420 ms -- now make a text_pattern_ops index: create index azjunk6_text_pattern_ops_idx on azjunk6 (txt text_patter +n_ops); Time: 7282.579 ms -- with index: select * from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 12.524 ms --> 100x faster

    (It can be handy to have both a 'normal' btree index *and* such a text_pattern_ops regex index.)

    See also: PostgreSQL index opclasses

    You can get another interesting indextype from pg_trgm, a postgresql extension. This will give you not indexed regexen but indexed trigrams: PostgreSQL pg_trgm extension. (disadvantage: large index-size)

    And FWIW: in 9.2devel, there is work ongoing to make it possible to combine the two: regexed trigram indexes...

      That's interesting. I've never cared much for the limited LIKE operator. Now when you prepare this:
      select * from azjunk6 where txt ~ ?
      Will it use the index when you execute the statement (e.g. with '^car[sz]')? My guess would be 'yes', and this might be the direction the OP should go.

        Yes, with my table there seems to be no problem, the value of pg_server_prepare makes no difference for this data here. But tye is right, with certain data you need to disable it (i.e., { pg_server_prepare => 0} ).

        perl -MDBI -e 'my$d=DBI->connect("dbi:Pg:",undef,undef,{pg_server_prep +are => 0 }); my $p = "^car[sz]"; my $sql = "explain analyze select * from azjunk6 where txt ~ ?"; my $s = $d->prepare($sql) or die "ouch1"; my $r = $s->execute($p) or die "ouch2"; while (my$r=$s->fetchrow_arrayref) { print $r->[0], "\n"; }' -- output: Index Only Scan using azjunk6_text_pattern_ops_idx on azjunk6 (cost=0 +.00..13.85 rows=100 width=81) (actual time=0.485..0.659 rows=6 loops= +1) Index Cond: ((txt ~>=~ 'car'::text) AND (txt ~<~ 'cas'::text)) Filter: (txt ~ '^car[sz]'::text) Rows Removed by Filter: 35 Heap Fetches: 41 Total runtime: 0.741 ms

        In this case, the plan is the same whether pg_server_prepare be 1 or 0.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://953742]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (5)
As of 2014-11-28 01:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (191 votes), past polls