Re^5: same query, different execution, different performance (substr)

Some databases have query optimizers that know how to use an index when told "LIKE 'blah%'".

I think most query optimizers will know to use an index on "LIKE 'blah%'", but not when the query plan is determined at prepare time (for those sorts of databases), and the query optimizer is told "LIKE ?" and only later is given the argument "blah%".

Some database have query optimizers that know how to use an index when told the equivalent thing using SUBSTR().

A quick test with Oracle (update: and Sybase, and from what I recall, Informix) seems to show that it doesn't know to use an index with SUBSTR, and some quick googling on Postgres seems to imply that a function based index would be required there also. I'm not saying there's no database smart enough to use a regular index on a column for a substring search, I just haven't seen it yet.

Comment on Re^5: same query, different execution, different performance (substr)

Replies are listed 'Best First'.
Re^6: same query, different execution, different performance (Pg's text_pattern_ops) by erix (Prior) on Feb 14, 2012 at 21:00 UTC
(Straying a bit from the OP, but this is fun, no? And anyway, I rather expect this may be interesting/useful for him too). An alternative (in postgresql) would be to use regex-index, which can be used when the submitted search-string or regex is anchored: select count() from azjunk6; -- 1 million rows random data: count --------- 1000000 (1 row) -- without index: select from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 1147.420 ms -- now make a text_pattern_ops index: create index azjunk6_text_pattern_ops_idx on azjunk6 (txt text_patter +n_ops); Time: 7282.579 ms -- with index: select * from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 12.524 ms --> 100x faster [download] (It can be handy to have both a 'normal' btree index and such a text_pattern_ops regex index.) See also: PostgreSQL index opclasses You can get another interesting indextype from pg_trgm, a postgresql extension. This will give you not indexed regexen but indexed trigrams: PostgreSQL pg_trgm extension. (disadvantage: large index-size) And FWIW: in 9.2devel, there is work ongoing to make it possible to combine the two: regexed trigram indexes...	[reply] [d/l]
Re^7: same query, different execution, different performance (substr) by runrig (Abbot) on Feb 14, 2012 at 21:25 UTC
That's interesting. I've never cared much for the limited LIKE operator. Now when you prepare this: `select * from azjunk6 where txt ~ ?` [download] Will it use the index when you execute the statement (e.g. with '`^car[sz]`')? My guess would be 'yes', and this might be the direction the OP should go.	[reply] [d/l] [select]
Re^8: same query, different execution, different performance (substr) by erix (Prior) on Feb 14, 2012 at 21:47 UTC
Yes, with my table there seems to be no problem, the value of pg_server_prepare makes no difference for this data here. But tye is right, with certain data you need to disable it (i.e., { pg_server_prepare => 0} ). perl -MDBI -e 'my$d=DBI->connect("dbi:Pg:",undef,undef,{pg_server_prep +are => 0 }); my $p = "^car[sz]"; my $sql = "explain analyze select * from azjunk6 where txt ~ ?"; my $s = $d->prepare($sql) or die "ouch1"; my $r = $s->execute($p) or die "ouch2"; while (my$r=$s->fetchrow_arrayref) { print $r->[0], "\n"; }' -- output: Index Only Scan using azjunk6_text_pattern_ops_idx on azjunk6 (cost=0 +.00..13.85 rows=100 width=81) (actual time=0.485..0.659 rows=6 loops= +1) Index Cond: ((txt ~>=~ 'car'::text) AND (txt ~<~ 'cas'::text)) Filter: (txt ~ '^car[sz]'::text) Rows Removed by Filter: 35 Heap Fetches: 41 Total runtime: 0.741 ms [download] In this case, the plan is the same whether pg_server_prepare be 1 or 0.	[reply] [d/l]


Problems? Is your data what you think it is?
	PerlMonks