PerlMonks  

Re^4: same query, different execution, different performance (substr)

by tye (Sage)
on Feb 14, 2012 at 17:58 UTC ([id://953738])


in reply to Re^3: same query, different execution, different performance
in thread same query, different execution, different performance

No, it just depends on the query optimizer. Some databases have query optimizers that know how to use an index when told "LIKE 'blah%'". Some databases have query optimizers that know how to use an index when told the equivalent thing using SUBSTR(). Some databases have optimizers that know how to do both. Some know how to do neither.
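
To make the two spellings concrete, here is a minimal sketch. The table `t` and column `txt` are hypothetical names, not taken from the thread. Both predicates select rows whose value starts with 'blah'; whether either one can use an ordinary index on `txt` is up to the optimizer.

```sql
-- Hypothetical table and column, for illustration only.

-- Prefix match written with LIKE:
SELECT * FROM t WHERE txt LIKE 'blah%';

-- The same prefix match written with SUBSTR (first 4 characters):
SELECT * FROM t WHERE SUBSTR(txt, 1, 4) = 'blah';
```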

- tye        

Replies are listed 'Best First'.
Re^5: same query, different execution, different performance (substr)
by runrig (Abbot) on Feb 14, 2012 at 18:38 UTC
    Some databases have query optimizers that know how to use an index when told "LIKE 'blah%'".

    I think most query optimizers will know to use an index on "LIKE 'blah%'", but not when the query plan is determined at prepare time (for those sorts of databases), and the query optimizer is told "LIKE ?" and only later is given the argument "blah%".

    Some databases have query optimizers that know how to use an index when told the equivalent thing using SUBSTR().

    A quick test with Oracle (update: and Sybase, and from what I recall, Informix) seems to show that it doesn't know to use an index with SUBSTR, and some quick googling on Postgres seems to imply that a function based index would be required there also. I'm not saying there's no database smart enough to use a regular index on a column for a substring search, I just haven't seen it yet.
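
    A function-based (expression) index for the SUBSTR form might look like the sketch below. The table `t`, column `txt`, and index name are hypothetical; the syntax shown is the general shape accepted by Oracle and PostgreSQL, and other databases may differ or lack the feature.

    ```sql
    -- Hypothetical table t(txt): index the expression itself,
    -- not the bare column.
    CREATE INDEX t_txt_prefix4_idx ON t (SUBSTR(txt, 1, 4));

    -- The predicate has to use the same expression as the index
    -- definition for the optimizer to consider that index.
    SELECT * FROM t WHERE SUBSTR(txt, 1, 4) = 'blah';
    ```

    The obvious cost is that such an index only serves one prefix length, so a 3-character or 5-character search would need its own index.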

      (Straying a bit from the OP, but this is fun, no? And anyway, I rather expect this may be interesting/useful for him too).

      An alternative (in postgresql) would be to use regex-index, which can be used when the submitted search-string or regex is anchored:

        select count(*) from azjunk6;  -- 1 million rows random data:
          count
        ---------
         1000000
        (1 row)

        -- without index:
        select * from azjunk6 where txt ~ '^car[sz]';
                                               txt
        ----------------------------------------------------------------------------------
         carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotjjnfxr syfrj
         carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzmebezz ftmyi
         carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzafdztmtvfvrdn
         carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikxwyyxesenbeq
         carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pcpwi kfajug
         carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cuoxr yozt ca
        (6 rows)
        Time: 1147.420 ms

        -- now make a text_pattern_ops index:
        create index azjunk6_text_pattern_ops_idx on azjunk6 (txt text_pattern_ops);
        Time: 7282.579 ms

        -- with index:
        select * from azjunk6 where txt ~ '^car[sz]';
                                               txt
        ----------------------------------------------------------------------------------
         carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotjjnfxr syfrj
         carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzmebezz ftmyi
         carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzafdztmtvfvrdn
         carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikxwyyxesenbeq
         carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pcpwi kfajug
         carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cuoxr yozt ca
        (6 rows)
        Time: 12.524 ms   --> 100x faster

      (It can be handy to have both a 'normal' btree index *and* such a text_pattern_ops regex index.)

      See also: PostgreSQL index opclasses

      You can get another interesting indextype from pg_trgm, a postgresql extension. This will give you not indexed regexen but indexed trigrams: PostgreSQL pg_trgm extension. (disadvantage: large index-size)
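
      A trigram index on the same table might be set up roughly as follows. This is a sketch, assuming a PostgreSQL version where pg_trgm can be installed with CREATE EXTENSION and where its indexes support LIKE; the index name is hypothetical, and `azjunk6` is the test table from the example above.

      ```sql
      -- Install the extension once per database.
      CREATE EXTENSION pg_trgm;

      -- GIN index over the trigrams of txt (hypothetical index name).
      CREATE INDEX azjunk6_trgm_idx ON azjunk6 USING gin (txt gin_trgm_ops);

      -- Unlike a btree index, a trigram index can also be considered
      -- for patterns that are not anchored at the start of the string.
      SELECT * FROM azjunk6 WHERE txt LIKE '%carz%';
      ```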

      And FWIW: in 9.2devel, there is work ongoing to make it possible to combine the two: regexed trigram indexes...

        That's interesting. I've never cared much for the limited LIKE operator. Now when you prepare this:
        select * from azjunk6 where txt ~ ?
        Will it use the index when you execute the statement (e.g. with '^car[sz]')? My guess would be 'yes', and this might be the direction the OP should go.
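
        One way to check rather than guess is to prepare the statement in psql and ask for the plan of an execution. A minimal sketch, using the `azjunk6` table from above; the statement name `q` is arbitrary, and the plan you get depends on the PostgreSQL version and on how the planner treats parameters.

        ```sql
        -- Server-side prepared statement with one text parameter.
        PREPARE q(text) AS
            SELECT * FROM azjunk6 WHERE txt ~ $1;

        -- Show the plan used when executing with an anchored regex.
        EXPLAIN EXECUTE q('^car[sz]');

        -- Clean up the prepared statement.
        DEALLOCATE q;
        ```

        If the plan shows a sequential scan here but an index scan for the same query with a literal pattern, the plan was chosen without knowledge of the parameter value.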
