Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re^6: same query, different execution, different performance (Pg's text_pattern_ops)

by erix (Vicar)
on Feb 14, 2012 at 21:00 UTC ( #953766=note: print w/ replies, xml ) Need Help??


in reply to Re^5: same query, different execution, different performance (substr)
in thread same query, different execution, different performance

(Straying a bit from the OP, but this is fun, no? And anyway, I rather expect this may be interesting/useful for him too).

An alternative (in postgresql) would be to use regex-index, which can be used when the submitted search-string or regex is anchored:

select count(*) from azjunk6; -- 1 million rows random data: count --------- 1000000 (1 row) -- without index: select * from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 1147.420 ms -- now make a text_pattern_ops index: create index azjunk6_text_pattern_ops_idx on azjunk6 (txt text_patter +n_ops); Time: 7282.579 ms -- with index: select * from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 12.524 ms --> 100x faster

(It can be handy to have both a 'normal' btree index *and* such a text_pattern_ops regex index.)

See also: PostgreSQL index opclasses

You can get another interesting indextype from pg_trgm, a postgresql extension. This will give you not indexed regexen but indexed trigrams: PostgreSQL pg_trgm extension. (disadvantage: large index-size)

And FWIW: in 9.2devel, there is work ongoing to make it possible to combine the two: regexed trigram indexes...


Comment on Re^6: same query, different execution, different performance (Pg's text_pattern_ops)
Download Code
Re^7: same query, different execution, different performance (substr)
by runrig (Abbot) on Feb 14, 2012 at 21:25 UTC
    That's interesting. I've never cared much for the limited LIKE operator. Now when you prepare this:
    select * from azjunk6 where txt ~ ?
    Will it use the index when you execute the statement (e.g. with '^car[sz]')? My guess would be 'yes', and this might be the direction the OP should go.

      Yes, with my table there seems to be no problem, the value of pg_server_prepare makes no difference for this data here. But tye is right, with certain data you need to disable it (i.e., { pg_server_prepare => 0} ).

      perl -MDBI -e 'my$d=DBI->connect("dbi:Pg:",undef,undef,{pg_server_prep +are => 0 }); my $p = "^car[sz]"; my $sql = "explain analyze select * from azjunk6 where txt ~ ?"; my $s = $d->prepare($sql) or die "ouch1"; my $r = $s->execute($p) or die "ouch2"; while (my$r=$s->fetchrow_arrayref) { print $r->[0], "\n"; }' -- output: Index Only Scan using azjunk6_text_pattern_ops_idx on azjunk6 (cost=0 +.00..13.85 rows=100 width=81) (actual time=0.485..0.659 rows=6 loops= +1) Index Cond: ((txt ~>=~ 'car'::text) AND (txt ~<~ 'cas'::text)) Filter: (txt ~ '^car[sz]'::text) Rows Removed by Filter: 35 Heap Fetches: 41 Total runtime: 0.741 ms

      In this case, the plan is the same whether pg_server_prepare be 1 or 0.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://953766]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (5)
As of 2015-07-06 10:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (71 votes), past polls