Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^6: same query, different execution, different performance (Pg's text_pattern_ops)

by erix (Prior)
on Feb 14, 2012 at 21:00 UTC ( [id://953766]=note: print w/replies, xml ) Need Help??


in reply to Re^5: same query, different execution, different performance (substr)
in thread same query, different execution, different performance

(Straying a bit from the OP, but this is fun, no? And anyway, I rather expect this may be interesting/useful for him too).

An alternative (in postgresql) would be to use regex-index, which can be used when the submitted search-string or regex is anchored:

select count(*) from azjunk6; -- 1 million rows random data: count --------- 1000000 (1 row) -- without index: select * from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 1147.420 ms -- now make a text_pattern_ops index: create index azjunk6_text_pattern_ops_idx on azjunk6 (txt text_patter +n_ops); Time: 7282.579 ms -- with index: select * from azjunk6 where txt ~ '^car[sz]'; txt + ---------------------------------------------------------------------- +------------ carsxbutsvamedynximrftmimgtzirtuorik lunamb qpjvwmixlxpmcu mm rzotj +jnfxr syfrj carzfhndjznvpgcpwqb fp bqpljspqqpzfbbswefzs pjoocqztqkjxyvbr qalcfzme +bezz ftmyi carziicmi zzzvt beqsupgdwkhdg luvvmhhay bj b r soaiyfftiqgq hs brdzaf +dztmtvfvrdn carziogaizohcqcphs ksucyeod q yvfallob pctvmwplm igzsqalyy dqsjpiikx +wyyxesenbeq carzw rcfwlqcweao jzeyxkchgc g vyvujtbsbeiewj inuelmldsa mpjevzmo pc +pwi kfajug carzxrk qyk palimcwokbw hbdcsmxehcsnrop prrokygyi ssngegzksrzvged cu +oxr yozt ca (6 rows) Time: 12.524 ms --> 100x faster

(It can be handy to have both a 'normal' btree index *and* such a text_pattern_ops regex index.)

See also: PostgreSQL index opclasses

You can get another interesting indextype from pg_trgm, a postgresql extension. This will give you not indexed regexen but indexed trigrams: PostgreSQL pg_trgm extension. (disadvantage: large index-size)

And FWIW: in 9.2devel, there is work ongoing to make it possible to combine the two: regexed trigram indexes...

  • Comment on Re^6: same query, different execution, different performance (Pg's text_pattern_ops)
  • Download Code

Replies are listed 'Best First'.
Re^7: same query, different execution, different performance (substr)
by runrig (Abbot) on Feb 14, 2012 at 21:25 UTC
    That's interesting. I've never cared much for the limited LIKE operator. Now when you prepare this:
    select * from azjunk6 where txt ~ ?
    Will it use the index when you execute the statement (e.g. with '^car[sz]')? My guess would be 'yes', and this might be the direction the OP should go.

      Yes, with my table there seems to be no problem, the value of pg_server_prepare makes no difference for this data here. But tye is right, with certain data you need to disable it (i.e., { pg_server_prepare => 0} ).

      perl -MDBI -e 'my$d=DBI->connect("dbi:Pg:",undef,undef,{pg_server_prep +are => 0 }); my $p = "^car[sz]"; my $sql = "explain analyze select * from azjunk6 where txt ~ ?"; my $s = $d->prepare($sql) or die "ouch1"; my $r = $s->execute($p) or die "ouch2"; while (my$r=$s->fetchrow_arrayref) { print $r->[0], "\n"; }' -- output: Index Only Scan using azjunk6_text_pattern_ops_idx on azjunk6 (cost=0 +.00..13.85 rows=100 width=81) (actual time=0.485..0.659 rows=6 loops= +1) Index Cond: ((txt ~>=~ 'car'::text) AND (txt ~<~ 'cas'::text)) Filter: (txt ~ '^car[sz]'::text) Rows Removed by Filter: 35 Heap Fetches: 41 Total runtime: 0.741 ms

      In this case, the plan is the same whether pg_server_prepare be 1 or 0.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://953766]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2024-04-24 22:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found