Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Re^4: Inserting Hash Information Into MySQL (PostgreSQL's hstore)

by erix (Vicar)
on Jun 23, 2012 at 21:40 UTC ( #978017=note: print w/replies, xml ) Need Help??

in reply to Re^3: Inserting Hash Information Into MySQL
in thread Inserting Hash Information Into MySQL

(PostgreSQL:) Once you have a table 't' with a column 'hs' of type hstore, you can use statements like:

select * from t where hs ? 'x';

to do 'hash-lookups' (but there are many more hstore operators).

Below is a test program in bash/sql. It creates a table with an hstore with a million entries and indexes it.

Then a few searches to show the retrieval performance; same search repeated three times without index and then repeated three times with index:

$ pg_sql/pgsql.HEAD/hstore/ rowcount | size ----------+-------- 1000000 | 249 MB (1 row) Time: 183.026 ms Time: 184.673 ms Time: 183.903 ms Time: 0.295 ms Time: 0.245 ms Time: 0.274 ms

You see that the speedup is quite considerable :)

Here is that program:

(You must have hstore installed - see 'contrib' in older postgres or extensions in newer Pg's)

#!/bin/sh schema=public table=testhstore t=$schema.$table echo " --/* drop table if exists $t; create table $t ( hs hstore ); insert into $t select hstore(md5(cast(f.x as text)), cast(f.x as text)) from generate_series(1, 1000000) as f(x); create index ${table}_hs_idx on $t using gin (hs); analyze $t; --*/ select count(*) as rowcount, pg_size_pretty(pg_total_relation_size( '$ +{t}' )) as size from $t; " | psql -q v=9509342c6a6b283d07a3ce406b06eb1e # discard one run for cache effects, then run three times, no index: echo "set enable_bitmapscan=0; \o /dev/null select * from $t where hs ? '$v'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:' # discard one run for cache effects, then run three times with index: echo "set enable_bitmapscan=1; \o /dev/null select * from $t where hs ? 'x'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:'

Obviously we're far away from standard SQL here. Also from the OP's question, I think, but since you expressed interest I thought I'd construct an example. As always, there is much more in the Fine Manual:

(run on a pretty average 8120 desktop, slow single SATA disk, Centos 6.2 Linux, postgresql 9.3devel)

update: Changed to /dev/null redirection with \o inside a psql session. Also changed Timings from EXPLAIN ANALYZE to those from psql's '\timing on' (so I'm showing the slowest);

Replies are listed 'Best First'.
Re^5: Inserting Hash Information Into MySQL (PostgreSQL's hstore)
by BrowserUk (Pope) on Jun 24, 2012 at 06:25 UTC

    Hm. I'm not sure that doing 4 lookups of the same value from the same process and then discarding the first one is much of a demonstration.

    It is effectively the same thing as doing this:

    c:\test> perl -MDigest::MD5=md5_hex -E"printf qq[%s %07d\n], md5_hex($ +_),$_ for 1..1e6" >hstore c:\test > type #! perl -slw use strict; use Time::HiRes qw[ time ]; use Digest::MD5 qw[ md5_hex ]; my $table = do{ local( @ARGV, $/ ) = 'hstore'; <> }; my %cache; my $v = '9509342c6a6b283d07a3ce406b06eb1e'; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; for ( 1 .. 3 ) { my $start = time; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; printf "Time: %.9f\n", time() - $start; } c:\test> hstore Time: 0.000007868 Time: 0.000009060 Time: 0.000001907

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?