http://www.perlmonks.org?node_id=977465


in reply to Re^2: Inserting Hash Information Into MySQL
in thread Inserting Hash Information Into MySQL

Thank you for the information, that is really interesting. I wonder how it integrates with SQL.

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
  • Comment on Re^3: Inserting Hash Information Into MySQL

Replies are listed 'Best First'.
Re^4: Inserting Hash Information Into MySQL (PostgreSQL's hstore)
by erix (Prior) on Jun 23, 2012 at 21:40 UTC

    (PostgreSQL:) Once you have a table 't' with a column 'hs' of type hstore, you can use statements like:

    select * from t where hs ? 'x';

    to do 'hash-lookups' (but there are many more hstore operators).

    Below is a test program in bash/sql. It creates a table with an hstore with a million entries and indexes it.

    Then a few searches to show the retrieval performance; same search repeated three times without index and then repeated three times with index:

    $ pg_sql/pgsql.HEAD/hstore/create_hstore.sh rowcount | size ----------+-------- 1000000 | 249 MB (1 row) Time: 183.026 ms Time: 184.673 ms Time: 183.903 ms Time: 0.295 ms Time: 0.245 ms Time: 0.274 ms

    You see that the speedup is quite considerable :)

    Here is that create_hstore.sh program:

    (You must have hstore installed - see 'contrib' in older postgres or extensions in newer Pg's)

    #!/bin/sh schema=public table=testhstore t=$schema.$table echo " --/* drop table if exists $t; create table $t ( hs hstore ); insert into $t select hstore(md5(cast(f.x as text)), cast(f.x as text)) from generate_series(1, 1000000) as f(x); create index ${table}_hs_idx on $t using gin (hs); analyze $t; --*/ select count(*) as rowcount, pg_size_pretty(pg_total_relation_size( '$ +{t}' )) as size from $t; " | psql -q v=9509342c6a6b283d07a3ce406b06eb1e # discard one run for cache effects, then run three times, no index: echo "set enable_bitmapscan=0; \o /dev/null select * from $t where hs ? '$v'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:' # discard one run for cache effects, then run three times with index: echo "set enable_bitmapscan=1; \o /dev/null select * from $t where hs ? 'x'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:'

    Obviously we're far away from standard SQL here. Also from the OP's question, I think, but since you expressed interest I thought I'd construct an example. As always, there is much more in the Fine Manual: http://www.postgresql.org/docs/current/static/hstore.html.

    (run on a pretty average 8120 desktop, slow single SATA disk, Centos 6.2 Linux, postgresql 9.3devel)

    update: Changed to /dev/null redirection with \o inside a psql session. Also changed Timings from EXPLAIN ANALYZE to those from psql's '\timing on' (so I'm showing the slowest);

      Hm. I'm not sure that doing 4 lookups of the same value from the same process and then discarding the first one is much of a demonstration.

      It is effectively the same thing as doing this:

      c:\test> perl -MDigest::MD5=md5_hex -E"printf qq[%s %07d\n], md5_hex($ +_),$_ for 1..1e6" >hstore c:\test > type hstore.pl #! perl -slw use strict; use Time::HiRes qw[ time ]; use Digest::MD5 qw[ md5_hex ]; my $table = do{ local( @ARGV, $/ ) = 'hstore'; <> }; my %cache; my $v = '9509342c6a6b283d07a3ce406b06eb1e'; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; for ( 1 .. 3 ) { my $start = time; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; printf "Time: %.9f\n", time() - $start; } c:\test> hstore Time: 0.000007868 Time: 0.000009060 Time: 0.000001907

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?