Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

Re^3: Inserting Hash Information Into MySQL

by CountZero (Bishop)
on Jun 20, 2012 at 20:05 UTC ( #977465=note: print w/replies, xml ) Need Help??

in reply to Re^2: Inserting Hash Information Into MySQL
in thread Inserting Hash Information Into MySQL

Thank you for the information, that is really interesting. I wonder how it integrates with SQL.


A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
  • Comment on Re^3: Inserting Hash Information Into MySQL

Replies are listed 'Best First'.
Re^4: Inserting Hash Information Into MySQL (PostgreSQL's hstore)
by erix (Parson) on Jun 23, 2012 at 21:40 UTC

    (PostgreSQL:) Once you have a table 't' with a column 'hs' of type hstore, you can use statements like:

    select * from t where hs ? 'x';

    to do 'hash-lookups' (but there are many more hstore operators).

    Below is a test program in bash/sql. It creates a table with an hstore with a million entries and indexes it.

    Then a few searches to show the retrieval performance; same search repeated three times without index and then repeated three times with index:

    $ pg_sql/pgsql.HEAD/hstore/ rowcount | size ----------+-------- 1000000 | 249 MB (1 row) Time: 183.026 ms Time: 184.673 ms Time: 183.903 ms Time: 0.295 ms Time: 0.245 ms Time: 0.274 ms

    You see that the speedup is quite considerable :)

    Here is that program:

    (You must have hstore installed - see 'contrib' in older postgres or extensions in newer Pg's)

    #!/bin/sh schema=public table=testhstore t=$schema.$table echo " --/* drop table if exists $t; create table $t ( hs hstore ); insert into $t select hstore(md5(cast(f.x as text)), cast(f.x as text)) from generate_series(1, 1000000) as f(x); create index ${table}_hs_idx on $t using gin (hs); analyze $t; --*/ select count(*) as rowcount, pg_size_pretty(pg_total_relation_size( '$ +{t}' )) as size from $t; " | psql -q v=9509342c6a6b283d07a3ce406b06eb1e # discard one run for cache effects, then run three times, no index: echo "set enable_bitmapscan=0; \o /dev/null select * from $t where hs ? '$v'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:' # discard one run for cache effects, then run three times with index: echo "set enable_bitmapscan=1; \o /dev/null select * from $t where hs ? 'x'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:'

    Obviously we're far away from standard SQL here. Also from the OP's question, I think, but since you expressed interest I thought I'd construct an example. As always, there is much more in the Fine Manual:

    (run on a pretty average 8120 desktop, slow single SATA disk, Centos 6.2 Linux, postgresql 9.3devel)

    update: Changed to /dev/null redirection with \o inside a psql session. Also changed Timings from EXPLAIN ANALYZE to those from psql's '\timing on' (so I'm showing the slowest);

      Hm. I'm not sure that doing 4 lookups of the same value from the same process and then discarding the first one is much of a demonstration.

      It is effectively the same thing as doing this:

      c:\test> perl -MDigest::MD5=md5_hex -E"printf qq[%s %07d\n], md5_hex($ +_),$_ for 1..1e6" >hstore c:\test > type #! perl -slw use strict; use Time::HiRes qw[ time ]; use Digest::MD5 qw[ md5_hex ]; my $table = do{ local( @ARGV, $/ ) = 'hstore'; <> }; my %cache; my $v = '9509342c6a6b283d07a3ce406b06eb1e'; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; for ( 1 .. 3 ) { my $start = time; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; printf "Time: %.9f\n", time() - $start; } c:\test> hstore Time: 0.000007868 Time: 0.000009060 Time: 0.000001907

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://977465]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (7)
As of 2018-10-22 19:27 GMT
Find Nodes?
    Voting Booth?
    When I need money for a bigger acquisition, I usually ...

    Results (122 votes). Check out past polls.