Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options

Re^2: Inserting Hash Information Into MySQL

by erix (Parson)
on Jun 20, 2012 at 17:07 UTC ( #977414=note: print w/replies, xml ) Need Help??

in reply to Re: Inserting Hash Information Into MySQL
in thread Inserting Hash Information Into MySQL

A database cannot save a Perl hash (and certainly not a hash of array-refs)in its native format.

A database should have no problem with a hash. For instance, PostgreSQL has hstore [1], which not only makes implementing a hash easy, but can also index them (with either GIN (speed of lookup) or GiST (speed of writing)).

You will have to review and revise your database schema

Perhaps revise not 'database schema' but just 'database'... ;-)


Replies are listed 'Best First'.
Re^3: Inserting Hash Information Into MySQL
by CountZero (Bishop) on Jun 20, 2012 at 20:05 UTC
    Thank you for the information, that is really interesting. I wonder how it integrates with SQL.


    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      (PostgreSQL:) Once you have a table 't' with a column 'hs' of type hstore, you can use statements like:

      select * from t where hs ? 'x';

      to do 'hash-lookups' (but there are many more hstore operators).

      Below is a test program in bash/sql. It creates a table with an hstore with a million entries and indexes it.

      Then a few searches to show the retrieval performance; same search repeated three times without index and then repeated three times with index:

      $ pg_sql/pgsql.HEAD/hstore/ rowcount | size ----------+-------- 1000000 | 249 MB (1 row) Time: 183.026 ms Time: 184.673 ms Time: 183.903 ms Time: 0.295 ms Time: 0.245 ms Time: 0.274 ms

      You see that the speedup is quite considerable :)

      Here is that program:

      (You must have hstore installed - see 'contrib' in older postgres or extensions in newer Pg's)

      #!/bin/sh schema=public table=testhstore t=$schema.$table echo " --/* drop table if exists $t; create table $t ( hs hstore ); insert into $t select hstore(md5(cast(f.x as text)), cast(f.x as text)) from generate_series(1, 1000000) as f(x); create index ${table}_hs_idx on $t using gin (hs); analyze $t; --*/ select count(*) as rowcount, pg_size_pretty(pg_total_relation_size( '$ +{t}' )) as size from $t; " | psql -q v=9509342c6a6b283d07a3ce406b06eb1e # discard one run for cache effects, then run three times, no index: echo "set enable_bitmapscan=0; \o /dev/null select * from $t where hs ? '$v'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:' # discard one run for cache effects, then run three times with index: echo "set enable_bitmapscan=1; \o /dev/null select * from $t where hs ? 'x'; \o \timing on select * from $t where hs ? '$v'; select * from $t where hs ? '$v'; select * from $t where hs ? '$v';" \ | psql -q | grep -E '^Time:'

      Obviously we're far away from standard SQL here. Also from the OP's question, I think, but since you expressed interest I thought I'd construct an example. As always, there is much more in the Fine Manual:

      (run on a pretty average 8120 desktop, slow single SATA disk, Centos 6.2 Linux, postgresql 9.3devel)

      update: Changed to /dev/null redirection with \o inside a psql session. Also changed Timings from EXPLAIN ANALYZE to those from psql's '\timing on' (so I'm showing the slowest);

        Hm. I'm not sure that doing 4 lookups of the same value from the same process and then discarding the first one is much of a demonstration.

        It is effectively the same thing as doing this:

        c:\test> perl -MDigest::MD5=md5_hex -E"printf qq[%s %07d\n], md5_hex($ +_),$_ for 1..1e6" >hstore c:\test > type #! perl -slw use strict; use Time::HiRes qw[ time ]; use Digest::MD5 qw[ md5_hex ]; my $table = do{ local( @ARGV, $/ ) = 'hstore'; <> }; my %cache; my $v = '9509342c6a6b283d07a3ce406b06eb1e'; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; for ( 1 .. 3 ) { my $start = time; my $val = ( $cache{ $v } ) //= $table =~ m[$v (\d+)$]; printf "Time: %.9f\n", time() - $start; } c:\test> hstore Time: 0.000007868 Time: 0.000009060 Time: 0.000001907

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://977414]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (3)
As of 2018-11-17 00:39 GMT
Find Nodes?
    Voting Booth?
    My code is most likely broken because:

    Results (200 votes). Check out past polls.

    • (Sep 10, 2018 at 22:53 UTC) Welcome new users!