PerlMonks  

Tie Hash

by BioLion (Curate)
on Nov 30, 2009 at 14:32 UTC ( [id://810181]=perlquestion )

BioLion has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am looking to reduce the startup time and also the memory impact of an application which uses a large hash of hashes to hold data.

One obvious way that occurred to me was to tie the hash to a file or database, so that I wouldn't have to repopulate it on every run of the program and it wouldn't all be held in memory. Unfortunately I don't have a lot of experience in these matters.

Looking around CPAN and the monastery, folks seemed very keen on three main candidates:

  • Tie::DBI
      Tie hashes to SQL databases
  • DBM::Deep (Re: Can I tie a hash of hashes to a file?)
      "A unique flat-file database module, written in pure perl"
  • BerkeleyDB::Hash (Re: Managing a graph with large number of nodes)
      Berkeley DB based, obviously!

I wanted to write some benchmarks to test which was the fastest, most space efficient, and simplest to use, but never really got very far:

  • Tie::DBI
      You must pre-create the database, using DBI
      While you can subsequently use the tied hash like a normal hash, doing the tie is fairly complex and requires messing around with flags
  • BerkeleyDB::Hash
      Again, you must pre-create the database and mess around with a lot of setup flags, just to get it working...

I was put off, because I just wanted a simple 'plug and play' solution. However:

  • DBM::Deep
      Very simple interface
      "True multi-level hash/array support (unlike MLDBM, which is faked), hybrid OO / tie() interface, cross-platform FTPable files, ACID transactions, and is quite fast. Can handle millions of keys and unlimited levels without significant slow-down. Written from the ground-up in pure perl -- this is NOT a wrapper around a C-based DBM. Out-of-the-box compatibility with Unix, Mac OS X and Windows."
      Relatively simple flags (but more if you want to go deeper):
      tie %hash, "DBM::Deep", { file => "foo.db", locking => 1, autoflush => 1 };

So my questions:

  • Am I being stupid / impatient / lazy / missing a serious performance perk / etc. by discounting everything but DBM::Deep?
  • Are there any other approaches I should consider, which offer the same advantages as DBM::Deep but do it in a sufficiently different way to warrant benchmarking?
  • For those who have experience of it: are there any pitfalls I should be aware of with DBM::Deep?
  • Finally, am I even on the right track to solving my original problem (see 500 lines above...)!?

Sorry for the long question and thanks in advance!

    Just a something something...

    Replies are listed 'Best First'.
    Re: Tie Hash
    by zentara (Archbishop) on Nov 30, 2009 at 14:42 UTC
      There is also Storable. You might also want to consider thread safety in whatever module you select, just in case you decide to use threads to speed up processing.

      Check this out, for example: the first script creates the hash file, the second retrieves it from the file.

      #!/usr/bin/perl
      # the storer
      use strict;
      use warnings;
      use Data::Dumper;
      use Storable;

      my (%kids_of_wife, $man, $wife);

      $kids_of_wife{"Jacob"} = {
          "Leah"   => ["Reuben", "Simeon", "Levi", "Judah", "Issachar", "Zebulun"],
          "Rachel" => ["Joseph", "Benjamin"],
          "Bilhah" => ["Dan", "Naphtali"],
          "Zilpah" => ["Gad", "Asher"],
      };
      $kids_of_wife{"Bill"} = {
          "Betty"   => ["Bob", "Willy", "Fred", "Bilbo", "Frodo", "Dimwitia"],
          "Joan"    => ["Mike", "Ben"],
          "Harriet" => ["Danny", "Hondo"],
          "Mary"    => ["Marion", "Egad", "Clancy"],
      };

      # write the whole nested structure to the file "zzwifehash"
      store(\%kids_of_wife, "zzwifehash");

      print '################################################',"\n";
      # top-level keys (the men)
      foreach (keys %kids_of_wife) {
          print $_,"\n";
      }
      print '################################################',"\n";
      # second-level keys (the wives), run together on one line per man
      foreach (keys %kids_of_wife) {
          print keys %{$kids_of_wife{$_}}, "\n";
      }
      print '################################################',"\n";
      # each wife followed by her kids (no separators)
      foreach (keys %kids_of_wife) {
          my $man = $_;
          foreach (keys %{$kids_of_wife{$man}}) {
              print $_,"\n";
              my $wife = $_;
              print @{$kids_of_wife{$man}{$wife}},"\n";
          }
      }
      print '################################################',"\n";
      # "man + wife = kids", space separated
      foreach (keys %kids_of_wife) {
          $man = $_;
          foreach (keys %{$kids_of_wife{$man}}) {
              $wife = $_;
              print "$man + $wife = ";
              print "@{$kids_of_wife{$man}{$wife}}\n";
          }
      }
      print '################################################',"\n";
      #print Dumper(\%kids_of_wife);

      and a retrieve example

      #!/usr/bin/perl
      # the retriever
      use strict;
      use warnings;
      use Data::Dumper;
      use Storable;

      my (%kids_of_wife, $man, $wife);

      #$href = retrieve("zzwifehash");            # by ref
      %kids_of_wife = %{retrieve('zzwifehash')};   # direct to hash

      print '################################################',"\n";
      foreach (keys %kids_of_wife) {
          print $_,"\n";
      }
      print '################################################',"\n";
      foreach (keys %kids_of_wife) {
          print keys %{$kids_of_wife{$_}}, "\n";
      }
      print '################################################',"\n";
      foreach (keys %kids_of_wife) {
          my $man = $_;
          foreach (keys %{$kids_of_wife{$man}}) {
              print $_,"\n";
              my $wife = $_;
              print @{$kids_of_wife{$man}{$wife}},"\n";
          }
      }
      print '################################################',"\n";
      foreach (keys %kids_of_wife) {
          $man = $_;
          foreach (keys %{$kids_of_wife{$man}}) {
              $wife = $_;
              print "$man + $wife = ";
              print "@{$kids_of_wife{$man}{$wife}}\n";
          }
      }
      print '################################################',"\n";
      #print Dumper(\%kids_of_wife);
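
The two scripts above can be folded into a single "build once, reuse on later runs" pattern, which is roughly what the original question asks for on the startup-time side. The following is only a minimal sketch using Storable's `nstore` and `retrieve`; the `build_data` and `load_data` helpers and the cache file name are hypothetical, and the temporary directory is used just so the demo cleans up after itself (a real program would use a fixed path so the cache survives between runs).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Hypothetical cache location. The temp dir is removed at exit, so this
# demo does not actually persist anything between separate runs.
my $dir   = tempdir( CLEANUP => 1 );
my $cache = "$dir/kids_of_wife.sto";

# Hypothetical stand-in for the expensive step that populates the hash of hashes.
sub build_data {
    return {
        Jacob => { Rachel => [ 'Joseph', 'Benjamin' ], Zilpah => [ 'Gad', 'Asher' ] },
        Bill  => { Joan   => [ 'Mike', 'Ben' ] },
    };
}

# Return the data: read the cache file when it exists, otherwise build
# the structure and write the cache for next time.
sub load_data {
    my ($file) = @_;
    return retrieve($file) if -e $file;
    my $data = build_data();
    nstore( $data, $file );
    return $data;
}

my $first  = load_data($cache);    # no cache yet: builds and stores
my $second = load_data($cache);    # cache present: read back from disk

print "Rachel's kids: @{ $second->{Jacob}{Rachel} }\n";
```

Note that Storable reads the entire structure back into memory, so this addresses the startup cost but not the memory footprint.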

      I'm not really a human, but I play one on earth.
      Old Perl Programmer Haiku

        Thanks, you make a good point. Again DBM::Deep seems to do well; all I need to do is specify

        my $db = DBM::Deep->new( file => "foo.db", locking => 1 );
        and the locking is taken care of for me (including shared locks for reading).

        Just a something something...
    Re: Tie Hash
    by bellaire (Hermit) on Nov 30, 2009 at 16:02 UTC
      I'm not sure I understand your objection to pre-loading the databases. If you want to improve on startup time over in-memory hashes, then you're going to have to have the data somewhere else first, which means you have to pre-load the data.

      If you aren't doing that, then you are populating your data to memory first before it gets put into your hashes, and you aren't getting an initial startup time benefit. That's okay, since you'll get the benefit on subsequent uses, but there's also no reason you couldn't implement such a solution using e.g. Tie::DBI rather than pre-loading the data. After all, a tied hash that you couldn't add to wouldn't be very useful.

      In short, it seems all of your options are interchangeable depending on how you implement them. The difference is in how you want to write your implementation.

        Sorry, maybe I didn't make that clear. Tie::DBI ties your data structure to a pre-existing database, i.e. you need to make that database first, then populate it from your data. With DBM::Deep you can just write, and it will do all the creation for you. Once the *populated* database is created, I won't re-create it, for the reasons you pointed out!

        As I said, I am trying to learn more about databases, so if I am wrong, please correct me!
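
To illustrate the "just write and it creates the file" behaviour described above, here is a minimal sketch. It assumes DBM::Deep is installed from CPAN and uses the same `file`, `locking` and `autoflush` options quoted in the question; the temporary directory and the sample data are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBM::Deep;
use File::Temp qw(tempdir);

# Illustrative location only; a real program would use a fixed path.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/foo.db";

# First "run": the file does not exist yet, so DBM::Deep creates it.
{
    my $db = DBM::Deep->new( file => $file, locking => 1, autoflush => 1 );
    $db->{Jacob} = { Rachel => [ 'Joseph', 'Benjamin' ] };   # nested data is stored as-is
    $db->{Jacob}{Zilpah} = [ 'Gad', 'Asher' ];               # deeper levels can be added later
}

# Second "run": reopen the same file; nothing is repopulated.
my $db    = DBM::Deep->new( file => $file, locking => 1 );
my @wives = sort keys %{ $db->{Jacob} };
my $first = $db->{Jacob}{Rachel}[0];

print "wives: @wives; Rachel's first: $first\n";
```

Values are fetched from the file on access rather than loaded up front, which is where the memory saving would come from; whether the access speed is acceptable is a separate question.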

        Just a something something...
    Re: Tie Hash
    by tsee (Curate) on Dec 02, 2009 at 09:04 UTC

      Last time I used DBM::Deep, it was everything BUT fast. It's easy to use and powerful. It does persistent multi-level structures in pure Perl. It has a responsive author. It's great software. But it's seriously not very fast. Make sure you do some proper benchmarking before deciding to go all in on it. If the benchmarks show it's okay for your use case, then I highly recommend it.
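
A starting point for that kind of benchmarking could look like the sketch below. It uses only core modules (Benchmark, Storable, File::Temp) and compares rebuilding a hash of hashes against reading it back with Storable; the `build_data` helper, the data shape and the iteration count are all made up for illustration. Further candidates such as DBM::Deep would be added as extra entries in the same hash, and should be measured with your real data and access pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/bench.sto";

# Hypothetical stand-in for the real population step: a hash of hashes
# with 2000 top-level keys.
sub build_data {
    my %h;
    for my $i ( 1 .. 2000 ) {
        $h{"key$i"} = { id => $i, name => "item$i", tags => [ 1 .. 5 ] };
    }
    return \%h;
}

# Write the file once, outside the timed code.
nstore( build_data(), $file );

# Time each candidate 50 times; 'none' suppresses the per-test output.
my $results = timethese( 50, {
    rebuild  => sub { my $d = build_data() },
    retrieve => sub { my $d = retrieve($file) },
}, 'none' );

# Print a comparison table of rates.
cmpthese($results);
```

The numbers will vary by machine and data, so no expected output is shown here.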

    Node Type: perlquestion [id://810181]
    Approved by Corion
    Front-paged by Old_Gray_Bear