http://www.perlmonks.org?node_id=163405

shaezi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am attempting to work with a hash, but the amount of data that I'm loading into it is causing an out-of-memory error. The data is over 800 MB.
I tried using tie, like:
    use GDBM_File;

    my %data_parsed;
    tie %data_parsed, "GDBM_File", "$hashfil", O_RDWR|O_CREAT, 0666;
First of all I want to know if this is a good approach or not. Second, for the values of each key I'm referencing an array. When I try to get a value, for example: $data_parsed{$hashkey}[$i], I get an error message like: Can't use string ("ARRAY(0x200e1dfc)") as an ARRAY ref while "strict refs" in use. So I was wondering what I was doing wrong. Any help would be appreciated. Thanks!

Replies are listed 'Best First'.
Re: Loading large amount of data in a hash
by maverick (Curate) on May 01, 2002 at 22:22 UTC
    Right idea. You just have a few little details to sort out. You can't natively store references in DBM files. You'll have to serialize the arrayrefs down to scalars before you can store them, using something like Storable. Which leads to the second issue. If memory serves, GDBM has a fixed size limit for those scalars... Berkeley BTrees do not. Try something like:
        use DB_File;
        use Fcntl;                      # for O_RDWR and O_CREAT
        use Storable qw(freeze thaw);

        my %data_parsed;
        # $DB_BTREE gives a Berkeley DB BTree rather than the default hash format
        tie %data_parsed, "DB_File", "$hashfil", O_RDWR|O_CREAT, 0666, $DB_BTREE
            or die "Cannot tie $hashfil: $!";

        # inside read data loop
        $data_parsed{$key} = freeze($array_ref);

        # inside use data loop
        $array_ref = thaw($data_parsed{$key});
    DB_File is slow to create, but fast to read. That may or may not be an issue.
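    One gotcha with serialized values: you can't update the stored array in place through the tied hash, because each fetch hands you a fresh copy. A minimal sketch of the read-modify-write pattern this implies, reusing the tie above ($key and $new_value are just placeholders):

        # Append to the array stored behind $key: thaw, modify, freeze back.
        # A plain push @{$data_parsed{$key}}, ... would only change a temporary copy.
        my $array_ref = exists $data_parsed{$key} ? thaw($data_parsed{$key}) : [];
        push @$array_ref, $new_value;
        $data_parsed{$key} = freeze($array_ref);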

    HTH

    Update

    Ya know... after taking another glance at this, 800 MB is a lot of data. Plus, you have a key and array combination. Perhaps it's time to move up to a full-blown database like MySQL or Postgres?
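    For what it's worth, a rough sketch of what that could look like with DBI (the database name, credentials, and the parsed_data table with hashkey/idx/value columns are all made up for illustration):

        use DBI;

        # One row per (key, index, value) triple instead of a frozen array.
        my $dbh = DBI->connect("DBI:mysql:database=parsed", "user", "password",
                               { RaiseError => 1 });

        my $sth = $dbh->prepare(
            "INSERT INTO parsed_data (hashkey, idx, value) VALUES (?, ?, ?)");

        # inside read data loop
        for my $i (0 .. $#{$array_ref}) {
            $sth->execute($hashkey, $i, $array_ref->[$i]);
        }

        # later, pull a single element back without loading everything into memory
        my ($value) = $dbh->selectrow_array(
            "SELECT value FROM parsed_data WHERE hashkey = ? AND idx = ?",
            undef, $hashkey, $i);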

    /\/\averick
    OmG! They killed tilly! You *bleep*!!

Re: Loading large amount of data in a hash
by strat (Canon) on May 01, 2002 at 23:16 UTC
    Hello,

    with big data structures in memory (e.g. hashes of hashes of lists), I prefer perl 5.005_03 to perl 5.6 or 5.6.1, because it takes about 20% less memory and sometimes saves quite a lot of runtime.

    In a case I'm currently working on, the statistics look about like the following:
        version     cpu-time        memory
        5.005_03    ~25 minutes     480 MB
        5.6.1       ~80 minutes     ~600 MB
    I've tested this behaviour under Solaris, Linux, and Win2k.

    Best regards,
    perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"

Re: Loading large amount of data in a hash
by perrin (Chancellor) on May 01, 2002 at 23:00 UTC
    Instead of GDBM_File, use MLDBM, with GDBM underneath.
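    That handles the serialization behind the scenes, so the array refs survive the round trip. A minimal sketch of what that might look like (the open flags and the extra variables here are just for illustration):

        use MLDBM qw(GDBM_File Storable);   # GDBM underneath, Storable as the serializer
        use GDBM_File;                      # for the GDBM_WRCREAT constant

        my %data_parsed;
        tie %data_parsed, 'MLDBM', "$hashfil", &GDBM_WRCREAT, 0640
            or die "Cannot tie $hashfil: $!";

        # store a whole array ref; MLDBM freezes it on the way to disk
        $data_parsed{$hashkey} = \@values;

        # reads work the way the original code expected
        print $data_parsed{$hashkey}[$i], "\n";

        # but in-place updates don't reach the file: fetch, change, store back
        my $tmp = $data_parsed{$hashkey};
        push @$tmp, $new_value;
        $data_parsed{$hashkey} = $tmp;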