
Loading large amount of data in a hash

by shaezi (Acolyte)
on May 01, 2002 at 21:53 UTC ( #163405=perlquestion )
shaezi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am attempting to work with a hash, but the amount of data I'm loading into it is causing an out-of-memory error. The data is over 800MB.
I tried using tie, like:
use Fcntl;
use GDBM_File;
my %data_parsed;
tie %data_parsed, "GDBM_File", $hashfil, O_RDWR|O_CREAT, 0666;
First of all, I want to know if this is a good approach or not. Second, the value for each key is an array reference. When I try to get a value, for example $data_parsed{$hashkey}[$i], I get an error message like: Can't use string ("ARRAY(0x200e1dfc)") as an ARRAY ref while "strict refs" in use. So I was wondering what I was doing wrong. Any help would be appreciated. Thanks!
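For reference, the error comes from the DBM layer storing only flat strings: the arrayref is stringified on the way into the file, so the fetch returns text like "ARRAY(0x...)" rather than a real reference. A minimal sketch of the difference, using only core Perl (no GDBM needed):

```perl
use strict;
use warnings;

# An ordinary in-memory hash holds the real reference, so nested
# access works:
my %mem;
$mem{colors} = [ 'red', 'green', 'blue' ];
print $mem{colors}[1], "\n";    # prints "green"

# A DBM-tied hash can store only flat strings, so storing a reference
# stringifies it -- the same thing a plain string conversion does:
my $stringified = '' . [ 'red', 'green', 'blue' ];
print $stringified, "\n";       # something like "ARRAY(0x200e1dfc)"

# Fetching that string back and trying to use it as an arrayref is
# what triggers "Can't use string ... as an ARRAY ref" under strict refs.
```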

Replies are listed 'Best First'.
Re: Loading large amount of data in a hash
by maverick (Curate) on May 01, 2002 at 22:22 UTC
    Right idea. You just have a few little details. You can't natively store references in DBM files. You'll have to serialize the arrayrefs down to scalars before you can store them, using something like Storable. Which leads to the second issue. If memory serves, GDBM has a fixed size limit for those scalars... Berkeley DB B-trees do not. Try something like:
    use Fcntl qw(O_RDWR O_CREAT);
    use DB_File;
    use Storable qw(freeze thaw);
    tie %data_parsed, "DB_File", $hashfil, O_RDWR|O_CREAT, 0666, $DB_BTREE;
    # inside read data loop
    $data_parsed{$key} = freeze($array_ref);
    # inside use data loop
    $array_ref = thaw($data_parsed{$key});
    DB_File is slow to create, but fast to read. That may or may not be an issue.
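    For the curious, here is that recipe fleshed out into a self-contained sketch (the file name and key are made up; assumes DB_File is installed):

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;
use Storable qw(freeze thaw);

# Serialize each arrayref with Storable before storing it in a
# Berkeley DB B-tree.  The file name is just an example.
my $hashfil = 'data_parsed.db';
tie my %data_parsed, 'DB_File', $hashfil, O_RDWR | O_CREAT, 0666, $DB_BTREE
    or die "Cannot tie $hashfil: $!";

# write side: freeze the reference down to a flat scalar
$data_parsed{somekey} = freeze([ 10, 20, 30 ]);

# read side: thaw the scalar back into a real arrayref
my $array_ref = thaw($data_parsed{somekey});
print $array_ref->[1], "\n";    # prints "20"

untie %data_parsed;
unlink $hashfil;
```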



    Ya know... after taking another glance at this, 800MB is a lot of data. Plus, you have a key-and-array combination. Perhaps it's time to move up to a full-blown database like MySQL or Postgres?

    OmG! They killed tilly! You *bleep*!!

Re: Loading large amount of data in a hash
by strat (Canon) on May 01, 2002 at 23:16 UTC

    With big data in memory (e.g. hashes of hashes of lists), I prefer using perl 5.005_03 to perl 5.6.0 or 5.6.1, because it takes about 20% less memory and sometimes saves quite a lot of runtime.

    In a case I'm just working with, the statistics are about like the following:
    5.005_03: ~25 minutes, 480 MB
    5.6.1: ~80 minutes, ~600 MB
    I've tested this behaviour under Solaris, Linux, and Win2k.

    Best regards,
    perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"

Re: Loading large amount of data in a hash
by perrin (Chancellor) on May 01, 2002 at 23:00 UTC
    Instead of GDBM_File, use MLDBM, with GDBM underneath.
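    A sketch of what that looks like (assumes MLDBM is installed; the file name and key are made up). MLDBM wraps the serialize/deserialize step itself, so nested structures round-trip transparently:

```perl
use strict;
use warnings;
use GDBM_File;
use MLDBM qw(GDBM_File Storable);   # GDBM underneath, Storable to serialize

tie my %data_parsed, 'MLDBM', 'data_parsed.gdbm', &GDBM_WRCREAT, 0666
    or die "Cannot tie: $!";

# Store and fetch arrayrefs as with an ordinary hash:
$data_parsed{somekey} = [ 'a', 'b', 'c' ];
print $data_parsed{somekey}[2], "\n";    # prints "c"

# One documented caveat: each fetch returns a fresh copy, so modify
# through a temporary reference and store the whole thing back:
my $tmp = $data_parsed{somekey};
push @$tmp, 'd';
$data_parsed{somekey} = $tmp;
```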
Re: Loading large amount of data in a hash
by BUU (Prior) on May 02, 2002 at 01:19 UTC
    Ditto the above poster: is there something wrong with using MySQL or PostgreSQL? Because I really don't see why you would need the minor (considering 800MB) speed increase you would get by having it all in memory. It might even be faster due to less swapping.
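    For completeness, a sketch of the database route (assumes DBI and DBD::mysql with a running server; the database, table, and column names here are invented):

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=parsed', 'user', 'password',
                       { RaiseError => 1 });

# One row per (key, index, value) replaces the hash of arrays:
my $ins = $dbh->prepare(
    'INSERT INTO data_parsed (hashkey, idx, value) VALUES (?, ?, ?)');
$ins->execute('somekey', 0, 'first value');

# The equivalent of $data_parsed{$hashkey}[$i]:
my ($value) = $dbh->selectrow_array(
    'SELECT value FROM data_parsed WHERE hashkey = ? AND idx = ?',
    undef, 'somekey', 0);
```

    An index on (hashkey, idx) keeps those lookups fast, and nothing has to fit in memory at once.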

Node Type: perlquestion [id://163405]
Approved by mandog