http://www.perlmonks.org?node_id=844413

ragowthaman has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am trying to load millions of records into a hash. I tried to trim it down, but it is still a couple of gigabytes of data. So I am wondering how to manage it. Someone suggested a tied hash, but I am not sure how to use one. Any tutorials or pointers to that, or any other suggestions, will be very much appreciated. Gowthaman

Re: how to handle large hashes
by BrowserUk (Patriarch) on Jun 13, 2010 at 09:06 UTC

    If using Perl's in-memory hashes means you are running out of memory (or running into swapping), then a file-based hash (e.g. BerkeleyDB) will get you around that problem, but it will be slow: slower to access than a memory-based hash, and much slower to construct.

    Depending on what you are doing with the hash, there are sometimes more compact alternatives that provide similar functionality and can avoid the move to disk-based storage. But you'd need to describe the data, and the use you are making of the hash.
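
    As a rough illustration of the file-based route, a minimal sketch using the BerkeleyDB module's tie interface might look like this; the file name is only a placeholder:

    use strict;
    use warnings;
    use BerkeleyDB;

    # tie %hash to a hash database on disk, creating the file if needed
    tie my %hash, 'BerkeleyDB::Hash',
        -Filename => 'data.db',
        -Flags    => DB_CREATE
        or die "Cannot open data.db: $BerkeleyDB::Error\n";

    $hash{some_key} = 'some value';   # reads and writes go to disk, not RAM
    untie %hash;                      # flush and close when done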


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: how to handle large hashes
by davido (Cardinal) on Jun 13, 2010 at 09:34 UTC

    If you're finding that you have a "doesn't scale well" problem (i.e., your hash or array is simply getting too large, and could grow larger still), one option is to move to a database solution. You're already pondering that option when you talk about 'tie hash'; that will probably lead you to a hash tied to a database. But you may find that rethinking your needs and designing a solution around a database, forgetting about the hash altogether, gives you a better approach.

    Of course this is speculation; we don't know what you're really trying to do, and thus can only take a blind stab at how to help. But a lightweight database such as SQLite can really help when you find that it's just not practical to slurp a big hash into memory.
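
    For instance, a hedged sketch of the SQLite route might look like the following; the file names, table layout, and tab-separated input format are assumptions for illustration only (DBD::SQLite needs to be installed):

    use strict;
    use warnings;
    use DBI;

    # "records.db", "data.txt" and the two-column layout are illustrative
    my $dbh = DBI->connect("dbi:SQLite:dbname=records.db", "", "",
                           { RaiseError => 1, AutoCommit => 0 });
    $dbh->do("CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, value TEXT)");

    # load the data once, row by row, instead of building a huge hash
    my $ins = $dbh->prepare("INSERT OR REPLACE INTO records (id, value) VALUES (?, ?)");
    open my $fh, '<', 'data.txt' or die "data.txt: $!";
    while (<$fh>) {
        chomp;
        my ($id, $value) = split /\t/;
        $ins->execute($id, $value);
    }
    close $fh;
    $dbh->commit;

    # look records up on demand rather than holding them all in memory
    my ($value) = $dbh->selectrow_array(
        "SELECT value FROM records WHERE id = ?", undef, "some_key");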


    Dave

Re: how to handle large hashes
by Khen1950fx (Canon) on Jun 13, 2010 at 10:07 UTC
Re: how to handle large hashes
by Xilman (Hermit) on Jun 13, 2010 at 10:16 UTC

    Something else to think about: do you need to store all your data at once?

    If your script is going to be called more than once and each invocation only needs to access part of the data, perhaps you can filter the data somehow before loading it into a hash for further processing.

    Without knowing more about the details of your application, it's hard to tell whether this approach is usable.
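
    For example, a rough sketch of that kind of pre-filtering, assuming a tab-separated data file and a separate list of the keys a given run actually needs (both file names are made up):

    use strict;
    use warnings;

    # keys this run cares about -- illustrative file name
    my %wanted;
    open my $w, '<', 'wanted_ids.txt' or die "wanted_ids.txt: $!";
    while (<$w>) {
        chomp;
        $wanted{$_} = 1;
    }
    close $w;

    # keep only the records that are actually needed in memory
    my %data;
    open my $fh, '<', 'big_data.txt' or die "big_data.txt: $!";
    while (<$fh>) {
        chomp;
        my ($id, $value) = split /\t/;
        $data{$id} = $value if $wanted{$id};
    }
    close $fh;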

    Paul

Re: how to handle large hashes
by tokpela (Chaplain) on Jun 13, 2010 at 16:41 UTC

    Have you looked into DBM::Deep?

    It can work like a hash and supports files up to 4GB and possibly larger depending on your system.
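
    A minimal sketch of how it might be used (the file name is illustrative):

    use strict;
    use warnings;
    use DBM::Deep;

    # the hash lives in records.db on disk, not in memory
    my $db = DBM::Deep->new("records.db");

    $db->{some_key} = "some value";          # works like an ordinary hash
    print $db->{some_key}, "\n";

    # nested structures are supported as well
    $db->{nested}{list} = [ 1, 2, 3 ];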

Re: how to handle large hashes
by Tux (Canon) on Aug 11, 2010 at 06:56 UTC

    Assuming you have enough disk space, you can use DB_File or Tie::Hash::DBD:

    use DB_File;
    tie my %hash, "DB_File", "file.db", O_RDWR|O_CREAT, 0666;
    # now use hash as normal

    use Tie::Hash::DBD;
    tie my %hash, "Tie::Hash::DBD", "dbi:SQLite:dbname=db.tie";
    # now use hash as normal

    Read the documentation of your preferred module to see what options and possibilities it offers, such as data persistence and nested structures.


    Enjoy, Have FUN! H.Merijn