http://www.perlmonks.org?node_id=867986

kalyanrajsista has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I'm trying to generate a report for all the employees in the company. The report preparation builds lots of data in the form of hashes, which are pushed onto an array.

Whilst doing so, the system throws an 'Out of memory!' error. I know the problem is temporarily holding a huge amount of data in memory (as hashes and arrays). Is there any way, or any Perl module, to store the data temporarily and get it back once all the employees have been processed successfully?

The first thing that strikes my mind is the Storable module from CPAN.
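
Something roughly along these lines is what I have in mind (a bare-bones sketch only; the batch file name and record variable are just examples):

use strict;
use warnings;
use Storable qw(nstore retrieve);

# Write each batch of employee records to its own file, then read the
# batches back after all the employees have been processed.
my @batch;                              # hashes accumulated so far
# ... push @batch, \%employee_record; ...

nstore( \@batch, 'batch_001.sto' );     # flush the batch to disk
@batch = ();                            # free the memory

# Later:
my $records = retrieve('batch_001.sto');
for my $rec ( @$records ) {
    # use $rec->{name}, etc.
}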

Many Thanks


Re: System out of memory while storing huge amount of data
by roboticus (Chancellor) on Oct 28, 2010 at 11:57 UTC

    kalyanrajsista:

    Usually I start using a database once the datasets get large enough. (Normally a large dataset invites reuse anyway, so putting it into a database keeps you from having to re-parse the source data files each time, etc.)

    You may want to look at DBI and DBD::SQLite for a nice little database system that doesn't require a server, is easy to use, etc.
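
    A minimal sketch of that approach (the table name, column names, and database file name below are made up for illustration):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=report.db', '', '',
                            { RaiseError => 1, AutoCommit => 0 } );

    $dbh->do( 'CREATE TABLE IF NOT EXISTS employee (
                   empno   INTEGER PRIMARY KEY,
                   name    TEXT,
                   surname TEXT,
                   dob     TEXT )' );

    my $ins = $dbh->prepare(
        'INSERT INTO employee (empno, name, surname, dob) VALUES (?, ?, ?, ?)' );

    # Inside the loop that currently pushes a hash onto an array:
    $ins->execute( 1234567890, 'fred', 'bloggs', '0/0/0000' );

    $dbh->commit;    # commit in batches rather than per row for speed

    # Later, stream the rows back out instead of holding them all in RAM:
    my $sth = $dbh->prepare('SELECT empno, name, surname, dob FROM employee');
    $sth->execute;
    while ( my $row = $sth->fetchrow_hashref ) {
        # generate the report line for this employee
    }
    $dbh->disconnect;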

    ...roboticus

Re: System out of memory while storing huge amount of data
by BrowserUk (Patriarch) on Oct 28, 2010 at 13:11 UTC

    Sometimes using a simpler data structure is very effective in conserving memory. For example, creating an array containing 1 million fairly simple hashes:

    perl -e" my @a; push @a, { name=>'fred', surname=>'bloggs', age=>'ancient', dob=>'0/0/0000', empno=>1234567890 } for 1..1e6; sleep 10"

    uses 673MB of ram on my 64-bit system.

    However, storing exactly the same information using strings:

    perl -e" my @a; push @a, join( $;, name=>'fred', surname=>'bloggs', age=>'ancient', dob=>'0/0/0000', empno=>1234567890 ), 1..1e6; sleep 10"

    Only takes 87MB.

    Using strings rather than hashes during the data accumulation phase can often mean that you can store 6 or 7 times as much data in the same memory. The strings are easily turned back into hashes on a case-by-case basis as required: my %temp = split $;, $array[ $i ];

    Whilst there may be some performance penalty incurred as a result of building the strings then converting them back to hashes on demand, this is often far less than the performance hit of moving to disk-based storage.
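
    For example, a complete round trip with the joined-string representation looks something like this (a sketch only; the fields are the same made-up ones as above):

    use strict;
    use warnings;

    my @a;

    # Accumulate each record as a single $;-joined string instead of a hash.
    push @a, join( $;,
        name => 'fred', surname => 'bloggs', age => 'ancient',
        dob  => '0/0/0000', empno => 1234567890 );

    # Rebuild a hash only when an individual record is actually needed.
    # (Works as long as the data itself never contains $;.)
    my %temp = split $;, $a[0];
    print "$temp{name} $temp{surname}\n";    # prints: fred bloggs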


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I ran your test on my 32-bit machine. The AoH took 411 MB; then I tried an AoA, and that wasn't much better at 336 MB. I conclude that the hash overhead in terms of storage isn't as much as one might imagine! Then I ran the string version and, as you observed, it took 52 MB - way smaller than either of the above.

      I attribute the difference in MB numbers to greater size of pointers on your 64 bit machine, not to any difference in the benchmark.

        I attribute the difference in MB numbers to greater size of pointers on your 64 bit machine,

        Indeed. 64-bit pointers cost heavily.

        Especially as, today and for the immediate future, on any machine costing less than something like $250k, the top 24 bits and usually more will be zeros. Even worse when you consider that the bottom 4 bits will also be 0.

        In an XS module I'm writing, I'm experimenting with storing 64-bit pointers in 32-bit fields. By right-shifting 4 places, I get 32-bit values that can cover 64GB, which is as much memory as I'm going to have in the next, say, 10 years or so.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: System out of memory while storing huge amount of data
by hbm (Hermit) on Oct 28, 2010 at 12:52 UTC

    I just faced the same problem. With a few lines, I moved my ~4GB hash from RAM to disc:

    use DB_File;
    use Fcntl;

    my $db = '/path/to/hash.db';
    my %hash;
    tie( %hash, 'DB_File', $db, O_RDWR|O_CREAT, 0666, $DB_BTREE )
        or die("Unable to tie $db: $!");
    # do things with %hash
    untie %hash;

    Note that the $DB_BTREE format is a lot faster than the default, $DB_HASH. (Twice as fast, in my benchmarks.) See also 152966, and the thread leading up to it.
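
    A rough, unscientific way to check that on your own setup (file names and record count here are arbitrary, and timings will vary with key distribution and BerkeleyDB version):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use DB_File;
    use Fcntl;

    my $n = 100_000;

    # Fill a fresh DB_File of the given type with $n key/value pairs.
    sub fill {
        my ( $file, $type ) = @_;
        unlink $file;
        tie my %h, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $type
            or die "tie $file: $!";
        $h{"key$_"} = "value$_" for 1 .. $n;
        untie %h;
    }

    cmpthese( 3, {
        btree => sub { fill( 'bench_btree.db', $DB_BTREE ) },
        hash  => sub { fill( 'bench_hash.db',  $DB_HASH  ) },
    } );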

Re: System out of memory while storing huge amount of data
by Marshall (Canon) on Oct 28, 2010 at 13:20 UTC
    I like the other ideas presented. I haven't used SQLite, but I have used MySQL, and the Perl DBI interface just rocks! Tying a hash or array to a file also works, but once I get to that point, I'm already thinking real DB.

    If I understand your post right, there is some huge hash, and a result set generated from it is being pushed onto an array. If you wrote the result set to disk instead of pushing it onto an array, things might still work out ok for you: undef the hash, which allows Perl to reuse that memory, and then do what you have to do with the file holding the result set.
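
    A sketch of what that might look like (the hash, the helper sub, and the file name are placeholders standing in for your real data and report logic):

    use strict;
    use warnings;

    # Placeholders for the real data and per-employee report logic:
    my %huge_hash = ( 1001 => { name => 'fred' } );
    sub compute_report_row { my ($rec) = @_; return [ $rec->{name} ] }

    # Stream each result row to a temp file instead of pushing it onto an array.
    open my $out, '>', 'report_rows.tmp' or die "open report_rows.tmp: $!";
    for my $empno ( sort keys %huge_hash ) {
        my $row = compute_report_row( $huge_hash{$empno} );
        print {$out} join( "\t", $empno, @$row ), "\n";
    }
    close $out;

    %huge_hash = ();    # let Perl reuse that memory before the next phase

    # Now work through the result set from disk, one line at a time.
    open my $in, '<', 'report_rows.tmp' or die "open report_rows.tmp: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my ( $empno, @fields ) = split /\t/, $line;
        # format this row of the report
    }
    close $in;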

    Update: after running BrowserUk's benchmark with 1 million simple records, which didn't get anywhere close to taxing the limits of my 32-bit Windows XP box, I'm wondering if there might be some obscure reason why your system reports "out of memory". I vaguely remember having to increase the size of my paging file at one point: memory-intensive apps like Firefox needed disk space to be paged out to, so that my Perl program could use that memory. Anyway, just another thought.