Out of Memory Error -- Possible Leak?

by Anonymous Monk
on Dec 13, 2005 at 23:26 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

G'Day Oh Wise Ones

I come to ye with some great confuzzlement. I have a loop like so:

for (my $i = 0; $i < scalar(@array); $i++) {
    for (my $j = $i; $j < scalar(@array); $j++) {
        foreach my $value (@{$hash{$array[$i]}{$array[$j]}}) {
            # do great deeds
        }
    }
}

This loop crashes. It is apparently the foreach that is causing the crash, as this still crashes:

for (my $i = 0; $i < scalar(@array); $i++) {
    for (my $j = $i; $j < scalar(@array); $j++) {
        foreach my $value (@{$hash{$array[$i]}{$array[$j]}}) {
            next();
            # do great deeds
        }
    }
}

But importantly this does not crash:

for (my $i = 0; $i < scalar(@array); $i++) {
    for (my $j = $i; $j < scalar(@array); $j++) {
        next();
        foreach my $value (@{$hash{$array[$i]}{$array[$j]}}) {
            # do great deeds
        }
    }
}

I should be clear: the data structures are exactly as presented. @array is a simple array, while %hash is a hash of hashes of arrays. This HoHoA is really a DBM::Deep object, however, as I had expected this to be more efficient on memory usage.

I should also point out: this code works properly with smaller datasets, but not with my full dataset, which includes millions of data-points in the HoHoA and tens of thousands of elements in the arrays.

In my simple-minded view this indicates a memory-leak in the foreach loop. Am I missing something? In general, should I be doing something different rather than a foreach? Or am I vastly confuzzled and missing something obvious to ye great and mighty ones?

Re: Out of Memory Error -- Possible Leak?
by graff (Chancellor) on Dec 14, 2005 at 04:05 UTC
    The scenario implies that the HoHoA is already loaded at the point that you go into this three-level loop over the data structure, and you obviously have not run out of memory at that point. So there must be something about how you are trying to access the data structure that is causing perl to consume a lot more memory than was needed to store the original data structure.

    You say the structure is "not sparse", but if loading the structure happens to take up, say, 80% of available memory, then you might end up going over the limit if you auto-vivify a relatively small percentage of previously non-existent cells in the overall structure.
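
    For what it's worth, a tiny stand-alone illustration of that effect (not the OP's data; the key names are made up): a multi-level lookup autovivifies the intermediate levels even when you are only reading, and the foreach dereference can vivify the leaf as well.

    use strict;
    use warnings;

    my %hash;

    # Looping over a slot that does not exist still creates structure:
    # the chained subscript autovivifies $hash{foo}, and the deref in
    # the foreach can autovivify the leaf array ref too.
    foreach my $value ( @{ $hash{foo}{bar} } ) { }

    print exists $hash{foo} ? "foo was autovivified\n" : "foo absent\n";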

    I wonder if it would help to try the loop like this:

    for (my $i = 0; $i < scalar(@array); $i++) {
        my $subhash = $hash{$array[$i]};   # does not autovivify new %hash element
        next unless $subhash;              # equivalent to "exists($hash{$array[$i]})"
        for (my $j = $i; $j < scalar(@array); $j++) {
            my $subarray = $$subhash{$array[$j]};   # same comments as above
            next unless $subarray;
            foreach my $value ( @$subarray ) {
                # do great deeds
            }
        }
    }
    If the OP code actually works on a given (smaller) data set, it might turn out that this different form of the nested looping could produce the same output in less time (but I haven't tested that).

    The point of this approach is that you can easily check whether a given hash element exists before trying to use its contents as an array ref, and skip it if it does not exist; this involves extra steps within the loops, but these could end up saving a lot of execution time overall. And if they save enough unnecessary memory consumption as well (making the difference between crashing and finishing), speed may be a lesser concern.

    update: I think the code as suggested above would be "good enough", if you're confident about how the data structure was being created, but I would be tempted to be more careful:

    next unless ( ref( $subhash ) eq 'HASH' );
    ...
    next unless ( ref( $subarray ) eq 'ARRAY' );
    ...

      Hello, thanks for the suggestions!

      I think I should have been more specific -- when I said "not sparse", I meant that the data structure had been fully created in advance with a loop exactly like this one. I've added in the checks, and before the program crashes (about 3 hours of CPU) it does not ever autovivify.

      One thing that goes on inside the #do great deeds is to change values in the Array part of the HoHoA, and I had expected that to be the cause of my memory leak. I'm really confuzzled as to why this:

      foreach my $value ( @$subarray ) {

      would eat up memory. I would have guessed it would use one scalar's worth (perhaps 15 bytes of data), and that this memory would get reused over & over again. Instead, the script runs on this loop for a few hours before using up all the memory.

        I'm really confuzzled as to why this:

        foreach my $value ( @$subarray ) {

        would eat up memory.

        That line expands the contents of the array into a list (i.e. onto Perl's stack). If the array pointed to by $subarray is large, it will consume a large amount of memory to do that.

        You can avoid that by indexing the array rather than aliasing its elements. E.g.

        for my $index ( 0 .. $#$subarray ) {
            my $value = $subarray->[ $index ];
            ...
        }

        Or, if you need to modify the elements, substitute $subarray->[ $index ] wherever you are currently using $value.
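
        For the modification case, a small sketch (transform() and the sample data are just stand-ins for the real code):

        my $subarray = [ 1 .. 5 ];               # stand-in for the real nested array
        sub transform { my ($v) = @_; $v * 2 }   # placeholder for the real work

        for my $index ( 0 .. $#$subarray ) {
            # assigning through the index updates the underlying array in
            # place, without flattening it onto the stack first
            $subarray->[ $index ] = transform( $subarray->[ $index ] );
        }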


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

      YES YES YES!!!

      No matter how sure I was that the data structure was being created properly, I decided to put code like this inside my loop:

      if ( exists( $data{$array[$i]}{$array[$j]} ) ) {
          next();
      }
      else {
          print OUT join("\t", $i, $j), "\n";
      }

      and... wouldn't ya know it, there *were* elements being autovivified. I've fixed that problem, and now the code appears to be running correctly (appears = it's been running for a day without crashing yet -- it still has another couple of days to run, though).

      Many thanks -- this has led me to one key problem in the code. Any monks who haven't done so yet, please ++ the parent.

Re: Out of Memory Error -- Possible Leak?
by Zaxo (Archbishop) on Dec 13, 2005 at 23:44 UTC

    How much memory do you have? Your data structures are immense. It looks like you have perhaps 100,000,000 arrays in the hash, and even if most are not there, you are autovivifying their keys (scalar(@array) different anonymous hashes) and a reference to an empty array.

    All those structures cost fifty-some bytes each plus a hefty multiple of the number of data elements.

    I don't see a leak, just prolific use of memory. Can you read and use less of the data at a time?
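
    (Side note, not something raised in the thread: if you would rather measure than estimate, Devel::Size can report what a structure actually occupies.)

    use Devel::Size qw(size total_size);

    my %hash = ( a => { b => [ 1 .. 10 ] } );

    print size( \%hash ), "\n";        # bytes for the top-level hash alone
    print total_size( \%hash ), "\n";  # bytes including everything it references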

    After Compline,
    Zaxo

      The machine itself has 32 GB, but I'm using a 32-bit compiled Perl so presumably it can only address 4 of those.

      Also, 100 million arrays is a realistic estimate (it's not sparse). I had thought that by using DBM::Deep I would be storing the majority of this data on disk, so that the actual amount stored in memory would be much smaller than 100 million x size(data-structure). Am I misguided on that?
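
      (For readers who haven't used it: a minimal sketch of the usual DBM::Deep setup, with an illustrative filename. The structure lives in the file and nodes are pulled from disk as they are accessed, which is also why per-element access is far slower than a plain in-memory hash.)

      use DBM::Deep;

      # Object interface: the nested data is stored in hash.db, not in RAM.
      my $db = DBM::Deep->new( "hash.db" );
      $db->{key1}{key2} = [ 1, 2, 3 ];

      # Equivalent tie interface:
      tie my %hash, 'DBM::Deep', 'hash.db';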

      maybe it's because the garbage collector does not have time to run ...
        That isn't possible, because Perl doesn't use a separate garbage collector. It simply reference-counts all its data (scalars, arrays, hashes etc) and frees an item when its reference count drops to zero.

        (This is why circular data structures cause memory leaks.)
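
        A minimal illustration of that last point (nothing to do with the OP's structures): two things that point at each other never reach a zero count unless one link is weakened.

        use Scalar::Util qw(weaken);

        my $parent = {};
        my $child  = { parent => $parent };
        $parent->{child} = $child;     # cycle: neither refcount can reach zero

        weaken( $child->{parent} );    # the back-reference no longer counts,
                                       # so both structures can be freed normally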

Re: Out of Memory Error -- Possible Leak?
by runrig (Abbot) on Dec 13, 2005 at 23:50 UTC
    You are close to running out of memory before you hit the foreach loop. You should probably start thinking about using some sort of database rather than keeping all that data in memory.

      Actually this originally was in a database, but the performance.... Benchmarking suggested it would take 5 months to complete, and since this was to be an annual task we're aiming to get it down to two or three weeks. Since DB access was the second biggest cost, it appeared to be an obvious place to tune, so this is an attempt to *replace* the DB look-ups and make a space/time trade-off.

        I can't say I've ever even tried to put that much data into memory at once, but I have dealt with some large collections. You might try chunking up the original data and storing it in temporary tables you can iterate through so you have less to work with at one time. If your database provides something like LOAD DATA INFILE, you can make use of it to dramatically reduce transaction times when it's time to put all the data back.
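
        A rough sketch of that bulk-load step from Perl, assuming a MySQL back end (the table, file and credentials here are made up; other databases have their own bulk loaders, e.g. COPY in PostgreSQL):

        use DBI;

        my ( $user, $password ) = ( 'user', 'secret' );   # placeholders

        my $dbh = DBI->connect(
            'dbi:mysql:database=results;mysql_local_infile=1',
            $user, $password, { RaiseError => 1 },
        );

        # Write each computed chunk out as a tab-separated file, then load it
        # in one statement instead of millions of individual INSERTs.
        $dbh->do(q{
            LOAD DATA LOCAL INFILE '/tmp/chunk.tsv'
            INTO TABLE results
            FIELDS TERMINATED BY '\t'
        });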

        That all said, if it were mine to do, the single word that would keep me awake at night is "cluster".

        Good luck and have fun with it!

Re: Out of Memory Error -- Possible Leak?
by choedebeck (Beadle) on Dec 13, 2005 at 23:42 UTC
    Saying the code crashes on large datasets doesn't tell us much. Without some kind of error message I can only speculate at the problem.

    With that said, I don't see an obvious error in your loop. This would lead me to believe that the problem is somewhere in the "# do great deeds" section, but without any more information it is hard to say.

      I thought I'd ruled that out by placing a next() prior to that section, though?

      Either way the specific error is "Out of Memory!".

        It seems I wrote without fully reading your code examples. I apologize.
Re: Out of Memory Error -- Possible Leak?
by TomDLux (Vicar) on Dec 14, 2005 at 18:33 UTC

    If you really need to access 10,000 x 10,000 arrays, each containing gosh-only-knows-how-many elements, perhaps you want to rethink your algorithm, and maybe implement it in C, at least the core components.

    As much as I love Perl, there are places where it is not the appropriate solution.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Out of Memory Error -- Possible Leak?
by ok (Beadle) on Dec 14, 2005 at 23:44 UTC
    When processing ginormous data sets, try to use while instead of for.

    for wants to load the entire dataset into memory but while wants to load each datum as you request it.
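
    A minimal illustration of the distinction this usually refers to, assuming the data is being read from a file rather than already sitting in an array (the filename is made up):

    # Reads the whole file into a list before the loop even starts:
    open my $fh, '<', 'data.tsv' or die "open data.tsv: $!";
    for my $line (<$fh>) {
        # process $line
    }
    close $fh;

    # Reads one line per iteration, so only the current line is in memory:
    open $fh, '<', 'data.tsv' or die "open data.tsv: $!";
    while ( my $line = <$fh> ) {
        # process $line
    }
    close $fh;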

Re: Out of Memory Error -- Possible Leak?
by Moron (Curate) on Dec 15, 2005 at 12:19 UTC
    Although arguably more perlish than for, foreach has the downside of duplicating the entire set in memory before iterating through it. Converting the foreach to a C-style for ( my $k = 0; ...; ) construction would avoid that.
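
    A sketch of that conversion on the innermost loop, with a made-up stand-in for the real array reference (which would be fetched as in graff's reply above):

    my $subarray = [ 'a' .. 'e' ];   # stand-in for the real array reference
    for ( my $k = 0; $k < scalar(@$subarray); $k++ ) {
        my $value = $subarray->[$k];
        # do great deeds with $value
    }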

    -M

    Free your mind
