Re: Out of Memory Error -- Possible Leak?
by graff (Chancellor) on Dec 14, 2005 at 04:05 UTC
The scenario implies that the HoHoA is already loaded at the point that you go into this three-level loop over the data structure, and you obviously have not run out of memory at that point. So there must be something about how you are trying to access the data structure that is causing perl to consume a lot more memory than was needed to store the original data structure.
You say the structure is "not sparse", but if loading the structure happens to take up, say, 80% of available memory, then you might end up going over the limit if you auto-vivify a relatively small percentage of previously non-existent cells in the overall structure.
I wonder if it would help to try the loop like this:
for (my $i = 0; $i < scalar(@array); $i++) {
    my $subhash = $hash{$array[$i]};    # does not autovivify new $hash element
    next unless $subhash;               # equivalent to "exists($hash{$array[$i]})"
    for (my $j = $i; $j < scalar(@array); $j++) {
        my $subarray = $$subhash{$array[$j]};   # same comments as above
        next unless $subarray;
        foreach my $value ( @$subarray ) {
            # do great deeds
        }
    }
}
If the OP code actually works on a given (smaller) data set, it might turn out that this different form of the nested looping could produce the same output in less time (but I haven't tested that).
The point of this approach is that you can easily check whether a given hash element exists before trying to use its contents as an array ref, and skip it if it does not exist; this involves extra steps within the loops, but these could end up saving a lot of execution time overall. And if they save enough unnecessary memory consumption as well (making the difference between crashing and finishing), speed may be a lesser concern.
update: I think the code as suggested above would be "good enough", if you're confident about how the data structure was being created, but I would be tempted to be more careful:
next unless ( ref( $subhash ) eq 'HASH' );
...
next unless ( ref( $subarray ) eq 'ARRAY' );
...
Hello, thanks for the suggestions!
I think I should have been more specific -- when I said "not sparse", I meant that the data structure had been fully created in advance with a loop exactly like this. I've added in the checks, and before the program crashes (after about 3 hours of CPU) it never autovivifies.
One thing that goes on inside the "# do great deeds" section is changing values in the array part of the HoHoA, and I had expected that to be the cause of my memory leak. I'm really confuzzled as to why this:
foreach my $value ( @$subarray ) {
would eat up memory. I would have guessed it would use one scalar's worth (perhaps 15 bytes of data), and that this memory would get reused over & over again. Instead, the script runs on this loop for a few hours before using up all the memory.
I'm really confuzzled as to why this:
foreach my $value ( @$subarray ) {
would eat up memory.
That line expands the contents of the array into a list (i.e. on Perl's stack). If the array pointed to by $subarray is large, doing that will consume a large amount of memory.
You can avoid that by indexing the array rather than aliasing its elements, e.g.:
for my $index ( 0 .. $#$subarray ) {
    my $value = $subarray->[ $index ];
    ...
}
Or, if you need to modify the elements, substitute $subarray->[ $index ] wherever you are currently using $value.
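A minimal, self-contained sketch of the indexed approach (the sub-array contents here are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for one sub-array of the HoHoA.
my $subarray = [ 1, 2, 3 ];

# Indexing instead of "foreach my $value (@$subarray)" avoids expanding
# the array's contents onto the stack, and writing through
# $subarray->[$index] modifies the array in place.
for my $index ( 0 .. $#$subarray ) {
    $subarray->[ $index ] *= 10;
}

print "@$subarray\n";    # prints "10 20 30"
```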
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
if ( exists( $data{$array[$i]}{$array[$j]} ) ) {
    next;
}
else {
    print OUT join("\t", $i, $j), "\n";
}
and... wouldn't ya know it, there *were* elements being autovivified. I've fixed that problem, and now the code appears to be running correctly (appears = it's been running for a day without crashing yet -- it still has another couple of days to run, though).
Many thanks -- this has led me to one key problem in the code. Any monks who haven't done so yet, please ++ the parent.
Re: Out of Memory Error -- Possible Leak?
by Zaxo (Archbishop) on Dec 13, 2005 at 23:44 UTC
How much memory do you have? Your data structures are immense. It looks like you have perhaps 100,000,000 arrays in the hash, and even if most are not there you are autovivifying their keys ( scalar(@array) different anonymous hashes) and a reference to an empty array.
All those structures cost fifty-some bytes each plus a hefty multiple of the number of data elements.
I don't see a leak, just prolific use of memory. Can you read and use less of the data at a time?
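For anyone unsure where those extra cells come from: in Perl, merely reading a multi-level path autovivifies every intermediate level, and so does exists() on the full path; only checking each level separately avoids creating anything. A small demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash;

# Reading a two-level path creates the intermediate hash ref:
my $x = $hash{a}{b};                 # autovivifies $hash{a}
print scalar( keys %hash ), "\n";    # prints "1"

# exists() on the deepest level still autovivifies the levels above it:
my $y = exists $hash{c}{d};          # autovivifies $hash{c}
print scalar( keys %hash ), "\n";    # prints "2"

# Checking each level separately creates nothing:
my $z = exists $hash{e} && exists $hash{e}{f};
print scalar( keys %hash ), "\n";    # still prints "2"
```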
The machine itself has 32 GB, but I'm using a 32-bit compiled Perl, so presumably it can only address 4 GB of those.
Also, 100 million arrays is a realistic estimate (it's not sparse). I had thought that by using DBM::Deep I would be storing the majority of this data on disk, so that the actual amount stored in memory would be much smaller than 100 million x size(data-structure). Am I misguided on that?
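For context, a DBM::Deep-backed HoHoA is set up roughly like this (a sketch only; DBM::Deep is a CPAN module, and the file name here is illustrative):

```perl
use strict;
use warnings;
use DBM::Deep;    # CPAN module; stores the structure in a file on disk

# The top-level hash is backed by a disk file; nested hashes and arrays
# created under it live in the same file rather than in RAM.
my $db = DBM::Deep->new( "hohoa.db" );

$db->{key1}{key2} = [ 1, 2, 3 ];    # written through to disk

# Each access pulls just the requested element back into memory:
my $value = $db->{key1}{key2}[0];
```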
maybe it's because the garbage collector does not have time to run ...
Re: Out of Memory Error -- Possible Leak?
by runrig (Abbot) on Dec 13, 2005 at 23:50 UTC
You are close to running out of memory before you hit the foreach loop. You should probably start thinking about using some sort of database rather than keeping all that data in memory.
Actually, this originally was in a database, but the performance.... Benchmarking suggested it would take 5 months to complete, and since this was to be an annual task we're aiming to get it down to two or three weeks. Since DB access was the second biggest cost, it appeared to be an obvious place to tune, so this is an attempt to *replace* the DB look-ups and make a space/time trade-off.
I can't say I've ever even tried to put that much data into memory at once, but I have dealt with some large collections. You might try chunking up the original data and storing it in temporary tables you can iterate through so you have less to work with at one time. If your database provides something like LOAD DATA INFILE, you can make use of it to dramatically reduce transaction times when it's time to put all the data back.
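For example, with MySQL through DBI the bulk load is a single statement rather than millions of individual INSERTs (a sketch; the table name, file path, and credentials are made up, and DBI/DBD::mysql come from CPAN):

```perl
use strict;
use warnings;
use DBI;    # CPAN module

# Placeholder credentials for illustration only.
my ( $user, $pass ) = ( 'someuser', 'somepass' );

my $dbh = DBI->connect( "dbi:mysql:mydb", $user, $pass,
                        { RaiseError => 1 } );

# Bulk-load a tab-separated chunk file in one statement:
$dbh->do( q{
    LOAD DATA LOCAL INFILE '/tmp/chunk.tsv'
    INTO TABLE results
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
} );
</imports>
```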
That all said, if it were mine to do the single word that would keep me awake at night is "cluster".
Good luck and have fun with it!
Re: Out of Memory Error -- Possible Leak?
by choedebeck (Beadle) on Dec 13, 2005 at 23:42 UTC
Saying the code crashes on large datasets doesn't tell us much. Without some kind of error message I can only speculate at the problem.
With that said, I don't see an obvious error in your loop. This would lead me to believe that the problem is somewhere in the "# do great deeds" section, but without any more information it is hard to say.
It seems I wrote without fully reading your code examples. I apologize.
Re: Out of Memory Error -- Possible Leak?
by TomDLux (Vicar) on Dec 14, 2005 at 18:33 UTC
If you really need to access 10,000x10,000 arrays, each containing gosh-only-knows-how-many elements, perhaps you want to rethink your algorithm, and maybe implement it in C, at least for the core components.
As much as I love Perl, there are places where it is not the appropriate solution.
--
TTTATCGGTCGTTATATAGATGTTTGCA
Re: Out of Memory Error -- Possible Leak?
by ok (Beadle) on Dec 14, 2005 at 23:44 UTC
When processing ginormous data sets, try to use while instead of for.
A for loop evaluates its list in list context, loading the entire dataset into memory before the first iteration runs; a while loop reads each datum only as you request it.
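Concretely, with a filehandle (a self-contained sketch that builds its own small demo file as a stand-in for a huge dataset):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small demo file standing in for a huge dataset.
my ( $tmp, $file ) = tempfile();
print $tmp "record $_\n" for 1 .. 5;
close $tmp;

open my $in, '<', $file or die "open: $!";

# while evaluates <$in> in scalar context: one line in memory at a time.
my $count = 0;
$count++ while <$in>;
close $in;

print "$count\n";    # prints "5"

# By contrast, "for my $line (<$in>)" evaluates <$in> in list context,
# reading the entire file into memory before the loop body runs once.
```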
Re: Out of Memory Error -- Possible Leak?
by Moron (Curate) on Dec 15, 2005 at 12:19 UTC
Although arguably more perlish than for, foreach has the downside of building the entire list in memory before iterating through it. Converting the foreach to a C-style for ( $k = 0; ...; ) construction would avoid that.