|Perl: the Markov chain saw|
Multithreading leading to Out of Memory errorby joemaniaci (Sexton)
|on Jun 07, 2013 at 15:58 UTC||Need Help??|
joemaniaci has asked for the
wisdom of the Perl Monks concerning the following question:
So I have a new multithreading implementation that should be smooth running and for the most part it is. It wasn't until I tried evaluating 231 Gb of files that I started getting "Out of Memory!" errors and having the program die. I have been over everything three times now and still cannot figure it out. So here is what I have...
Now in my googling I have come across several references talking about perl threads maybe holding on to excess data over time. As you can tell, the only real global piece of data I am using is the queue. Therefore, all the real data is being built up in the individual parse subroutines, meaning the perl garbage cleanup should be taking care of that once the subroutine returns. Unless it's bugged or something. The only thing I can think of is destroying and recreating my threads every 200-300 file iterations.
EDIT: I don't know if it is a bug in Perl or what, but essentially the purpose of the array was to store certain data, and then get quantities of repeated elements. I changed the array to a hash and changed...
Then I got the quanties I wanted later on down the road...
Unless perl has some maximum array limit, and unless there was some sort of overflow issue, I have no idea why the original implementation had such a bad memory leak. Either way the memory leak is gone.
So I went back out of curiosity and looked at that array, essentially commented out the new code and uncommented the old implementation and looked at the sizes of the arrays. At one point the array held over 500,000 items, so I don't know if that is a lot or a little. Either way, some googling has led me to the fact that perl doesn't care so long as the system has the memory to store it. Now my machine has 16 gigs and never even came close to being fully utilized. I am assuming the operating system placed some limit on the perl.exe itself. Either way it doesn't matter because it wouldn't be an issue if the array was properly reclaimed. I did it by the book...
...which had zero effect. So my theory stands that there may be an overflow somehow and so when the functions return and garbage collection is performed, it is missing the memory that was overflowed. FYI, I have perl v5.12.3 compiled for multi-thread. I am going to try to test my theory and I guess depending on the results, it may get a new question.
So after doing some research into perl memory management I see what's going on. So...
...doesn't do what I thought it does. It simple clears out all the C pointers and structure, but not the memory. Perl by default holds onto that memory, the idea being that the developers of perl had speed as a greater priority than memory utilization. So it holds onto that memory for future use, in hopes that it'll save time to reuse it, instead of having to request memory from the system. You can compile perl to use your operating systems malloc() implementation, but then you lose your ability to move your application across systems.
I think in single threads, I was fine because each consecutive file reused the memory allocated the first time around. Once I multithreaded it, issues arose because I might have had multiple threads parsing the file in question, at which point I needed memory allocated for 2,3,4 or more files, or more precisely, the array in question.