|Problems? Is your data what you think it is?|
No garbage collection for my-variablesby betterworld (Deacon)
|on Sep 15, 2008 at 19:48 UTC||Need Help??|
Apparently, the memory that is (directly) occupied by my-variables is never automatically freed. The term "directly" includes the case where a my-variable holds a (very long) string.
As ikegami has demonstrated, a string buffer will grow when needed, but it will never (automatically) shrink or disappear. Apparently this is part of some kind of optimization to avoid constantly reallocating buffers for code that is used often.
I've shown that a buffer will only be reused for the same variable. This means that it will not be reused for a variable in another subroutine. This is were I see a big problem. In large programs there are thousands of lexical variables, and not all of them are used more than a few times, but all of them retain their buffers. Even for a buffer that it reused often, this is not optimal: Once it will have a large chunk of data, it will stay at that size, regardless how small the data is in the next calls.
The commonly suggested workarounds are:
Alright, but the problem is that code is generally not designed like this. Of course, you can design your code this way if you plan to handle large data. However, almost all serious projects use external code that they haven't written themselves. In my search for the most obvious example, I found Encode.pm:
Consider this code:
If you watch this program's memory consumption, you'll find that it will use approximately 288MB after "initialized" has been printed. After "cleaning" has been printed, the amount will shrink considerably to 98MB. (Actually it will shrink even more if you wipe out the "init" subroutine itself, I guess this is because of the large string constant.)
Responsible is this code in Encode.pm:
Both $string and $octets hold our dear string, and (like I) the author obviously thought that they don't need to free its memory.
I've named the subroutine "init" to suggest that this is code that will only be used at the very start of a long program lifetime, which means that the long string buffer will linger around needlessly.
So, what would I be supposed to do? Don't use Encode and do my character transcoding myself? Or should I actually use "undef" to clean all the subs that I have used? Consider that my initialization code loads an XML configuration file. I'd have to clean most of the namespaces of XML::Simple, XML::Parser and whatelse. And if I actually plan to continue using these modules, I'd have to wipe out "%INC", then require them again, not very nice. (Just an example; I've not really checked these modules, so please don't be offended if you are the author and have considerately undef-ed every variable.)
I've not looked for this optimization in the perl source yet, but I'd really like someone to explain why it is needed. I can agree that it would not be performant to do a lot of malloc/mmap/munmap/brk for every string that is copied, however IMHO there are situations where perl should find some way to realize that any of the following cannot be performant either:
I'll conclude with an example snippet that demonstrates how you can eventually get your computer to use excessive amounts of memory or even swap:
Because of the "sleep", you can run this snippet and watch it indulge itself by eating one megabyte per second (in GNU, use "top" and press M).
The snippet uses string-eval to generate a lot of subroutines like this:
then calls them one after another.
Well, that was a large chunk of text now, I've tried to ease your reading by using bold text, I hope that perlmonks' buffers will eventually be freed from this text, and I hope that I haven't missed something obvious.