pkirsch has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I know this theme was already discussed back in 2001 (node 50069), but does anybody have deeper information on how to influence the garbage collector to free memory (in the sense of giving it back to the OS)?
Let me show:
{
    my $foo = 'X' x 100000000;
    getc;
}
undef $foo;
my $foo = "";
getc;
At the first prompt (via ps auxwf output):
user 12310 2.1 4.8 211000 197024 pts/2
At the second prompt:
user 12310 1.2 4.8 211000 197028 pts/2
Now, what I wanted to show: the memory allocated for $foo is not released.
What I read on several mailing lists is that Perl's memory management does not return the memory to the OS, in the hope that the space will be reused.
But what I'm missing is some kind of 'free' statement in Perl which definitely returns memory (and as far as I can tell, undef does not help).
Let me show another example:
$foo = 'X' x 100000000;
getc;
undef $foo;
getc;
$foo2 = 'X' x 100000000;
getc;
gives (again output from 'ps auxwf'):
user 15185 18.0 4.8 211000 197020 pts/5
user 15185 5.0 2.4 113340 99364 pts/5
user 15185 4.3 7.2 308660 294684 pts/5
As you can see, some RAM is given back to the OS (~50%: 99364/197020).
BUT: I would expect the Perl virtual machine (its memory management) to reuse the RAM allocated for $foo when allocating $foo2.
As you can see, $foo is no longer used in the program, yet the memory management still holds on to RAM for it (why?).
(I'm using perl 5.10.0 on OpenSUSE 11.0)
Re: Perl Garbage Collection, again
by BrowserUk (Patriarch) on Dec 23, 2008 at 10:46 UTC
It is possible to demonstrate (as you have), that if you request a single large chunk of ram from the C-runtime allocator (malloc() etc), that it will make a separate virtual memory allocation (VirtualAlloc() or platform equivalent) specifically for that request, rather than use or extend the existing heapspace.
And when the CRT free() is called on that allocation, it will in turn call the OS VirtualFree() (or equivalent), which will return the virtual memory pages backing the allocation, back to the OS.
As you can see, some RAM is given back to the OS (~50%: 99364/197020)
If you consider those numbers carefully, you have 197020*1024 = 201748480 bytes of virtual memory before the undef and 99364*1024 = 101748736 afterward. Meaning that you freed 99,999,744 bytes of space, which corresponds (give or take rounding to 1024-byte pages) to the single, extraordinary 100,000,000-byte allocation you made.
Whilst you can make use of this knowledge to ensure that single huge chunks of RAM are returned to the OS (the break point seems to be ~1/2 MB or greater using the MS CRT (via AS Perl) on my system), it isn't as useful as you might think.
The majority of large volumes of data manipulated in Perl are allocated in arrays or hashes. And whilst these both have a single, largish, contiguous component at their core (the array of pointers), the vast majority of the space they occupy when populated is allocated in lots of smallish allocations (the SVs, HEs and HEKs). These will always be allocated from separate virtual allocations (heaps) and intermingled with other SVs etc. And those virtual allocations will never be released back to the OS until every allocation within them has been freed. Which is unlikely to ever happen.
So, if you're manipulating large chunks of contiguous data, basically big scalars, within your program, and you can cause these to be allocated in one go (as with your 'X' x 1e8; there are other ways also), then you can cause them to be freed back to the OS (rather than just the process memory pool) when you are done with them.
But, the conditions under which that free back to the OS happens are unwritten, vague, and probably vary widely with platform, compiler, CRT version, and maybe even compiler and linker options. It's not something that you can easily codify enough to rely upon.
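Still, if you want to watch the effect from inside the process, here is a minimal sketch of the big-scalar versus many-small-allocations difference. It assumes Linux (where /proc/self/status exposes VmRSS) and a glibc-style malloc; the sizes are arbitrary and the exact numbers will vary with platform and allocator:
use strict;
use warnings;

# Resident set size of this process, read from /proc (Linux-specific assumption).
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or die "cannot read /proc/self/status: $!";
    while (<$fh>) { return $1 if /^VmRSS:\s+(\d+)\s+kB/ }
    return;
}

printf "start:                %8d kB\n", rss_kb();

{   # one big contiguous scalar
    my $big = 'X';
    $big x= 100_000_000;
    printf "big scalar allocated: %8d kB\n", rss_kb();
    undef $big;                  # frees the single ~100 MB buffer
}
printf "big scalar freed:     %8d kB\n", rss_kb();

{   # a comparable volume of data spread over many small SVs/HEs/HEKs
    my %h;
    $h{$_} = 'X' x 100 for 1 .. 500_000;
    printf "hash allocated:       %8d kB\n", rss_kb();
    undef %h;                    # SVs go back to perl's pools, not to the OS
}
printf "hash freed:           %8d kB\n", rss_kb();
Under those assumptions, the first "freed" line should drop by close to the full 100 MB, while freeing the hash should barely move VmRSS, because the many small pieces stay in perl's own heaps.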
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Helpful, indeed.
What I recognized:
- allocating a big chunk of memory at once is evil (if you want to release it afterwards):
use strict;
$| = 1;
print "$$\n"; # top -p $$
print "Test, Allocating a large string \n"; <>;
{
    my $foo = 'X' x 100000000;
    print "Large String allocated.\n"; <>;
    undef $foo;
    print "Large String deallocated.\n"; <>;
}
print "2nd Large String.\n"; <>;
{
    #evil: my $foo2 = 'X' x 100000000;
    my $foo2;
    $foo2 .= 'x' x 1000 for (1 .. 100000);
    print "2nd Large String allocated.\n"; <>;
    undef $foo2;
    print "2nd Large String deallocated.\n"; <>;
}
print "Now what? Press enter to exit";
<>;
As you can see, the memory used for $foo2 is returned to the OS. So my initial example was not correct.
May I ask you another question, which puzzles me:
The above example also shows that at the end of the script there are still 97 MB allocated (due to $foo). Also, $foo2 does not reuse the chunks previously allocated for $foo (the total usage rises to 192 MB).
Is this caused by the overhead of Perl's internal memory handling?
Thanks,
use strict;
$| = 1;
print "$$\n"; # top -p $$
print "Test, Allocating a large string \n"; <>;
{
    my $foo = 'X';
    $foo x= 100000000;
    print "Large String allocated.\n"; <>;
    undef $foo;
    print "Large String deallocated.\n"; <>;
}
print "2nd Large String.\n"; <>;
{
    #evil: my $foo2 = 'X' x 100000000;
    my $foo2;
    $foo2 .= 'x' x 100_000 for (1 .. 1000);
    print "2nd Large String allocated.\n"; <>;
    undef $foo2;
    print "2nd Large String deallocated.\n"; <>;
}
print "Now what? Press enter to exit";
<>;
When that reaches the "Now what" prompt, you should find that the memory usage has returned to the same level as you had at startup, with all the large allocations now returned to the OS.
The main change is my $foo = 'X'; $foo x= 100_000_000; rather than my $foo = 'X' x 100_000_000;.
With the latter version, the big string is constructed on the stack, and then assigned to the scalar $foo, with the result that it makes two large memory allocations, one of which never gets cleaned up.
With the former version, that double allocation is avoided and the memory is cleaned up properly.
Note: the minor change to the second loop (1_000 iterations of 100_000 characters rather than 100_000 iterations of 1_000) makes no difference to the outcome; I just got bored waiting for the loop to run :)
The duplication of the allocation using the second method isn't an error, but rather a side effect of the way the code is parsed and executed. The fact that it doesn't get cleaned up properly could be construed as a bug--or not. You'd have to raise the issue with p5p and take their view on the matter.
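If you want to see the double allocation in numbers, here is a minimal sketch (my illustration, assuming Linux, where /proc/self/status exposes the VmPeak high-water mark; the size is arbitrary):
use strict;
use warnings;

# Peak virtual size of this process, read from /proc (Linux-specific assumption).
sub vm_peak_kb {
    open my $fh, '<', '/proc/self/status' or die "cannot read /proc/self/status: $!";
    while (<$fh>) { return $1 if /^VmPeak:\s+(\d+)\s+kB/ }
    return;
}

my $how = shift || 'copy';
my $n   = 100_000_000;      # kept in a variable so neither branch is constant-folded

if ($how eq 'copy') {
    # The big string is built first and then copied into $foo:
    # two ~100 MB buffers exist at the same time.
    my $foo = 'X' x $n;
}
else {
    # The string is grown in place inside $foo: only one ~100 MB buffer is needed.
    my $foo = 'X';
    $foo x= $n;
}

printf "%s: peak %d kB\n", $how, vm_peak_kb();
Running it once with the argument copy and once with anything else should, under those assumptions, show the copy run peaking roughly 100 MB higher.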
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl Garbage Collection, again
by ikegami (Patriarch) on Dec 23, 2008 at 10:22 UTC
FYI, the reason nothing is released in the first snippet is:
{
    my $foo = 'X' x 100000000;   <-- a lexical var $foo
    getc;
}
undef $foo;                      <-- the package var $foo was already undef, so this does nothing
my $foo = "";
getc;
You would have wanted
{
    my $foo = 'X' x 100000000;
    getc;
    undef $foo;
}
getc;
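As an aside (not part of the original snippet), with use strict in effect the stray undef $foo outside the block would not even compile, which makes this kind of scoping slip easy to catch; a minimal sketch:
use strict;

{
    my $foo = 'X' x 100000000;   # lexical, confined to this block
    getc;
}
undef $foo;   # compile-time error under strict:
              #   Global symbol "$foo" requires explicit package name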
Re: Perl Garbage Collection, again
by Anonymous Monk on Dec 23, 2008 at 09:34 UTC
Re: Perl Garbage Collection, again
by pkirsch (Novice) on Feb 24, 2009 at 11:00 UTC
After some play with Perl (5.8.8 and 5.10.0) I finally have some results:
diff -urN perl-5.8.8/scope.c perl-5.8.8_/scope.c
--- perl-5.8.8/scope.c  2005-09-30 15:56:51.000000000 +0200
+++ perl-5.8.8_/scope.c 2009-02-23 16:23:40.000000000 +0100
@@ -946,14 +946,26 @@
         SvREFCNT_dec(AvARYLEN(sv));
         AvARYLEN(sv) = 0;
     }
+    av_undef((AV*)sv);
     break;
 case SVt_PVHV:
     hv_clear((HV*)sv);
+    hv_undef((HV*)sv);
     break;
 case SVt_PVCV:
     Perl_croak(aTHX_ "panic: leave_scope pad code");
 default:
     SvOK_off(sv);
+    const U32 padflags
+        = SvFLAGS(sv) & (SVs_PADBUSY|SVs_PADMY|SVs_PADTMP);
+    switch (SvTYPE(sv)) { /* Console ourselves with a new value */
+    case SVt_PVAV: *(SV**)ptr = (SV*)newAV(); break;
+    case SVt_PVHV: *(SV**)ptr = (SV*)newHV(); break;
+    default:       *(SV**)ptr = NEWSV(0,0);   break;
+    }
+    SvREFCNT_dec(sv); /* Cast current value to the winds. */
+    SvFLAGS(*(SV**)ptr) |= padflags; /* preserve pad nature */
+
     break;
     }
 }
So yes, I have patched the file scope.c, because a lot of existing code does not use an explicit undef on scope exit (this only applies to variables declared with 'my', of course).
At least it helps me to free some malloc'd space when a scope closes (see below).
print "Start: >".$$."<\n";<>;
{
my @arr;
$arr[$_]=$_ foreach(1 .. 1000000);
print "How Many?\n";<>;
#->not needed: undef @arr;
}
print "Done, how many?\n";<>;