http://www.perlmonks.org?node_id=266839

RobCheung has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, does Perl have a garbage collection mechanism, and how does it perform?
#-----------------------------------------------------------
use 5.008;
use strict;
use threads qw(yield);
use threads::shared ();
use Time::HiRes qw(sleep);

our @GlobalArray : shared = (1..1000);

my $sub1 = sub {
    while (1) {
        my $a;
        {
            lock @GlobalArray;
            $a = shift @GlobalArray;
        }
        print $a, "\n";
        yield;
        sleep 0.05;
    }
};

my $t1 = new threads $sub1;
my $t2 = new threads $sub1;
my $t3 = new threads $sub1;
$t1->detach;
$t2->detach;
$t3->detach;

while (1) {
    # readFile();
    # ...
    push @GlobalArray, (1..1000) unless scalar @GlobalArray;
    sleep 0.001;
}
1;
#-----------------------------------------------------------
This is a simple example, but it has a serious problem (on Windows 2000): memory use increases by about 200 KB after every "push", so it will soon run out of memory. How does the GC mechanism work? Is there any way to fix the problem by reclaiming the leaked memory at run time? Thanks in advance for any help!

Replies are listed 'Best First'.
Re: Does Perl have garbage collection mechanism and how it performs?
by broquaint (Abbot) on Jun 18, 2003 at 14:42 UTC
    Perl's garbage collection is implemented through reference counting (everything has a reference count; when the reference count drops to 0, the 'object' is removed). This can be demonstrated with a simple lexical variable and Devel::Peek:
    use Devel::Peek;
    my $foo;
    Dump($foo);
    {
        my $r = \$foo;
        Dump($foo);
    }
    Dump($foo);
    __output__
    SV = NULL(0x0) at 0x8107e34
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY)
    SV = NULL(0x0) at 0x8107e34
      REFCNT = 2
      FLAGS = (PADBUSY,PADMY)
    SV = NULL(0x0) at 0x8107e34
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY)
    There we can see that $foo has an initial REFCNT (reference count) of 1, created by the file-level lexical scope. It is incremented when $r takes a reference to it, then decremented when $r goes out of scope ($r itself is garbage-collected because its enclosing scope exited and its REFCNT dropped to 0). $foo would be garbage-collected once the file scope exits, which in this case is when the program ends. See Matts' wonderful Proxy Objects article for a more thorough review of reference counting in perl.
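The same timing can be observed without Devel::Peek by giving an object a DESTROY method, which perl calls at the moment the last reference goes away. This is only a minimal sketch: the Tracker class and the @events log are hypothetical names invented for the illustration.

```perl
use strict;
use warnings;

our @events;       # records the order in which things happen

package Tracker;   # hypothetical class, exists only for this demo

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Perl calls DESTROY as soon as the reference count reaches 0.
sub DESTROY {
    my ($self) = @_;
    push @main::events, "freed $self->{name}";
}

package main;

{
    my $obj   = Tracker->new('inner');
    my $alias = $obj;            # second reference, count is now 2
    undef $obj;                  # count drops to 1, nothing is freed yet
    push @events, 'after undef';
}                                # $alias leaves scope, count reaches 0
push @events, 'after block';

print "$_\n" for @events;
# prints:
#   after undef
#   freed inner
#   after block
```

The object is freed at the closing brace, not at some later collection pass, which is the practical difference between reference counting and a tracing collector.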
    HTH

    _________
    broquaint

Re: Does Perl have garbage collection mechanism and how it performs?
by BrowserUk (Patriarch) on Jun 18, 2003 at 15:59 UTC

    Unfortunately, the current (as of 5.8) state of the ithreads implementation in perl still has some bugs, and the garbage collection of scalars shared implicitly through a shared array seems to be one of them. I don't know the ins and outs of it, nor if or when it is likely to be cured. From what I've read elsewhere, the bug seems to be a "perl thing", rather than relating to any particular platform.

    You can, at least in my very crude empirical tests, slow down the rate of growth by using Thread::Queue as the mechanism for distributing your data to your threads, but it doesn't fix it completely.
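A minimal sketch of what distributing the data through Thread::Queue might look like. The doubling "work", the $results queue, and the use of undef as a stop marker are assumptions made for the illustration; nothing here claims to cure the leak, only to show the shape of the code.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue   = Thread::Queue->new;   # work items for the threads
my $results = Thread::Queue->new;   # something to count afterwards

sub worker {
    # dequeue blocks until an item arrives; undef is our stop marker
    while (defined(my $item = $queue->dequeue)) {
        $results->enqueue($item * 2);   # stand-in for real processing
    }
}

my @workers = map { threads->create(\&worker) } 1 .. 3;

$queue->enqueue(1 .. 1000);
$queue->enqueue(undef) for @workers;    # one stop marker per worker
$_->join for @workers;

my $count = $results->pending;
print "processed $count items\n";
```

Compared with the original script, the queue does its own locking, so the explicit lock on a shared array is no longer needed.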

    The other possibility, and it's not a good one, is to not detach your threads, but rather to occasionally join them and spawn new ones to replace them. The leaked memory seems to be released back to the OS once a thread is joined. I can't offer you a good design for this as I haven't yet worked one out for myself. This is a very unsatisfactory situation, and I wish it were not so.
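One possible shape for the join-and-respawn idea, offered as a rough sketch rather than a worked-out design: each worker handles a bounded batch and then returns, and the main thread joins a whole generation before starting the next. The batch size, the worker count, and the placeholder processing step are all assumptions.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;
$queue->enqueue(1 .. 1000);

my $BATCH   = 100;   # items a worker handles before it exits (assumed value)
my $WORKERS = 3;     # threads per generation (assumed value)

# A short-lived worker: handle at most $BATCH items, then return a count.
sub worker {
    my $done = 0;
    while ($done < $BATCH) {
        my $item = $queue->dequeue_nb;   # non-blocking; undef when empty
        last unless defined $item;
        # ... process $item here ...
        $done++;
    }
    return $done;
}

my $total = 0;
while ($queue->pending) {
    # Start a fresh generation of threads, then join every one of them.
    my @generation = map { scalar threads->create(\&worker) } 1 .. $WORKERS;
    for my $thr (@generation) {
        my ($handled) = $thr->join;      # join hands back the return value
        $total += $handled;
    }
}
print "handled $total items\n";
```

The cost is that thread creation in perl clones the interpreter, so the batch size trades leak growth against spawn overhead.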

    I also wish that I had the ability to assist in fixing the problem but even if I understood enough to construct a patch, I see no way for me to participate in the process.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      This is patched in bleadperl, read more here and in here.

      As an aside, you could try using forks instead of threads but maybe you wouldn't on windows :-)

      -- #!/usr/bin/perl -np BEGIN{@ARGV=$0}s(^([^=].*)|=)()s; =Just another perl hacker\

        Thanks for the update Joost. I had seen the start of that discussion before, but not to the point where a patch had been forthcoming.

        Unfortunately, the patch to shared.xs does not appear to have made it onto CPAN yet. Will it ever? Or will it be held back until 5.9 is released?

        That makes it very difficult for those using pre-compiled builds of perl to utilise the patch. Whilst there are several repositories that will undertake building a CPAN module to a binary and making it available via PPM, it's less likely that the same people will undertake applying patches to core modules and building them for public accessibility with all the inherent risks that entails.

        (Are you (even vaguely) interested PodMaster? :) (Pretty please:)(If I supply the tarball with the patch applied?) (Is there a mechanism for determining all the latest patches that would need to be applied to a given version of a CPAN module in order to bring it up to bleadperl status?)

        I've seen the forks module, but it strikes me that as forks are emulated using threads under Win32, using forks to emulate threads using emulated forks is a little....erm....artificial:)




        This isn't directly aimed at you Joost, but it seemed the most appropriate place to mention that unfortunately, the patch for the shared memory leak doesn't appear to work, leastwise not when applied to either AS 5.8 or to 5.8.0-RC3 freshly downloaded and built for Win32 using BCC.

        There is also new discussion in the thread you linked at RT.perl.org to the same effect.

        My best guess is that bleadperl (whatever and wherever that is?) also contains other patches that are required but aren't part of the latest version available via ftp/CPAN. Unfortunately, I don't understand the ticketing system. I tried to access the patch via the #19200 number listed on it, got a permission error, and ended up applying the patch through cut&paste from the thread.

        (As an aside, has any other Opera user with a perl.org/rt.perl.org login succeeded in logging in there using Opera? I can get in using IE5.5, but don't like using that other than for compatibility checks.)




Re: Does Perl have garbage collection mechanism and how it performs?
by hardburn (Abbot) on Jun 18, 2003 at 14:44 UTC

    Yes, perl does do GC. Variables are deallocated when all references to them have been eliminated (beware of circular references . . . ). However, it does not free memory to the OS until perl exits. Instead, it puts it in a pool from which perl can grab memory for future variables instead of asking the OS for more bytes. So your program will keep that 200k allocated for perl until it exits. Fortunately, when perl isn't using that memory, the OS will swap those pages to the hard drive (at least, in any halfway decent VM implementation). So this isn't as big a problem as it first appears.
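The circular-reference caveat can be shown in a few lines. This is a minimal sketch using Scalar::Util's weaken; the Node class and the $freed counter are hypothetical names made up for the demonstration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

our $freed = 0;   # how many Node objects have been destroyed so far

package Node;     # hypothetical class for the demo
sub new     { return bless {}, shift }
sub DESTROY { $main::freed++ }

package main;

# Strong cycle: each node keeps the other's count above 0, so neither
# is freed when the lexicals go out of scope.
{
    my ($p, $q) = (Node->new, Node->new);
    $p->{peer} = $q;
    $q->{peer} = $p;
}
my $after_strong = $freed;

# Same shape, but one link is weakened, so it no longer counts as a
# reference and both nodes are freed at the end of the block.
{
    my ($x, $y) = (Node->new, Node->new);
    $x->{peer} = $y;
    $y->{peer} = $x;
    weaken($y->{peer});
}
my $after_weak = $freed;

print "freed after strong cycle: $after_strong\n";   # 0
print "freed after weak cycle:   $after_weak\n";     # 2
```

The two nodes from the first block are only reclaimed during global destruction at program exit, which in a long-running process amounts to a leak.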

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

Re: Does Perl have garbage collection mechanism and how it performs?
by meredith (Friar) on Jun 18, 2003 at 16:07 UTC
    Looks like a huge battle! Hmm, if you put (1..1000) in at the start, then three threads start eating them, then you start adding more (1..1000), here's what I'm thinking: Perl never deallocates memory, so it's not recycling memory in the way you want, but you /are/ adding a list of 1000 integers onto another list (very fast too, I might add), so I'm really wondering what you are expecting. A fixed-field array of 1000 16-bit integers stored linearly in memory will take ~2KB (16,000 bits) on its own, and in the case of perl, that will include the overhead of a perl primitive for each element, plus the indexing of the array. I think you've chosen a method to beat the hell out of yourself; is there any other way you might get this done?

    Are your processing rates really at this ratio? Or is it simply an example?

    Update: Thread::Queue has been mentioned before, but how about this: you make a queue for each sub, and pass work to it that way. Just have the 'master'/'manager' round-robin the work, then maybe your queues (read: memory usage) will be smaller.
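A rough sketch of the queue-per-worker idea, with the manager dealing out work round-robin. The worker count, the item range, and the $done queue used to report per-worker counts are assumptions for the illustration; whether this actually keeps memory lower would need to be measured.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 3;                                        # assumed value
my @queues  = map { Thread::Queue->new } 1 .. $WORKERS; # one queue per worker
my $done    = Thread::Queue->new;                       # workers report here

sub worker {
    my ($q) = @_;
    my $count = 0;
    # undef on the private queue is the stop marker for this worker
    while (defined(my $item = $q->dequeue)) {
        $count++;                                       # stand-in for real work
    }
    $done->enqueue($count);
}

my @threads = map { threads->create(\&worker, $_) } @queues;

# The manager hands each item to the next queue in turn.
my $next = 0;
for my $item (1 .. 999) {
    $queues[$next]->enqueue($item);
    $next = ($next + 1) % $WORKERS;
}
$_->enqueue(undef) for @queues;
$_->join for @threads;

my @counts = map { $done->dequeue } 1 .. $WORKERS;
print "per-worker counts: @counts\n";   # per-worker counts: 333 333 333
```

With private queues the workers never contend for the same lock, at the price of the manager having to balance the load itself.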

    mhoward - at - hattmoward.org
Re: Does Perl have garbage collection mechanism and how it performs?
by chromatic (Archbishop) on Jun 18, 2003 at 16:31 UTC

    Does it use less memory if you simply assign the list to @GlobalArray? It seems like that would be easier for the optimizer to handle.

Re: Does Perl have garbage collection mechanism and how it performs?
by RobCheung (Acolyte) on Jun 19, 2003 at 21:06 UTC
    Hi joost,
    I recompiled the shared module and ran my script again, but the memory leak persists...
    I also tried to compile the forks module, but it failed at "nmake test" ;-(

    To BrowserUk:
    The Thread::Queue module really is better; however, it cannot solve the memory leak problem completely.

    Now, I will explain what I want to do with my script.

    The main thread reads a list of files one after another and puts each into the queue. (The list contains many text files, ranging from a few KB to a few MB; the total size is several hundred MB.) The other three or more threads then parse this queue; if the queue is empty, the main thread reads the next text file.

    I cannot run this program just once on my box, so I have to solve this problem!
    Maybe I should try fork, but it might be troublesome. I prefer multi-threading, which looks much simpler.


    Thanks for all the replies.