Postpone garbage collection during a child process run?

by flexvault (Parson)
on Oct 06, 2010 at 14:27 UTC
flexvault has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Is there a way to prevent garbage collection in a process until the process finishes its current work? I can block signals at the start of a work cycle, re-enable them at its completion, and then wait for more work. Is there anything similar for garbage collection?

Most of our processes work for 0.11 to 0.25 seconds, so they are persistent but only working in short bursts. Enabling garbage collection just before the 'accept' and disabling it while working would prevent losing information during the work.
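To make that concrete, here is a minimal sketch of the signal blocking I have in mind, using the core POSIX sigprocmask() interface (the choice of signals is illustrative):

    use strict;
    use warnings;
    use POSIX qw(sigprocmask SIG_BLOCK SIG_UNBLOCK SIGTERM SIGHUP);

    # Block the signals we care about for the duration of one unit of work,
    # then re-enable them before going back to wait for the next job.
    my $set = POSIX::SigSet->new(SIGTERM, SIGHUP);

    sigprocmask(SIG_BLOCK, $set)   or die "cannot block signals: $!";
    # ... do the 0.11-0.25 second unit of work here ...
    sigprocmask(SIG_UNBLOCK, $set) or die "cannot unblock signals: $!";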

Background: Originally I did something like this:

    my %Account = ();
    my $accptr  = \%Account;
    my $work    = $input;    ## Whatever needed to be done
    my $ret     = &GetAccount($accptr);
    my $html    = &CallChildToWork($accptr, \$work);

Obviously, a lot of the work is missing, and the subroutine 'CallChildToWork' calls other subroutines; that is where we got problems. It worked 100% in testing, but in production, after some random time, '$accptr' would be lost in a subroutine and then return in the caller routine. We verified this with 'syslog' by logging '$accptr' on subroutine entrance and again on return: it would be there on entrance and gone on return. This never caused a failure, just missing results. It is just a guess that this had to do with scope and garbage collection.

We solved this by making '%Account' global and having subroutines copy between '%Account' and '%TAccount'. This works 100% in production, but it requires locking and unlocking around the copy. Since a second is finite, this limits the number of processes and cores we can use in production.

Thank you

"Well done is better than well said." - Benjamin Franklin


Update: Resolved!

Upon testing with Perl 5.12.2, this problem no longer exists!

Re: Postpone garbage collection during a child process run?
by Corion (Pope) on Oct 06, 2010 at 14:37 UTC

    You don't show much relevant code, and you don't tell us much about the environment you're working with, so I'm going to make some assumptions:

    • You're using threads
    • You're passing around references, and these references point to things that are already visible to more than one thread and are potentially shared.

    This is problematic, as the threads will see the references pointing to the same thing and then stomp on each other's values.

    You could have shown the relevant code of the "subroutine that makes $accptr vanish", but you chose not to. Perl has reference counting and no garbage collection, so things don't simply vanish. You have a logic error somewhere.

    As a remedy, I would look into using a Thread::Queue to hand off jobs to worker threads. This will force you to avoid global variables for passing information, and it will properly clone information across threads so they don't step on each other's toes.
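    A minimal sketch of that pattern (the worker count and job payloads here are arbitrary):

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $jobs    = Thread::Queue->new;
        my $results = Thread::Queue->new;

        # Workers pull jobs from one queue and push results onto another;
        # enqueued data is cloned per thread, so nothing gets stomped on.
        my @workers = map {
            threads->create(sub {
                while (defined(my $job = $jobs->dequeue)) {
                    $results->enqueue("done: $job");
                }
            });
        } 1 .. 4;

        $jobs->enqueue($_) for qw(job1 job2 job3);
        $jobs->enqueue(undef) for @workers;   # one undef per worker ends its loop

        $_->join for @workers;
        print $results->dequeue_nb, "\n" for 1 .. $results->pending;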

      I agree 100% with this assessment: you have a nasty logic error in your code.

      In addition to Thread::Queue, you should also review the perldoc entry for threads::shared ... particularly the BUGS AND LIMITATIONS and SEE ALSO sections.

      Sharing information among multiple threads does not work reliably in the manner you are now attempting, even if it seems to in testing. (Hate to be the bearer of bad news, but then again, thread-related issues are often like that.) Your notion of “delaying garbage collection” will lead to even more problems; it is a blind alley.

        (Hate to be the bearer of bad news,

        Then just stop!

        particularly the BUGS AND LIMITATIONS and the SEE ALSO sections.
        • Bugs and limitations amount to:
          • share(<hashref>|<arrayref>) unintuitively empties the structure being shared.

            Now fixed by the provision of shared_clone().

          • splice doesn't operate on shared arrays.

            Big deal, not!

          • Taking a ref to an element of a shared structure doesn't autovivify it.

            Many people wish that Perl didn't autovivify so readily anyway. Again, no big deal.

          • refaddr() reports the thread-local alias address, not that of the principal.

            Use is_shared() to obtain a cross-thread-usable identifier (see the sketch after this list).

          • each can produce incorrect results when run against nested hashrefs.

            Make a local copy of the hashref and run each on that. Simple.

        • SEE ALSO contains nothing but a list of useful links.

          So why make it sound like it contains a government health warning?

        A few limitations, with solutions. No big deal.
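        For illustration, a minimal sketch of two of those workarounds, shared_clone() and "local copy for each" (the data is arbitrary):

            use strict;
            use warnings;
            use threads;
            use threads::shared qw(shared_clone is_shared);

            # shared_clone() deep-copies into shared space instead of emptying
            # the original; is_shared() yields a cross-thread-stable identifier.
            my %plain  = ( inner => { count => 1 } );
            my $shared = shared_clone(\%plain);      # %plain itself is untouched

            printf "shared id: %s\n", is_shared(%$shared);

            # For each on a nested shared hashref, iterate over a local copy:
            my %local_copy = %{ $shared->{inner} };
            while (my ($k, $v) = each %local_copy) {
                print "$k => $v\n";
            }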

        And all aimed at someone who didn't mention threads and isn't using them. Stop jumping on bandwagons that you don't understand!

        Have you actually written any threaded perl code?

        Because your pronouncements on the subject show little or no understanding of the real issues involved.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

      . . . You're using threads

      I use 'fork' to generate the children. I can't show the failing subroutine, since all subroutines have failed at one time or another. This is a 20,000+ line sub-system with 91 subroutines. Sometimes a process works for an hour and then loses the pointer; sometimes it happens almost immediately. From analysis of the logs, it usually happens in the 4th or 5th inner subroutine call. The application starts with 4 children per core, expands if xx% are working, and contracts if more than the minimum exist and work is below yy%. All children live for approximately 4 hours, or end early if their RSS grows to more than twice the original RSS (see the sketch below).

      Environment:

      Testing: AIX/Unix
      Production: SUSE Linux or AIX
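      For reference, the RSS rule amounts to something like this sketch (Linux /proc only; AIX needs a different probe, and the helper name is illustrative):

          use strict;
          use warnings;

          # Read this process's resident set size in kB from /proc/self/status.
          sub current_rss_kb {
              open my $fh, '<', '/proc/self/status' or return;
              while (my $line = <$fh>) {
                  return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
              }
              return;
          }

          my $baseline_rss = current_rss_kb();

          # ... then, inside the child's accept/work loop, after each job:
          my $now = current_rss_kb();
          exit 0   # end early, per the 2x-original-RSS rule
              if defined $baseline_rss && defined $now && $now > 2 * $baseline_rss;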

      Thank you

      "Well done is better than well said." - Benjamin Franklin

        Bugs like this one ... ahem ... suck large. But, “bugs they are,” and your application as-written will never work reliably in production.

        Maddeningly, it will be “tantalizingly close.” You'll spend a lot of time trying to convince yourself that the existing design can be made to work. But these flaws are structural. “Patches and workarounds,” like disabling signals and whatever else you might now be doing, only (I am very sorry to pronounce...) prolong the inevitable.

        “Because I have been there too, my son... And a more unpleasant place to be, might never be found on this world.”

        Perhaps you can isolate the necessary disciplines into a few base classes. Or maybe you can just change the routine that hands off the work to the child threads. (Ideally, the child would simply pull data off of one queue and put it back onto another.) So the code changes might not be as dramatic as you may fear. But until the changes are made, the code will continue to throw sporadic errors under production loads (and maybe nowhere else).

        You definitely want to implement the solution so that it is “nearly invisible,” encapsulated into a class such that every other part of the application can just ignore it... It Just Works™. And I'd say that you can do that fairly easily. Once it is done, the app will be rock-steady under any production load.

      Let me add some information to help the PerlMonks help me.

      Why do I think it has something to do with garbage collection?

      Since using the hash pointer did not cause a 'die', it must still have been pointing at valid allocated memory. Looking at the syslog entries, the pointer address at the start of the subroutine and at the end of the subroutine is the same, as is what it points to. However, after the return from the subroutine, the syslog entries show that the pointer address is still the same, but its contents (what it points to) are different: that address holds a valid hash, but without the changes made during the call to the subroutine. It looks as if the hash was copied to a new location and the old hash memory was placed on a free list rather than returned to the operating system.
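      The entry/exit logging I describe has essentially this shape (a simplified sketch; the ident, facility, and messages are placeholders):

          use strict;
          use warnings;
          use Scalar::Util qw(refaddr);
          use Sys::Syslog  qw(openlog syslog closelog);

          openlog('accountd', 'pid', 'local0');

          sub traced_call {
              my ($accptr) = @_;
              syslog('info', 'enter: accptr=0x%x keys=%d',
                     refaddr($accptr), scalar keys %$accptr);
              # ... the real subroutine body would run here ...
              syslog('info', 'leave: accptr=0x%x keys=%d',
                     refaddr($accptr), scalar keys %$accptr);
          }

          traced_call({ demo => 1 });
          closelog();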

      I can only guess about this!

      It seems I was able to use the pointer before the copy completed, or some other race condition existed between garbage collection and my program. If garbage collection locked the hash, then the pointer wasn't updated until after I used it, or until the subroutine returned (since it looked like the subroutine was never called).

      If I am using the term 'garbage collection' incorrectly and a different term defines what I observed, please enlighten me.

      Thank you

      "Well done is better than well said." - Benjamin Franklin

        What you describe is not what happens. You have a subroutine which triggers some action at a distance, namely cleaning out a hash. You also seem to be quite confused when talking about references and what they reference, as you call them "pointers" and talk about "addresses". If by "address" you mean what you get when you print a reference (HASH(0xdeadbeef)), then that is somewhat like a memory address, but you're better off treating it as a unique number identifying that particular hash.

        Perl has no processes running asynchronously to your program. There is no "garbage collection" process, and memory allocated to Perl data structures is released only when nothing more (in Perl space) references it. So you have a logic error in your program that has nothing to do with memory management at that level.
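        To make the reference-counting point concrete, a small sketch using the core B module:

            use strict;
            use warnings;
            use B ();

            my %h    = ( key => 'value' );
            my $ref1 = \%h;      # a second reference to the same hash
            my $ref2 = $ref1;    # and a third

            # REFCNT (via the core B module) shows the live reference count.
            printf "refcount: %d\n", B::svref_2object($ref1)->REFCNT;   # 3

            undef $ref2;
            printf "refcount: %d\n", B::svref_2object($ref1)->REFCNT;   # 2

            # The hash is freed the moment the count reaches zero --
            # deterministically, not by an asynchronous collector.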

        Now, what would help us help you better is if you showed the relevant code that cleans out the hash contents, and how you call it. Maybe it is something like this:

            sub foo {
                my %some_hash = @_;
            }

            foo( %another_hash );

        Here, changes to %some_hash will never show up in %another_hash.

        But maybe you are passing around nested references like this:

            sub foo {
                my %some_hash = @_;
            }

            my %another_hash = (
                a_deeper_hash       => { a => 1 },
                another_deeper_hash => { b => 2 },
            );

            foo( %another_hash );

        Now, changes made to $some_hash{ a_deeper_hash } will change $another_hash{ a_deeper_hash }, as the copy of the hash is a shallow copy.
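        If you actually want the nested structures decoupled, a deep copy via the core Storable module's dclone() is the usual fix; a sketch:

            use strict;
            use warnings;
            use Storable qw(dclone);

            my %another_hash = ( a_deeper_hash => { a => 1 } );

            my %shallow = %another_hash;                 # inner hashref still shared
            my %deep    = %{ dclone(\%another_hash) };   # fully independent copy

            $deep{a_deeper_hash}{a} = 99;
            print $another_hash{a_deeper_hash}{a}, "\n";   # still 1

            $shallow{a_deeper_hash}{a} = 42;
            print $another_hash{a_deeper_hash}{a}, "\n";   # now 42 (shallow copy shares)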

        My advice is this vague because you steadfastly refuse to show even what you see: the concrete debug messages and the code producing them, not to mention the relevant code itself. I understand that it's a tough task to remove more and more parts from a program that fails only from time to time, but the only approach I know is to reduce the input data until you find a dataset that always triggers the behaviour, and then to start removing parts of the code, keeping only those that still reproduce the error.
