On timely destruction?

by Elian (Parson)
on Aug 28, 2002 at 00:13 UTC

Elian has asked for the wisdom of the Perl Monks concerning the following question:

Here's a question for everyone. We're finalizing Parrot's object/variable destruction guarantees, and I'm trying to find out the common cases where dead-on timely destruction of objects is necessary, such that the lack of timely destruction would alter the semantics of the program badly enough to break it. With that in mind, I'm looking to tap the experience of the Monastery for such things. This is the last real chance to chime in--anything after this will have to be hacked in and likely will be slowish.

Keep in mind the following things:

  • Don't worry about guts-level things (memory usage and such). That's my problem, and we've got it covered already.
  • You can install block-exit handlers, including in your caller's block, so you don't have to play DESTROY games to get lexical exit actions.
  • Allocation failure (for example running out of filehandles) will trigger a GC sweep and retry of the failing operation, so your program won't run out of things for lack of timely cleanup.
I'm currently aware of exactly one case where this can be an issue. To wit:
{
    my $foo;
    open $foo, "<bar.txt";
    print "Baz\n";
}
{
    my $foo;
    open $foo, "<bar.txt";
    print "xyzzy\n";
}
where not doing timely destruction of the filehandle will potentially end up doing odd things to the program. Locks and such on files fall in the same general category. (This can be dealt with by having open push a GC sweep on the list of block exit handlers, but I can see niggly issues there)
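For concreteness, a minimal sketch of the lock variant (hedged: this assumes flock semantics where the lock attaches to the filehandle rather than the process):

use Fcntl ':flock';
{
    my $fh;
    open $fh, ">>counter.txt";
    flock $fh, LOCK_EX;       # exclusive lock, released on close
    # ... update the file ...
}                             # refcounting closes $fh right here and
                              # drops the lock; deferred GC can hold
                              # both until the next sweep
{
    my $fh;
    open $fh, ">>counter.txt";
    flock $fh, LOCK_EX;       # may block on the stale lock above
}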

Anyone have any others? Note that I don't generally consider a delay in cleaning up after a lexically scoped thingie that's escaped its scope a big deal. (For example, when a filehandle gets put in a global array that's later completely cleaned out) I can be convinced otherwise with sufficient argument, of course. :)


Replies are listed 'Best First'.
Re: On timely destruction?
by dws (Chancellor) on Aug 28, 2002 at 00:30 UTC
    I'm trying to find out the common cases where dead-on timely destruction of objects is necessary, such that the lack of timely destruction would alter the semantics of the program badly enough to break it.

    I ran into cases where delayed destruction caused problems while working on database drivers for Smalltalk years ago. There may be a parallel to Perl. These drivers were the equivalent of DBD. Instances of drivers would hold external references to stuff in the native database libraries. If one dropped a reference to a database statement, ideally what would happen is that a destructor method would make the necessary call to the native database library to free the statement. (We were using "weak references" under the covers to make this work.) But ParcPlace Smalltalk would only destroy objects after a garbage collection sweep, which might not happen for some time. We ended up having to manually trigger GCs at points to force "finalization" of database driver objects.

    The parallel that might hold for Perl is where an object "holds" a reference to an object in an external system (e.g., a database statement handle), and where having the external object "open" is semantically different from having it closed/freed. If a script drops all references to the Perl-side object representing a statement handle, but the invocation of DESTROY is delayed, the external system (the native database library) could be left in an undesirable state.
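    In Perl terms, the failure mode might look like this (a sketch only; assume $dbh is an already-connected DBI handle):

        {
            my $sth = $dbh->prepare("SELECT name FROM users");
            $sth->execute;
            my ($first) = $sth->fetchrow_array;
            # $sth abandoned here without an explicit finish()
        }
        # With refcounting, DESTROY runs at the closing brace and frees
        # the native statement/cursor; with deferred GC the database
        # library is left holding an open cursor until some later sweep.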

Re: On timely destruction?
by TheDamian (Vicar) on Aug 28, 2002 at 07:47 UTC

    A typical problem is how to wrap the accessor methods of a class, so that some check is performed once the returned value is dispensed with. For example, suppose you have to maintain some code that provides direct access to the "name" field stored within a CachedFile object. You might have:

    package CachedFile;

    sub new {
        my ($class, $name) = @_;
        bless { name => $name, contents => "" }, $class;
    }

    sub name {
        my ($self) = @_;
        return \$self->{name};
    }

    But because the class gives out direct access to the name, you don't have control of it. If you need to ensure that the name doesn't exceed 12 characters in length, you have no way to do so; any changes to the name field will occur after CachedFile::name has finished:

    ${$cachedfile->name} = "a_long_file_name";

    One solution is not to return a reference to the name field at all. Instead, you return a reference to an imposter, which then forwards all requests to the real name field.

    When the full expression in which this imposter was created is finished, the last reference to the imposter will disappear and its destructor will be called and can then check for foul play.

    The class that implements the imposter – or "proxy" – looks like this:

    package Proxy;
    use Carp;

    sub for {
        tie my($proxy), $_[0], @_[1..3];
        return \$proxy;
    }

    sub TIESCALAR {
        my ($class, $original, $postcheck, $message) = @_;
        bless {
            original  => $original,
            postcheck => $postcheck,
            message   => $message,
        }, $class;
    }

    sub FETCH {
        my ($self) = @_;
        return ${$self->{original}};
    }

    sub STORE {
        my ($self, $newval) = @_;
        ${$self->{original}} = $newval;
    }

    sub DESTROY {
        my ($self) = @_;
        croak $self->{message}
            unless $self->{postcheck}->($self->{original});
    }

    The CachedFile class would then set up its name accessor like so:

    package CachedFile;

    sub new {
        my ($class, $name) = @_;
        bless { name => $name, contents => "" }, $class;
    }

    sub name {
        my ($self) = @_;
        return Proxy->for(
            \$self->{name},
            sub { length(${$_[0]}) <= 12 },
            "File name too long!",
        );
    }

    Now any attempt to assign an extravagant name causes an exception to be thrown:

    my $file = CachedFile->new("orig_name");

    ${$file->name} = "shrt_fl_nm";        # okay
    ${$file->name} = "a_long_file_name";  # KABOOM!

    There are many such idioms that rely on proxy objects being destroyed at the end of the statement in which they're created (rather than at the end of the surrounding scope). So setting an end-of-scope action doesn't help these cases, since we want the effects to have been applied much earlier than that: before the next statement, in fact.

      Well, that's one way to go about it. For something like that, where the data element has some sort of restrictions on it, you're much better off using typed data for the element you need to have escape, and letting the class the data is blessed into pitch a fit when an assignment violates the constraints on that data. Since typed data can overload assignment, it's an easy way to have action-at-a-distance validation of data going into variables.

      That eliminates the need for proxy objects and suchlike hackery, and provides a cleaner interface. I expect someone (like, say... you? :) will come up with a constraints module such that you can say:

      my $foo has constraint({length $^a < 12});

      To allow tagging on constraint conditions. (Assuming, in this case, that constraint takes a list of closures which all must return true to allow the assignment) Modulo proper perl 6 syntax, of course.
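      A perl5 approximation of that idea, as a sketch using tie to overload assignment (the names here are illustrative, not a proposed API):

      package Constrained;
      use Carp;

      sub TIESCALAR {
          my ($class, $value, @constraints) = @_;
          bless { value => $value, constraints => \@constraints }, $class;
      }
      sub FETCH { $_[0]{value} }
      sub STORE {
          my ($self, $new) = @_;
          for my $check (@{ $self->{constraints} }) {
              croak "constraint violated" unless $check->($new);
          }
          $self->{value} = $new;
      }

      package main;
      tie my $foo, 'Constrained', "ok", sub { length $_[0] < 12 };
      $foo = "shrt_fl_nm";        # fine
      $foo = "a_long_file_name";  # croaks at assignment time

      Note that the validation fires at the assignment itself, so no destructor timing is involved at all.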

Re: On timely destruction?
by jepri (Parson) on Aug 28, 2002 at 03:05 UTC
    Could someone please explain why counting references is so bad? I have heard many people bag out perl's GC but I don't really see the problem. I appreciate the 'circular references' issue, but it seems a fairly minor one, dealt with by avoiding circular refs or manually breaking the circle. How often do people use circular refs?

    I take it there is no clever way to spot a circular reference?

    This is all in contrast to my lisp manual in which I just read "Don't worry about occasional pauses, it's just the GC firing". Eeek!

    Does delayed destruction give some advantage like reusing the memory more quickly?

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

      Reference counting has a number of problems:
      1. Circular garbage leaks
      2. It's horribly error-prone
      3. It requires a lot of code
      4. It's slow
      5. It blows your cache
      To take those in more detail:

      1 It's not tough to get circular structures at the user (i.e. the application programmer) level. This means that either the program leaks, or every application that might generate circular structures has to break them by hand.
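      For instance, this innocuous-looking structure never reaches refcount zero:

          {
              my $node = { name => "a" };
              $node->{self} = $node;   # the hash now references itself
          }
          # The block has exited, but the hash's refcount is still 1
          # (its own 'self' entry), so pure refcounting never reclaims
          # it. A tracing collector sees it's unreachable and sweeps it.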

      2 For refcounting to work right, every increment must have a matching decrement. Historically (which is to say, in every release of perl I've seen) this doesn't happen. Sometimes it's only a few variables, sometimes it's a lot of variables. Either way, there's leakage. It also means that every extension writer must get refcounting right. For anything other than trivial extensions, they don't.

      3 To support refcounting, each and every time a variable is inserted into a container or removed from a container the refcount must be changed. This is a lot of code. Go browse through the perl 5 source (And don't forget about extensions) to see how often this happens. This is essentially makework code--it does almost nothing useful, and is entirely bookkeeping.

      4 Reference counting is the slowest form of garbage collection for all but the most extremely degenerate code. Reference counting requires you to spend time touching each variable at least twice. This is all your variables, not just the ones that are alive. Tracing garbage collectors (which include the mark & sweep, compacting, and generational style collectors) generally only touch live data. Yes, it's only a quick increment or decrement, but it's a lot of them. Refcount schemes usually spend about twice as much time doing garbage collection work as tracing collectors do.

      5 Refcounting makes the internal data structures and code less dense, and that means that your program spends more time waiting on main memory. (Fetching data from RAM can cost up to 200 cycles on some systems) Your code is less dense, since you have refcount twiddling code all through it. Your data is also less dense, since you need to have space for refcounts in it. And even worse, it means more of your data structure has to be touched when it's dealt with. This screws your L1 cache. Generally data comes into your CPU's L1 cache in small chunks--8 or 16 bytes. You want to use as few of these chunks as you can, since, while they're the fastest memory to access (usually a few cycles at most), there isn't much L1 to go around. And loading a cache line can cost 10 or 20 cycles if the data you're looking for is in your L2 cache. (And 200 or more if it isn't) The less memory you have to touch to get something done, the more efficiently the cache is used and the faster your program runs.

      Tracing collectors, which is what parrot's using, generally have different characteristics.

      • No mainline code does anything with refcounts or other 'liveness' bookkeeping data. That means the mainline code is denser, with more of it actually dedicated to executing your program.
      • Internal data structures are either smaller (if the GC bookkeeping fields are kept out of band) or at least use less cache (since while still in the structure they aren't touched and thus fetched into L1 cache).
      • There's less code dedicated to GC itself and, while a tracing garbage collector isn't exactly trivial, it's not a big deal either. More to the point, the code required to do the GC is collected together in one single spot, rather than sprinkled throughout the entire codebase. (It took me only a few hours to write the first cut of Parrot's GC)
      • Tracing collectors take less time to execute. Since they make more efficient use of the cache (there's only a small chunk of code doing GC, so it gets pinned in L1 cache while it runs, and it generally restricts itself to a smallish chunk of memory when it runs so it gains some cache locality that way) you waste less time on the memory bus, and since tracing GCs only look at the live data, their runtime's proportional to the amount of live data you have, generally a much smaller number than the number of data objects you've allocated since the last GC run.
      So, tracing collectors are generally faster, less error-prone, have less code, and put less burden on the people writing the core and extensions in C. The one (and only) advantage that refcounting has is deterministic destruction. (Assuming no bugs, of course) Other than that it's a massive pain to deal with for the internals folks.

      All of that is why we don't like refcounts. :)

      BTW: Your LISP book's numbers may well be really, really old. Much of GC's bad reputation for pauses comes from benchmarks run 20 or more years ago, on VAX 11/780 machines with 2M of RAM and an RA60 as a swap disk. These days things run rather a lot faster. A quick run of one of parrot's benchmarks shows it did 43K GC runs in 22 seconds on my machine, in addition to the sample program's real work. That's not bad... (Though, yes, it does argue that the sample program needs more work to generate less garbage)

        Thanks for the detailed explanation. Now I can see what you are trying for.

        To put the quote in context, it came from the manual for the Lisp machine on my Palm Pilot. Which is probably only slightly faster than the VAX you quote in your example.

        ____________________
        Jeremy
        I didn't believe in evil until I dated it.

        Assorted comments:

        I've read for years that the field of gc is quite mature, and some killer algorithms exist (way beyond simple mark&sweep).

        How is access kept local? Chasing all the pointers would hit all that main memory, even if the marks are collected somewhere central.

        I like the idea of incremental garbage collection, so there will be no pauses.

        What about the case of local variables that never have their ref taken, and are not inside a closure that itself gets "taken"? If detected at compile-time, those can be stacked most efficiently and not generate garbage at all.

        On a machine with VM, you must figure out when you're "out" of memory! I can use gobs more than the machine has, transparently to the program. Perhaps monitor page-fault frequency? Best to check for garbage before a page gets swapped out at all.

        —John

        What a wonderfully clear response. Might I suggest a quick cut and paste into the Parrot FAQ?

Re: On timely destruction?
by Abigail-II (Bishop) on Aug 28, 2002 at 13:14 UTC
    Inside Out Objects, in combination with a class method that checks how many instances of an object there are:
    package Test;

    my %attr;

    sub new  { bless [] => shift }
    sub init { $attr{+shift} = undef }

    sub attr {
        my $self = shift;
        $attr{$self} = shift if @_;
        $attr{$self};
    }

    sub DESTROY { delete $attr{+shift} }
    sub HowMany { scalar keys %attr }
    If you don't do timely destruction, HowMany will return the wrong value. This will be a disaster for programs that limit the number of instances that are allowed to be around.
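    For instance, with the class above:

        {
            my $obj = Test->new;
            $obj->init;
        }
        print Test->HowMany, "\n";   # 0 if DESTROY ran at the brace;
                                     # 1 if the sweep hasn't happened yet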

    However, it's not just the semantics you should worry about. Performance might depend on it as well. More than once I've written code where in the DESTROY method a file is closed (and hence, a lock on it is released). If the DESTROY no longer is timely, the semantics won't change - the program would still act correctly.

    But the performance would no longer be acceptable, as other programs don't get their locks fast enough.

    You can install block-exit handlers, including in your caller's block, so you don't have to play DESTROY games to get lexical exit actions.
    That's nice for newly written perl6 programs. But what about perl5 programs running in a perl6 environment? Or just a perl5 module used from a perl6 program? Will they have timely DESTROY?

    Allocation failure (for example running out of filehandles) will trigger a GC sweep and retry of the failing operation, so your program won't run out of things for lack of timely cleanup.
    Could you elaborate on that? If program1 (not necessarily written in Perl) has an allocation failure because program2 (written in Perl) hasn't done a GC sweep yet, how's that going to trigger a GC sweep in program2?

    I'm more concerned about the overall impact on the system a perl6 program is going to run on than the impact on the program itself.

    Java's unpredictable GC is already giving lots of people a headache when dealing with long running Java programs. It would be a pity if Perl goes that way too.

    Abigail

      If you don't do timely destruction, HowMany will return the wrong value.
      Well, no, not really. We'll return the right value, which is the number of not-dead objects. That the number is surprising to the programmer is arguably rather sub-optimal but not flat-out wrong. (Yes, I know I'm arguing twiddly bits here)

      The class will have the option of forcing a sweep for dead objects if it so chooses, so there would be a fallback. Yes, this is definitely sub-optimal, and is just a variant on the weak-ref problem. (i.e. a hack of sorts to get around implementation issues)

      However, it's not just the semantics you should worry about. Performance might depend on it as well. More than once I've written code where in the DESTROY method a file is closed (and hence, a lock on it is released). If the DESTROY no longer is timely, the semantics won't change - the program would still act correctly.

      But the performance would no longer be acceptable, as other programs don't get their locks fast enough.

      I'm not sure that's much of a problem. The delay in destruction should be on the order of milliseconds under most circumstances. If that's an unacceptable delay, then odds are something more explicit than relying on automatic destruction is in order.
      You can install block-exit handlers, including in your caller's block, so you don't have to play DESTROY games to get lexical exit actions.
      That's nice for newly written perl6 programs. But what about perl5 programs running in a perl6 environment? Or just a perl5 module used from a perl6 program? Will they have timely DESTROY?
      Perl 5 programs won't have as timely a DESTROY as they do running on the perl 5 engine. Optionally forcing DESTROY checks at block boundaries will be doable, and I suppose we can optionally force it at statement boundaries, though that will be slow.
      Allocation failure (for example running out of filehandles) will trigger a GC sweep and retry of the failing operation, so your program won't run out of things for lack of timely cleanup.
      Could you elaborate on that? If program1 (not necessarily written in Perl) has an allocation failure because program2 (written in Perl) hasn't done a GC sweep yet, how's that going to trigger a GC sweep in program2?
      This only applies within a running parrot program of course--we're not in a position to affect the behaviour of other programs. It's only when there's a potentially recoverable resource allocation failure within a program that we can do this. Locks, for example, can be tried in non-blocking mode and, if that fails, a GC sweep can be done and the lock retried in blocking mode to wait on other programs that might have it allocated. File opens that fail for lack of filehandles can similarly be tried, then retried after a GC run, with an exception thrown only if there are still no filehandles.
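      Roughly, the filehandle case would look like this (a sketch; run_gc_sweep() stands in for the real internal hook, not anything exposed today):

          sub open_with_retry {
              my ($path) = @_;
              my $fh;
              return $fh if open $fh, "<$path";   # first attempt
              run_gc_sweep();                     # reclaim dead filehandles
              return $fh if open $fh, "<$path";   # retry after the sweep
              die "can't open $path: $!";         # genuinely out of handles
          }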
      Java's unpredictable GC is already giving lots of people a headache when dealing with long running Java programs. It would be a pity if Perl goes that way too.
      Could you elaborate on this? There are always tradeoffs when making choices, and cleaner/faster internals and no worries about circular garbage is the tradeoff for guaranteed destruction timing. If there are more serious ramifications, it'd be good to know so we can take steps to ameliorate them.
        If you don't do timely destruction, HowMany will return the wrong value.
        Well, no, not really. We'll return the right value, which is the number of not-dead objects. That the number is surprising to the programmer is arguably rather sub-optimal but not flat-out wrong. (Yes, I know I'm arguing twiddly bits here)
        I wouldn't call that twiddling of bits. You're redefining English. You ask for cases where the proposed implementation breaks a program. I give you an example of something that isn't at all contrived, and you dismiss it as "it's not broken, it just does something surprising". I'm glad the attitude of p5p isn't this way; any bug report could be dismissed like that.

        Why did you ask the question anyway?

        The class will have the option of forcing a sweep for dead objects if it so chooses, so there would be a fallback. Yes, this is definitely sub-optimal, and is just a variant on the weak-ref problem. (i.e. a hack of sorts to get around implementation issues)
        I don't like any implementation issues surfacing in the language. But this one is certainly worse than the circular-reference problem you have with refcounting. There, when and whether your DESTROY method is called is predictable: it's properly defined to happen as soon as the last reference disappears. With the proposed implementation, when DESTROY is called becomes unpredictable.
        #!/usr/bin/perl

        use strict;
        use warnings 'all';

        package Foo;

        sub new {
            my ($class, $ref) = @_;
            bless [$ref] => $class;
        }

        sub DESTROY {
            my $self = shift;
            print $self->[0][-1], "\n";
        }

        package main;

        my $ref = [];
        foreach my $c (qw /one two three four/) {
            push @$ref => $c;
            my $obj = Foo->new($ref);
        }
        print "Exit\n";

        __END__
        one
        two
        three
        four
        Exit
        But what will this do if DESTROY is being called at random times? What can, and what can't you do? Will DESTROY be like unsafe signal handlers?
        However, it's not just the semantics you should worry about. Performance might depend on it as well. More than once I've written code where in the DESTROY method a file is closed (and hence, a lock on it is released). If the DESTROY no longer is timely, the semantics won't change - the program would still act correctly.

        But the performance would no longer be acceptable, as other programs don't get their locks fast enough.

        I'm not sure that's much of a problem. The delay in destruction should be on the order of milliseconds under most circumstances. If that's an unacceptable delay, then odds are something more explicit than relying on automatic destruction is in order.
        Do you have anything to back up the "I think", "most circumstances" and "odds are"? If 5 out of 100 planes have a bomb on board, then odds are that you arrive safely, under most circumstances there's no bomb on board. But you won't see a huge rush in plane tickets.
        Perl 5 programs won't have as timely a DESTROY as they do running on the perl 5 engine. Optionally forcing DESTROY checks at block boundaries will be doable, and I suppose we can optionally force it at statement boundaries, though that will be slow.
        Ah "slow", the magic word that can stop anything. Refcounting is also slow, but that didn't stop Perl from being useful for 14 years. *Anything* is slow, for some value of slow. But the common behaviour apparently wasn't slow enough for people to not use OOP. In fact, it was remarkably succesful. Now, if perl6 will be so much slower than perl5 it has to let go of useful features just to keep up, I think we can do without perl6.
        Allocation failure (for example running out of filehandles) will trigger a GC sweep and retry of the failing operation, so your program won't run out of things for lack of timely cleanup.
        Could you elaborate on that? If program1 (not necessarily written in Perl) has an allocation failure because program2 (written in Perl) hasn't done a GC sweep yet, how's that going to trigger a GC sweep in program2?
        This only applies within a running parrot program of course--we're not in a position to affect the behaviour of other programs. It's only when there's a potentially recoverable resource allocation failure within a program that we can do this. Locks, for example, can be tried in non-blocking mode and, if that fails, a GC sweep can be done and the lock retried in blocking mode to wait on other programs that might have it allocated. File opens that fail for lack of filehandles can similarly be tried, then retried after a GC run, with an exception thrown only if there are still no filehandles.
        I understand all of that. But I've been using UNIX for almost 20 years, and I know that running just one task on a machine is an exceptional case. Programs have to compete for resources. Nice programs let resources go as soon as possible, and don't use more than necessary. Nice languages allow for nice programs to be written. C is a nice language. Perl is nice, although it gobbles up a lot of memory. Java isn't a nice language. And it looks like Perl6 is going to be further away from C, and closer to Java.
        Java's unpredictable GC is already giving lots of people a headache when dealing with long running Java programs. It would be a pity if Perl goes that way too.
        Could you elaborate on this? There are always tradeoffs when making choices, and cleaner/faster internals and no worries about circular garbage is the tradeoff for guaranteed destruction timing. If there are more serious ramifications, it'd be good to know so we can take steps to ameliorate them.
        Well, if you find my ramifications about unpredictability not enough, there isn't much left to discuss, is there?

        Abigail, who only sees more reasons not to use perl6.

Re: On timely destruction?
by kschwab (Vicar) on Aug 28, 2002 at 01:47 UTC
    I've seen Tk based scripts rely on this type of thing to trap window destruction. There are obviously better ways to do that, so I wouldn't call it a show stopper.

    It might be worth polling the Tk community though, as they've had a pretty good history of complaining about GC :). ( regarding Tk's destroy vs DESTROY, and the underlying XS code hanging on to gobs of memory)

•Re: On timely destruction?
by merlyn (Sage) on Aug 28, 2002 at 00:27 UTC
    {
        my $foo;
        open $foo, "<bar.txt";
        print "Baz\n";
    }
    {
        my $foo;
        open $foo, "<bar.txt";
        print "xyzzy\n";
    }
    That's not an issue, because any new use of open should do a clean close, as if the variable had also gone out of scope.

    -- Randal L. Schwartz, Perl hacker

      Well... not in the case I posited. Since the filehandle variables were lexically scoped, the second open is using a different lexical than the first open, so no force-close there.
Re: On timely destruction?
by theorbtwo (Prior) on Aug 28, 2002 at 01:40 UTC

    I often have filehandles that escape their lexical context. To wit, I have a sub to open a file and return the filehandle. It's called replaceable parts -- for the prototype I have it open a static filename; later I have it be more configurable, with a config file, or whatever. (OK, I don't, but if I wrote cleanly, I would.)


    Confession: It does an Immortal Body good.

      That's fine--that sort of thing is still perfectly legal and will be handled fine. The question is: Do you depend on the returned filehandle being immediately closed when the variable holding it goes out of scope? If "sooner or later, as long as it's not too much later" is OK (and the later part is on the order of milliseconds, usually), then it's not a problem and not something I need to worry about here.
        The main problem that I see is that people may do a lot of suffering from buffering. For instance, in a function they write to a file, but they don't have an explicit close, so it doesn't get flushed until gc. Elsewhere you interact with the same file (perhaps you called the same function again) and get all confused when data that you know has been written hasn't been. (And since you are sure that it has been written - a fact that you can verify by looking at it in an editor - you will look everywhere else for the problem instead...)
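        A sketch of that trap:

            sub log_event {
                my ($msg) = @_;
                open my $fh, ">>", "events.log" or die $!;
                print $fh "$msg\n";
                # no close(): perl5's refcounting flushes and closes $fh
                # as the sub returns; deferred GC can leave the line
                # sitting in the stdio buffer instead
            }

            log_event("started");
            open my $in, "<", "events.log" or die $!;
            my $line = <$in>;
            print defined $line ? $line : "nothing there yet!\n";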

        Well, in the case of most of the scripts I write, no, I don't rely on timely destruction; I rarely deal with high-contention files. However, even with fairly high-contention files (and suchlike things), an unlock being a few ms late shouldn't be too terrible.


        Confession: It does an Immortal Body good.

Re: On timely destruction?
by John M. Dlugosz (Monsignor) on Aug 28, 2002 at 16:33 UTC
    Thanks for asking. I'm a zealot on timely destruction, and have been meaning to lobby for not dropping it in Perl 6!

    If there is a performance issue, at least make it optional. Objects that need it can have DESTROY called when the last reference is dropped, even if others wait for a central garbage collection scheme or have the code moved outward a few more blocks.

    If allocation failure of X for any X triggers a sweep, that's great and addresses the "not everything is memory" issue.

    But, here is an example: a file is still open so nobody else can open it until it's cleaned up, as opposed to running out of file handles (not going to happen in Windows).

    How about a window on the screen? I drop the object and the window doesn't go away until the gc gets around to it! Sure, I can code the close() call myself at the end of the block (and wish I still had real destructors), but that does not work in the face of exceptions. Don't make me put this in a finally block every time; the class should know that and take care of itself.

    How about any code that has a "before" and "after" stage, with my code in between. A semaphore is a good example. Building the "after" semantics into the object with language support is a useful tool. The idea of "resource acquisition is initialization" is powerful, and heavily used by C++ programmers. With Perl6 poised to take over the world, don't annoy your potential users like Java did by removing features that they rely on in their other languages.
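    In perl5 today that "after" stage is typically a guard object whose DESTROY does the cleanup; a minimal sketch:

        package Guard;

        sub new {
            my ($class, $after) = @_;
            bless { after => $after }, $class;
        }
        sub DESTROY { $_[0]{after}->() }   # the "after" stage

        package main;

        eval {
            my $guard = Guard->new(sub { print "semaphore released\n" });
            # ... the "in between" work, under the semaphore ...
            die "something went wrong\n";
        };
        # Under refcounting the guard's DESTROY has already fired during
        # stack unwinding, even on the exception path; under deferred GC
        # the release waits for a sweep.
        print "caught: $@";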

    Ideas: Some classes can be declared with a "needs timely destruction" property. If at least one of these is in a scope, then a gc sweep will be triggered at the end (or finally clause) of that block.

    Or, the proper call can be added to the block's finally clause implicitly. (Larry showed an explicit way to note this as a property on the declaration of the variable; take that one step further.)

    Putting in a close() myself or implicitly is going to do it most of the time, but will break down for closures. The compiler can count references; let it do that work for me. But again, the compiler can know when this is never going to happen and emit the simple code. How much of that is part of the low-level Parrot assembly code vs. smarts in the compiler? I think if the compiler could emit code that says "exit scope; if the reference count of the variable reached zero, then call function" then it can take it from there.

    —John

      More precisely, you're arguing for timely finalization, not destruction. The time of destruction usually isn't relevant.

      Making it optional may not help the implementation difficulties much. Any object requiring prompt finalization may be pointed to by something buried deep within a graph of objects that don't care, so you still have to check for the final reference being lost even when you're only manipulating objects that don't care.

      Your locked open file example is a good one, I think, although I wonder how many times it could be handled with a scope exit action (how often will you pass it to some routine that could keep a global handle on it? If you're doing that, then you're probably not too concerned with the locking issue...)

      The window example can certainly be handled with a scope exit mechanism -- you wouldn't be coding the cleanup down at the bottom of the block, you'd be inserting it into a CLEANUP{} block or something that the compiler would be required to call no matter how the block is exited.

      Your gc sweep at the end of a block idea works, except for the case when the variable escapes the block (eg it's referenced by a global cache.) Then you'd need to remember that there is at least one rogue object that requires timely finalization floating around, and you'd have to trigger a sweep at every scope exit. Or worse, if you really want it immediate -- simple variable assignment drops a reference to the previous contents, so perl5 can trigger DESTROY even then. A sweep after every assignment might be a little slow...

      Thanks for asking. I'm a zealot on timely destruction, and have been meaning to lobby for not dropping it in Perl 6!
      Cool, then. Give me examples of things that'll break without timely finalization!
      If there is a performance issue, at least make it optional.
      Ah, therein lies the rub. You can't make something like this optional. It's either there, with full support for it, or it's not there. There is no optional here. (If there was, I'd not be asking this question)

      Unfortunately none of the things you've presented quite do it. Filehandles will be cleaned up reasonably quickly. Windows definitely ought not go missing without some active action on the part of a program. And "after" semantics on objects when they get destroyed, well, you get that. It's just potentially indeterminate when it happens. (Which, I realize, is the issue)

      I think if the compiler could emit code that says "exit scope; if the reference count of the variable reached zero, then call function" then it can take it from there.
      There are no reference counts in parrot, and thus none in perl 6. If there were this wouldn't be an issue.
Re: On timely destruction?
by Juerd (Abbot) on Aug 28, 2002 at 07:42 UTC

    Apo 4 describes my $foo is post { close } = open 'bar.txt';, so that's no problem. I wonder what would happen on push our @foos, $foo, though. Are properties connected to variables or to their values?

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      They can be attached to either. That particular prop is on the variable, since it's attached to the my. If you wanted it to be on the value, you should have written my $foo = (open 'bar.txt') is post {close}. If I understand and remember Apoc4 correctly, which I might not.


      Confession: It does an Immortal Body good.

      Now if I didn't have to put is post {close} in every open but could have that implicit, that would be better. Except when I use closures... is post { close if .dead } maybe, where an object can test if it has no (other) references, and register that it needs to be counted.

Re: On timely destruction?
by PhiRatE (Monk) on Aug 28, 2002 at 23:51 UTC
    I think the important thing is to consider just how costly the other options are. I'm no expert when it comes to GC, but there are a myriad of options available and I think that timely destruction has one particular benefit: consistency.

    When you create an object, it's created immediately; when you assign something to one, or call a method on one, all of this happens immediately. To take one particular operation and declare "This could happen at any old time" instantly puts the programmer into headache mode. They can never rule out that a given hard-to-find bug is caused by an object being destroyed too late or the equivalent; you end up with people putting unnecessary gc forces throughout their code while debugging and forgetting to remove them later, or even worse, with modules ending up on cpan with explicit collections to get around obscure issues.

    I see little choice but to make timely GC an option, but I am confident that there is enough collective brainpower available to make it efficient as an option under reasonable circumstances.

    For example, we could say "Yes, you may specify this class requires timely destruction, but if you do so, its *construction* will take longer than normal", and we could then take that construction time to run down the graph tagging all the parents and containers of that object to indicate that there is a child that requires timely destruction.

    Or we may hold a separate list of objects that require timely destruction, and every time an object goes out of scope, if we have anything on that list we run up the graph from each, looking to see if we can reach the object that went out of scope; if so, we trigger a gc immediately.

    Not all of these are practical within the architecture available (my knowledge of the parrot internals is pretty much zero), but there are a number of options that, in the case where no destructors are specified as timely, will run pretty much equivalently to a system without the option, but where, when one is specified, we start making performance compromises in return for timing guarantees.

    Of course, in the end, the one who does the coding, gets to make the choice. If I don't like it, I'll do it different later on :)

      I think the important thing is to consider just how costly the other options are. I'm no expert when it comes to GC, but there are a myriad of options available and I think that timely destruction has one particular benefit: consistency.
      The timely deterministic options (of which, in a language with references, there is exactly one: refcounting) are expensive, both in processor time and in programming time. Also rather error-prone, unfortunately.

      They also don't guarantee consistency, though they do guarantee determinism. Not that you're necessarily going to guess the time right, but you've a reasonably good chance.

      To take one particular operation and declare "This could happen at any old time" instantly puts the programmer into headache mode.
      That's always true, though. Perl is sufficiently introspective, and is getting more introspective, to make timing of destruction potentially indeterminate. And most objects don't have DESTROY methods. Just wait until we start throw
      I see little choice but to make timely GC an option, but I am confident that there is enough collective brainpower available to make it efficient as an option under reasonable circumstances.
      Thanks for the vote of confidence, but I should point out that I am the brainpower in this case, along with a stack of books and papers by people rather more clever than I am. If it was easy or inexpensive to do this, I wouldn't be asking the question.
        Thanks for the vote of confidence, but I should point out that I am the brainpower in this case, along with a stack of books and papers by people rather more clever than I am. If it was easy or inexpensive to do this, I wouldn't be asking the question.

        Ah, then your solution is indeed somewhat different. I would suggest instead a garbage collection API, such that others who may have some brilliant idea we had not thought of may later come along and implement "deterministic but slow" gc, and someone else may come along and implement "never executes a destructor but is blindingly fast" gc, and yet another may simply look at the standard does-its-best gc and think "I can do better".

        If you make it practical for others to implement their own GC if necessary, you can give away the question of whether determinism is needed currently and put it back on the stack for the people who need it (itch scratch) to worry about.

        I think your determination that refcounting is the only solution for a language with references is premature. While I agree that, in the general case, it is unlikely there are any alternatives, in the specific case of an embedded system with a number of precisely known quantities such as memory, code to execute and others, refcounting is only one of many potential solutions to the problem, including the potential option of saying "I don't need to gc at all", and another of saying "well shit, we'll just add some gc-supporting hardware to our board since we're suffering so much from it.."

        So, in summary, it is my determination (still :) that the option needs to be available for deterministic GC. I do not, however, think that you need do it, only that the option is readily available. I'm not convinced simply being open-source is enough in this case; the capability to easily select the desired GC at build-time (at least, but that's probably good enough) is one acceptable way of providing others with both the option and the motivation to implement the GC that fits their needs best.

        You never know, some smart-ass research group might decide that, with parrot supporting so many languages, and with such an easy plug-in for the GC, they could spend a bunch of research money coming up with something with stupefyingly tricky statistical optimisations we daren't consider, which take parrot's GC beyond the state of the art. Such things are inclined to happen when the architecture supports it.

        I offer a way-out suggestion just for the hell of it:

        It is clear that the non-timeliness of destruction only becomes an issue when an item of some kind of scarcity is held by the relevant object.

        In your example above, the item in question is a lock on a file, in other cases it is a database handle, in other cases it will be some other resource.

        We can solve *part* of the problem (short of refcounting) by registering such contentious resources internally. Thus, rather than closing the filehandle at the right time above, we would instead have the second open say "I wish to register my interest in a file lock on this file" at which point the registry will say "well shit, someone else already has that, lemme check if I should run a gc".

        It's not a particularly pretty concept, in that it requires determining and registering likely contentious resources, but due to the nature of the parrot design you may find that it fits quite well as a middle-ground solution, preventing close-timing issues within the same instance (although obviously distinct processes will not benefit).
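        A rough sketch of such a registry (with run_gc_sweep() standing in for whatever hook parrot actually exposes):

            my %registry;   # resource key => 1 while claimed in-process

            sub claim {
                my ($key, $acquire) = @_;
                if ($registry{$key}) {   # contended within this process,
                    run_gc_sweep();      # so let a dead holder's DESTROY
                }                        # release it before retrying
                return 0 unless $acquire->();
                $registry{$key} = 1;
                return 1;
            }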
