comment on

Reference counting has a number of problems:

Circular garbage leaks
It's horribly error-prone
It requires a lot of code
It's slow
It blows your cache

To take those in more detail:

1 It's not tough to get circular structures at the user (i.e. the application programmer) level. This means that either a program leaks, or each application that potentially has to generate circular structures has to kill them.

2 For refcounting to work right, every increment must have a matching decrement. Historically (which is to say, in every release of perl I've seen) this doesn't happen. Sometimes its only a few variables, sometimes it's a lot of variables. Either way, there's leakage. It also means that every extension writer must get refcounting right. For anything other than trivial extensions, they don't.

3 To support refcounting, each and every time a variable is inserted into a container or removed from a container the refcount must be changed. This is a lot of code. Go browse through the perl 5 source (And don't forget about extensions) to see how often this happens. This is essentially makework code--it does almost nothing useful, and is entirely bookkeeping.

4 Reference counting is the slowest form of garbage collection for all but the most extremely degenerate code. Reference counting requires you to spend time touching each variable at least twice. This is all your variables, not just the ones that are alive. Tracing garbage collectors (which include the mark & sweep, compacting, and generational style collectors) generally only touch live data. Yes, it's only a quick increment or decrement, but it's a lot of them. Refcount schemes usually spend about twice as much time doing garbage collection work than tracing collectors.

5 Refcounting makes the internal data structures and code less dense, and that means that your program spends more time waiting on main memory. (Fetching data from RAM can cost up to 200 cycles on some systems) Your code is less dense, since you have refcount twiddling code all through it. Your data is also less dense, since you need to have space for refcounts in it. And even worse, it means more of your data structure has to be touched when its dealt with. This screws your L1 cache. Generally data comes into your CPU's L1 cache in small chunks--8 or 16 bytes. You want to use as few of these chunks as you can since, while they're fastest to access (usually a few cycles at most) there's not much of it. And loading a cache line can cost 10 or 20 cycles if the data you're looking for is in your L2 cache. (And 200 or more if it isn't) The less memory that needs to be touched to do something means the more efficiently the cache is used and the faster your program runs.

Tracing collectors, which is what parrot's using, generally have different characteristics.

No mainline code does anything with refcounts or other 'liveness' bookkeeping data. That means the mainline code is denser, with more of it actually dedicated to executing your program.
Internal data structures are either smaller (if the GC bookkeeping fields are kept out of band) or at least use less cache (since while still in the structure they aren't touched and thus fetched into L1 cache).
There's less code dedicated to GC itself and, while a tracing garbage collector isn't exactly trivial, it's not a big deal either. More to the point, the code required to do the GC is collected together in one single spot, rather than sprinkled througout the entire codebase. (It took me only a few hours to write the first cut of Parrot's GC)
Tracing collectors take less time to execute. Since they make more efficient use of the cache (there's only a small chunk of code doing GC, so it gets pinned in L1 cache while it runs, and it generally restricts itself to a smallish chunk of memory when it runs so it gains some cache locality that way) you waste less time on the memory bus, and since tracing GCs only look at the live data, their runtime's proportional to the amount of live data you have, generally a much smaller number than the number of data objects you've allocated since the last GC run.

So, tracing collectors are generally faster, less error-prone, have less code, and put less burden on the people writing the core and extensions in C. The one (and only) advantage that refcounting has is deterministic destruction. (Assuming no bugs, of course) Other than that it's a massive pain to deal with for the internals folks.

All of that is why we don't like refcounts. :)

BTW: Your LISP book's numbers may well be really, really old. Much of GC's bad reputation for pauses come from benchmarks run 20 or more years ago, on VAX 11/780 machines with 2M of RAM and an RA60 as a swap disk. These days things run rather a lot faster. A quick run of one of parrot's benchmarks shows it did 43K GC runs in 22 seconds on my machine, in addition to the sample program's real work. That's not bad... (Though, yes, it does argue that the sample program needs more work to generate less garbage)

In reply to Re: Re: On timely destruction? by Elian
in thread On timely destruction? by Elian

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Your skill will accomplish what the force of many cannot
	PerlMonks