Reference counting has a number of problems:
1 It's not tough to get circular structures at the user (i.e. the application programmer) level. This means that either a program leaks, or each application that potentially has to generate circular structures has to kill them.
2 For refcounting to work right, every increment must have a matching decrement. Historically (which is to say, in every release of perl I've seen) this doesn't happen. Sometimes its only a few variables, sometimes it's a lot of variables. Either way, there's leakage. It also means that every extension writer must get refcounting right. For anything other than trivial extensions, they don't.
3 To support refcounting, each and every time a variable is inserted into a container or removed from a container the refcount must be changed. This is a lot of code. Go browse through the perl 5 source (And don't forget about extensions) to see how often this happens. This is essentially makework code--it does almost nothing useful, and is entirely bookkeeping.
4 Reference counting is the slowest form of garbage collection for all but the most extremely degenerate code. Reference counting requires you to spend time touching each variable at least twice. This is all your variables, not just the ones that are alive. Tracing garbage collectors (which include the mark & sweep, compacting, and generational style collectors) generally only touch live data. Yes, it's only a quick increment or decrement, but it's a lot of them. Refcount schemes usually spend about twice as much time doing garbage collection work than tracing collectors.
5 Refcounting makes the internal data structures and code less dense, and that means that your program spends more time waiting on main memory. (Fetching data from RAM can cost up to 200 cycles on some systems) Your code is less dense, since you have refcount twiddling code all through it. Your data is also less dense, since you need to have space for refcounts in it. And even worse, it means more of your data structure has to be touched when its dealt with. This screws your L1 cache. Generally data comes into your CPU's L1 cache in small chunks--8 or 16 bytes. You want to use as few of these chunks as you can since, while they're fastest to access (usually a few cycles at most) there's not much of it. And loading a cache line can cost 10 or 20 cycles if the data you're looking for is in your L2 cache. (And 200 or more if it isn't) The less memory that needs to be touched to do something means the more efficiently the cache is used and the faster your program runs.
Tracing collectors, which is what parrot's using, generally have different characteristics.
All of that is why we don't like refcounts. :)
BTW: Your LISP book's numbers may well be really, really old. Much of GC's bad reputation for pauses come from benchmarks run 20 or more years ago, on VAX 11/780 machines with 2M of RAM and an RA60 as a swap disk. These days things run rather a lot faster. A quick run of one of parrot's benchmarks shows it did 43K GC runs in 22 seconds on my machine, in addition to the sample program's real work. That's not bad... (Though, yes, it does argue that the sample program needs more work to generate less garbage)