A better way to improve usage of cache without going through a lot of careful tuning is to keep actively accessed data together, and avoid touching lots of memory randomly.
My understanding (from my view somewhere in the bleachers) is that Parrot's garbage collection will provide both benefits.
Incidentally correcting a point you made in your original post, the importance of Parrot having lots of registers is not to make efficient use of cache. It is to avoid spending half of the time on stack operations (estimate quoted from my memory of elian's statement about what JVM and .NET do). In a register-poor environment, like x86, you come out even. In a register-rich environment you win big. (Yes, I know that x86 has lots of registers - but most are not visible to the programmer and the CPU doesn't always figure out how to use them well on the fly.)
Before someone pipes up and says that we should focus on x86, Parrot is hoping to survive well into the time when 32-bit computing is replaced by 64-bit for mass consumers. Both Intel and AMD have come out with 64-bit chips with far more registers available to the programmer than x86 has. That strongly suggests that the future of consumer computing will have lots of registers available. (Not a guarantee though, the way that I read the tea leaves is that Intel is hoping that addressing hacks like PAE will allow 32-bit computing to continue to dominate consumer desktops through the end of the decade. AMD wants us to switch earlier. I will be very interested to see which way game developers jump when their games start needing more then 2GB of RAM.)