I am reminded of the original development of RISC machines.
Those guys found that the only way to find the "best"
approach was to take a sufficiently large example set (booting
UNIX and running some programs) and run it in a well
measured simulation.
When someone proposed a change to the architecture they
changed the simulation, measured the effect on performance
and kept it if it improved the situation.
My guess would be (based, admittedly, on a total lack of
first-hand knowledge) that adding metrics to Parrot (or
Perl 5?) that emulate the cost of fetching information
from disk, RAM, and cache would lead to a reasonable
simulation of Perl's performance, and hence could
definitively answer these types of questions.
This would allow anyone with enough interest and time
on their hands to get a real answer rather than just
sticking a wet finger in the air.
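Below is a minimal sketch of the kind of metric described above. It is written in Python as a standalone model rather than as a patch to Parrot. Each memory access in a trace is charged a cost depending on whether it would be served from cache, RAM, or disk. The two-level LRU model, the block size, and the figures in LATENCY are illustrative assumptions, not measured values, and MemoryModel and run are hypothetical names.

```python
from collections import OrderedDict

# Assumed, illustrative costs in arbitrary "cycles"; real figures would
# have to be measured on the target hardware.
LATENCY = {"cache": 1, "ram": 100, "disk": 100_000}


class LRUCache:
    """Fixed-capacity set of block numbers with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit; on a miss, insert the block (evicting the LRU one if full)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return False


class MemoryModel:
    """Hypothetical two-level model: a small cache in front of RAM, with disk behind both."""

    def __init__(self, cache_blocks, ram_blocks, block_size=64):
        self.cache = LRUCache(cache_blocks)
        self.ram = LRUCache(ram_blocks)
        self.block_size = block_size
        self.cost = 0
        self.counts = {"cache": 0, "ram": 0, "disk": 0}

    def touch(self, addr):
        """Charge one access to addr and return the level that served it."""
        block = addr // self.block_size
        if self.cache.access(block):
            level = "cache"
        elif self.ram.access(block):  # cache missed; the block is now cached
            level = "ram"
        else:                         # missed everywhere; now in RAM and cache
            level = "disk"
        self.counts[level] += 1
        self.cost += LATENCY[level]
        return level


def run(trace, **params):
    """Replay an address trace through a fresh model; return (total cost, per-level counts)."""
    model = MemoryModel(**params)
    for addr in trace:
        model.touch(addr)
    return model.cost, model.counts


if __name__ == "__main__":
    print(run([0, 8, 16, 64, 0], cache_blocks=1, ram_blocks=4))
```

To evaluate a proposed change in the RISC style, one would record an access trace from the interpreter before and after the change, replay both through the same model, and keep the change only if the total cost goes down.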