Re^4: Perl 5 Optimizing Compiler, Part 4: LLVM Backend? by chromatic (Archbishop)
on Aug 28, 2012 at 00:00 UTC
You're half right. hv.c does demonstrate one of the big problems in optimizing Perl 5, but no amount of static optimization fairy dust magic will help.
The biggest problem with regard to SVs and unoptimizability is that the responsibility for determining how to access what's in the SV is in every op. That's why every op that does a read has to check for read magic and every op that does a write has to check for write magic. That's why ops are fat in Perl 5. (That's why ops are fat in Parrot, which still has a design that would make it a much faster VM for Perl 5.6.)
Moving magic into SVs from ops would help, as would porting the Perl 5 VM to C++ which can optimize for the type of static dispatch this would enable.
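To make the difference concrete, here is a minimal sketch in C. Everything in it is hypothetical (the `Value` struct, the `has_magic` flag, the `ValueOps` table, the op names); none of it is Perl 5's real SV layout or API. It only contrasts an op that tests for magic itself with an op that asks the value how to read itself. The function-pointer table is a stand-in I picked for illustration; it is still dynamic dispatch, not the static dispatch a C++ port could give you.

```c
#include <assert.h>

/* Hypothetical, simplified value type; NOT Perl 5's real SV layout. */
typedef struct Value Value;

typedef struct {
    long (*get)(Value *v);          /* how to read this value */
    void (*set)(Value *v, long n);  /* how to write this value */
} ValueOps;

struct Value {
    long iv;              /* integer payload */
    int has_magic;        /* flag that the "fat op" style has to test */
    int reads;            /* side effect recorded by a magical read */
    const ValueOps *ops;  /* per-value behaviour for the "thin op" style */
};

/* Style 1: every op checks every operand for magic before using it. */
static long fat_op_add(Value *a, Value *b) {
    if (a->has_magic) a->reads++;   /* stand-in for running get-magic */
    if (b->has_magic) b->reads++;
    return a->iv + b->iv;
}

/* Style 2: the value carries its own accessors, so the op has no
 * magic checks in it at all. */
static long plain_get(Value *v) { return v->iv; }
static long magic_get(Value *v) { v->reads++; return v->iv; }
static void plain_set(Value *v, long n) { v->iv = n; }

static const ValueOps PLAIN_OPS = { plain_get, plain_set };
static const ValueOps MAGIC_OPS = { magic_get, plain_set };

static long thin_op_add(Value *a, Value *b) {
    assert(a->ops != 0 && b->ops != 0);
    return a->ops->get(a) + b->ops->get(b);
}
```

In the second style the add op no longer knows that magic exists, which is the property an optimizer needs before it can shrink the op.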
LLVM would only really help if you could compile all of Perl 5 and the XS you want for any given program to LLVM IR and let LLVM optimize across the whole program there, but even then you still have to move magic into the SVs themselves (or spend a lot of time and memory tracing types and program flow at runtime) to be able to optimize down to a handful of processor ops.
I suspect no one's going to do that for a 10% performance improvement at the cost of 10x memory use.
With that said, I must disagree with:
"Perl's particular brand of preprocessor macro-based, Virtual Machine was innovative and way ahead of its time when it was first written."
Not if you look at a good Forth implementation or a decent Smalltalk implementation, both of which you could find back in 1993.
"Perl's current memory allocator has so many layers to it, that it is nigh impossible to switch in something modern, tried and tested, like the Boehm allocator."
I don't see how Boehm would help. The two biggest memory problems I've measured are that everything must be an SV (even a simple value like an integer) and that there's no sense of heap versus stack allocation. Yes, there's the TARG optimization, and that helps a lot, but if you want an order of magnitude speed improvement, you have to avoid allocating memory where you don't need it.
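A rough illustration of the allocation cost, as a sketch only: `Boxed`, `new_boxed`, and the `allocations` counter are names invented for this example, and a real SV is far more complicated. The first function treats every intermediate integer as a heap-allocated box; the second keeps plain integers in locals.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical boxed scalar; much simpler than a real SV. */
typedef struct {
    int refcount;
    long iv;
} Boxed;

static size_t allocations = 0;  /* counts heap allocations made by new_boxed */

static Boxed *new_boxed(long n) {
    Boxed *b = malloc(sizeof *b);
    assert(b != NULL);
    b->refcount = 1;
    b->iv = n;
    allocations++;
    return b;
}

/* Everything is a box: each term and each partial sum hits the heap. */
static long sum_boxed(long n) {
    Boxed *total = new_boxed(0);
    for (long i = 1; i <= n; i++) {
        Boxed *term = new_boxed(i);
        Boxed *next = new_boxed(total->iv + term->iv);
        free(term);
        free(total);
        total = next;
    }
    long result = total->iv;
    free(total);
    return result;
}

/* Plain integers: the same arithmetic with no heap traffic at all. */
static long sum_unboxed(long n) {
    long total = 0;
    for (long i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Under these assumptions, summing 1 through n costs 2n + 1 allocations in the boxed version and none in the unboxed one. A different allocator changes how much each allocation costs, not how many there are.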
"Someone decided that rather than use the hardware optimised (and constantly re-optimised with each new generation) hardware stack for parameter passing, it was a good idea to emulate the (failed) hardware-based, register-renaming architecture of (the now almost obsolete) RISC processors, in software."
You're overlooking two things. First, you can't do anything interesting with continuations if you're tied to the hardware stack. Second, several research papers have shown that a good implementation of a register machine (I know the Dis VM for Inferno has a great paper on this, and the Lua 5.0 implementation paper has a small discussion) is faster than the equivalent stack machine. I think there's a paper somewhere about a variant of the JVM which saw a measurable speed improvement by going to a register machine too. (Found it in the bibliography of the Lua 5.0 paper: B. Davis, A. Beatty, K. Casey, D. Gregg, and J. Waldron. The case for virtual register machines.)
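One commonly cited reason for the difference is that a register machine dispatches fewer instructions for the same work. Here is a toy sketch of that, with made-up opcodes and encodings (nothing here is Parrot, Lua, Dis, or JVM bytecode). Both interpreters compute `c = a + b` and return how many instructions they dispatched.

```c
#include <assert.h>

enum { S_LOAD, S_ADD, S_STORE, S_HALT };  /* toy stack-machine opcodes */
enum { R_ADD, R_HALT };                   /* toy register-machine opcodes */

/* Stack machine: operands travel through an operand stack.
 * Returns the number of instructions dispatched, including the halt. */
static int run_stack(const int *code, long *vars) {
    long stack[16];
    int sp = 0, pc = 0, dispatched = 0;
    for (;;) {
        dispatched++;
        switch (code[pc]) {
        case S_LOAD:   /* push vars[operand] */
            assert(sp < 16);
            stack[sp++] = vars[code[pc + 1]];
            pc += 2;
            break;
        case S_ADD:    /* pop two values, push their sum */
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            pc += 1;
            break;
        case S_STORE:  /* pop into vars[operand] */
            assert(sp >= 1);
            vars[code[pc + 1]] = stack[--sp];
            pc += 2;
            break;
        default:       /* S_HALT */
            return dispatched;
        }
    }
}

/* Register machine: one instruction names its destination and both sources. */
static int run_register(const int *code, long *regs) {
    int pc = 0, dispatched = 0;
    for (;;) {
        dispatched++;
        switch (code[pc]) {
        case R_ADD:    /* regs[dst] = regs[src1] + regs[src2] */
            regs[code[pc + 1]] = regs[code[pc + 2]] + regs[code[pc + 3]];
            pc += 4;
            break;
        default:       /* R_HALT */
            return dispatched;
        }
    }
}

/* c = a + b, with a, b, c in slots 0, 1, 2 */
static const int STACK_PROG[] = { S_LOAD, 0, S_LOAD, 1, S_ADD, S_STORE, 2, S_HALT };
static const int REG_PROG[]   = { R_ADD, 2, 0, 1, R_HALT };
```

The stack program takes five dispatches (load, load, add, store, halt) where the register program takes two (add, halt). The register instruction is wider, so this is a trade of instruction size for dispatch count, and a toy like this says nothing about real-world timings.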
... but with all that said, Parrot's lousy calling convention system is not a good example of a register machine. A good register machine lets you go faster by avoiding moving memory around. Parrot's calling conventions move way too much memory around to go fast.