I'm mostly involved with Perl 6 development, and in many ways unfamiliar with the existing Perl 5 compiler. So I'd like to give a very high-level answer. You might already know most or all of it, in which case I apologize for wasting your time. But it's a topic that comes up quite often, so I think a broad answer can't hurt.
Why is Perl 5 slower than C?
In one word, flexibility. Perl gives you much more flexibility, for example in what you can place into a variable and in what operations you can perform (string concatenation on a number, not just on strings), and you pay for that flexibility with lots of run-time checks and indirection.
Or to rephrase, Perl lets you omit some gory details that you have to handle yourself in C code (type conversions, memory management, validity checks on data), and you pay for that with runtime and memory overhead.
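To make that overhead a bit more concrete, here is a minimal sketch in C. The `value` type is a hypothetical, heavily simplified stand-in for a dynamically typed scalar (it is not Perl's real SV layout), and all the function names are made up for illustration. It only shows the general shape of the cost: a type check and a possible conversion on every use.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified dynamic scalar; NOT Perl's real SV layout. */
typedef enum { VAL_INT, VAL_STR } val_type;

typedef struct {
    val_type type;
    union {
        long        i;   /* valid when type == VAL_INT */
        const char *s;   /* valid when type == VAL_STR */
    } u;
} value;

/* Plain C: operand types are fixed at compile time, so no checks are needed. */
long add_static(long a, long b)
{
    return a + b;
}

/* Every use of a dynamic value starts with a type check and maybe a conversion. */
long to_long(const value *v)
{
    switch (v->type) {
    case VAL_INT: return v->u.i;
    case VAL_STR: return strtol(v->u.s, NULL, 10);
    }
    return 0; /* not reached for valid tags */
}

/* The flexible version: two pointer dereferences and two type checks per addition. */
long add_dynamic(const value *a, const value *b)
{
    return to_long(a) + to_long(b);
}
```

Both functions compute the same sum when given an integer and a numeric string, but only `add_dynamic` can accept the string, and it pays for that ability on every call, even when both operands happen to be integers.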
On the #perl6 IRC channel, people regularly ask if there will be a compiler that translates Perl 6 code to C, seeming to think that such a compiler magically leads to C-like speed. That's false, of course, because the generated C code would still need to support all the flexibility that you don't have in "plain" C.
In summary, flexibility gives you some essential overhead that is not easy to get rid of (as distinguished from superficial overhead that stems from a less-than-perfect implementation).
How to speed it up?
First you can work on removing superficial overhead. But that won't give you a factor of 10 over the existing implementation, and I hear that the Perl 5 code base is already quite well micro-optimized.
Maybe hackers like Nicholas Clark, Dave Mitchell and Sprout can point you to some known overhead that is worth removing, and if it's in a hot path or an often-used data structure, it might very well pay off. But not by a factor of 10, more like 5%, if you are lucky.
If you pay for flexibility, an obvious way to pay less is to restrict that flexibility where you don't need it.
Type constraints are an obvious example. If you know that a variable will always be a 32-bit integer, you don't need to store a full SV for it; a 32-bit integer will do. But beware: that means you need to be able to store more data (for the type annotations) in the optree/bytecode, which might mean that all programs pay a penalty for that, even if they don't use that feature.
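A rough sketch of the difference in C, under loud assumptions: `boxed` is a made-up stand-in for a full scalar with a type tag and a reference count (a real SV carries more than this), and both function names are hypothetical. The point is only that the constrained version stores and touches less data per element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boxed scalar with a type tag and reference count;
   a stand-in for a full SV, not its real layout. */
typedef enum { BOX_INT, BOX_STR } box_type;

typedef struct {
    box_type type;
    uint32_t refcount;
    union { int64_t i; const char *s; } u;
} boxed;

/* General case: each element is a pointer to a boxed value that must be checked.
   Non-integers are simply skipped in this sketch. */
int64_t sum_boxed(const boxed *const *items, size_t n)
{
    int64_t total = 0;
    for (size_t k = 0; k < n; k++) {
        if (items[k]->type == BOX_INT)   /* run-time type check per element */
            total += items[k]->u.i;
    }
    return total;
}

/* Constrained case: the type annotation promises 32-bit integers,
   so the storage is just the integers themselves. */
int64_t sum_int32(const int32_t *items, size_t n)
{
    int64_t total = 0;
    for (size_t k = 0; k < n; k++)
        total += items[k];
    return total;
}
```

In the constrained version each element takes four bytes and the loop has no branches on type; in the boxed version each element costs a pointer plus a separately allocated structure.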
Other ideas for what flexibility you could give up, in the context of Perl 5, all of which might be useful for optimized code paths:
I don't know the Perl 5 guts well enough to find out which of those can be exploited with reasonable effort.
Let the Compiler Determine Restricted Flexibility
Some things (for example, that a variable will only ever hold an integer) can be determined by the compiler, but it probably can only do so with external help (such as the assumption that no eval will change the variable).
That would be the ideal solution, because it would speed up code without requiring any (or only very little) changes to the code that is sped up.
Optimize for the Common Case
A technique often used in JIT compilers (but available to "normal" ahead-of-time compilers too) is to optimize for the common case.
For example, a compiler could determine that a variable starts out as an integer, simply guess that it will always hold an integer, and generate an optimized code path for that case. Of course the guess will go wrong some of the time, so it also needs to emit code for the unoptimized general case, and some kind of fallback mechanism that transfers control flow to the general case if the guess turns out to be false.
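Here is a small sketch of that guard-plus-fallback shape, written by hand in C. Everything in it is hypothetical (the `scalar` type, the function names, the `fallback_count` counter), and it is simplified in one important way: the guard is checked only once on entry, whereas a real JIT also has to handle a guess that goes wrong in the middle of the optimized code.

```c
#include <stdlib.h>

/* Hypothetical dynamic scalar, for illustration only. */
typedef enum { SC_INT, SC_STR } sc_type;

typedef struct {
    sc_type type;
    union { long i; const char *s; } u;
} scalar;

/* Counts how often the guess failed; only here to make the fallback observable. */
int fallback_count = 0;

/* Unoptimized general case: coerces the dynamic value on every iteration. */
long sum_upto_general(const scalar *limit)
{
    long total = 0;
    for (long k = 1; ; k++) {
        long n = (limit->type == SC_INT) ? limit->u.i
                                         : strtol(limit->u.s, NULL, 10);
        if (k > n)
            break;
        total += k;
    }
    return total;
}

/* Optimized path, generated under the guess "limit is an integer". */
long sum_upto_int(long n)
{
    long total = 0;
    for (long k = 1; k <= n; k++)
        total += k;
    return total;
}

/* Entry point: one guard decides between the fast path and the fallback. */
long sum_upto(const scalar *limit)
{
    if (limit->type == SC_INT)          /* guard: does the guess hold? */
        return sum_upto_int(limit->u.i);
    fallback_count++;                   /* guess was wrong, take the general path */
    return sum_upto_general(limit);
}
```

Both paths must produce the same result; the optimized one just gets there without re-checking the type inside the loop.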
This seems to be a very promising approach, but also one that requires significant infrastructure.
Planning for the Future
One reason I've put so much effort into Perl 6 is that I had the feeling that in the long run (on the scale of 10 or 20 years) Perl 5 was a dead end, partially because it is not very well suited for static analysis and optimizations. After the switch to a time-based release cycle, the (perceived) faster core development, and the p5-MOP effort, I am more hopeful.
Coming from a Perl 6 perspective, one thing I'd really love to see in Perl 5 is representation polymorphism. Which means largely decoupling the type of an object from its storage. For example you can write
and have objects of type MyStruct stored as a C struct, which you can pass to a C routine that expects this data structure:
(I should add that this is working code from a test of a module that allows calling C functions, not some future vision).
Having such a mechanism in Perl 5 would be really awesome. The current SV could be one representation, other representations could be optimized for compact storage of objects. Combine that with the works of a custom Meta Object Protocol in p5 core, and you have a winning condition.
It would make interfacing with C and other languages much easier, and be very memory efficient for objects with specialized storage. Imagine writing a GTK application, and exposing all GTK objects as native Perl objects through a GObject representation (or whatever it's called these days) -- a simple, cheap mapping step is enough, no need for wrapping everything in custom Perl objects.
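For readers who think more easily in C, the idea of decoupling a type from its storage can be sketched roughly like this. This is an analogy only: it is neither Perl 6 code nor how any Perl implementation does it, and every name in it (`repr`, `object`, `get_attr`, `c_norm_squared` and so on) is invented for illustration. The same logical object, a point with attributes `x` and `y`, is accessed through one interface but backed by two different representations.

```c
#include <stddef.h>
#include <string.h>

/* A representation knows how to read an attribute from an object's storage. */
typedef struct {
    double (*get)(const void *storage, const char *attr);
} repr;

/* Representation 1: generic name/value slots; flexible but bulky. */
typedef struct { const char *name; double value; } slot;
typedef struct { size_t count; const slot *slots; } slot_storage;

double slots_get(const void *storage, const char *attr)
{
    const slot_storage *s = storage;
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->slots[i].name, attr) == 0)
            return s->slots[i].value;
    return 0.0; /* unknown attribute */
}

/* Representation 2: a plain C struct; compact and directly usable from C. */
typedef struct { double x; double y; } point_struct;

double struct_get(const void *storage, const char *attr)
{
    const point_struct *p = storage;
    if (strcmp(attr, "x") == 0) return p->x;
    if (strcmp(attr, "y") == 0) return p->y;
    return 0.0; /* unknown attribute */
}

const repr REPR_SLOTS  = { slots_get };
const repr REPR_STRUCT = { struct_get };

/* An object is its storage plus a pointer to the representation that understands it. */
typedef struct { const repr *how; const void *storage; } object;

double get_attr(const object *o, const char *name)
{
    return o->how->get(o->storage, name);
}

/* A "foreign" C routine that expects the raw struct, with no wrapping step. */
double c_norm_squared(const point_struct *p)
{
    return p->x * p->x + p->y * p->y;
}
```

Code that goes through `get_attr` does not care which representation is in use, while the struct-backed object's storage can be handed to `c_norm_squared` as it is.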
(My keyboard stopped responding after I had written a good deal of this post, but I planned to write more. So I submitted before I wrote the "Planning for the Future" headline to save a copy before restarting my X server (submitting only required the mouse, which continued working), and added the rest after it.)