Re: Perl 5 Compiler, Again!
by chromatic (Archbishop) on Aug 13, 2012 at 18:23 UTC
I believe Perl compiles the script into bytecode...
It's a tree, or rather a graph. It's complicated. It has a lot of references to other things. If you want to execute a serialized version of this tree, you have to deserialize it; you can't merely begin to execute it linearly at a designated entry point.
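To make the deserialization point concrete, here's a minimal sketch in C. It is not Perl 5's real op structure; `node`, `record`, `valid_link` and `load_graph` are names I made up for illustration. The idea is only that in-memory nodes link to each other with raw pointers, a serialized form has to store something position-independent (array indices here), and a loader has to walk every record and rebuild the pointers before anything can run.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory node: links are raw pointers, which are
 * only meaningful inside the process that created them. */
typedef struct node {
    int          type;   /* which operation this node stands for */
    struct node *next;   /* next node in execution order, or NULL */
    struct node *child;  /* first child in the tree, or NULL */
} node;

/* Hypothetical serialized record: links are array indices, -1 means "none". */
typedef struct record {
    int type;
    int next;
    int child;
} record;

/* A link is usable if it is -1 or a valid index into n records. */
static int valid_link(int idx, size_t n)
{
    return idx == -1 || (idx >= 0 && (size_t)idx < n);
}

/* Rebuild a pointer-linked graph from n flat records.
 * Returns a malloc'd array of n nodes (caller frees it),
 * or NULL if n is 0, allocation fails, or a link is out of range. */
node *load_graph(const record *recs, size_t n)
{
    node *nodes;
    size_t i;

    if (n == 0)
        return NULL;
    nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        if (!valid_link(recs[i].next, n) || !valid_link(recs[i].child, n)) {
            free(nodes);
            return NULL;
        }
        nodes[i].type  = recs[i].type;
        /* The fix-up step: turn each index back into a pointer. */
        nodes[i].next  = recs[i].next  < 0 ? NULL : &nodes[recs[i].next];
        nodes[i].child = recs[i].child < 0 ? NULL : &nodes[recs[i].child];
    }
    return nodes;
}
```

Even in this toy, the loader touches every node once before the first one can execute. The real optree refers to far more than two neighbours per node, so the real fix-up work is correspondingly larger.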
At that final point before exiting, is the bytecode tree different for every operating system? Every Perl version?
I'm not sure how to answer that. The tree is different in memory because the C structures are different in memory; a 32-bit Perl 5 has different memory representation from a 64-bit Perl 5. The layout of the structures (and the members of the structures) and the types of ops available for the tree differ between major releases of Perl 5.
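As a rough illustration of the memory representation point, here's a made-up structure (`toy_op` is hypothetical, not a Perl 5 type) whose size and member offsets depend on the platform's pointer and integer widths. The same C source gives a different byte layout on different builds, so a raw dump of it from one build isn't usable on another.

```c
#include <stddef.h>

/* Hypothetical op-like structure. Its layout depends on the ABI:
 * pointer width, width of unsigned long, and padding rules. */
typedef struct toy_op {
    struct toy_op *next;         /* data pointer */
    void         (*run)(void);   /* function pointer */
    unsigned long  flags;        /* width varies between platforms */
    unsigned char  type;         /* followed by trailing padding */
} toy_op;

/* Total size of the structure on this build, padding included. */
size_t toy_op_size(void)
{
    return sizeof(toy_op);
}

/* Byte offset of the flags member on this build. */
size_t toy_op_flags_offset(void)
{
    return offsetof(toy_op, flags);
}
```

Printing both values with `printf("%zu", ...)` on a typical 32-bit build and a typical 64-bit Linux build should give something like 16 versus 32 bytes for the size, though the exact numbers are up to the compiler and ABI.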
Could a subset of the Perl language be defined to allow the bytecode to be system independent?
Perhaps, but you'd have to define a bytecode format independent of this tree for that to work. Then you'd have to change how the Perl 5 parser constructs the tree to construct bytecode instead. Then you'd have to change how Perl 5 loads programs to load bytecode instead. You'd either change how Perl 5 executes code to operate on bytecode instead (huge job) or deserialize the bytecode into the tree (also a huge job).
You get to pick: either you spend a lot of time changing the entire execution model of the Perl 5 VM, or every program which uses a serialized tree or some bytecode format spends a lot of time starting up, because you have to deserialize it into the optree anyway.
(The benchmarks I've seen show it's faster to parse Perl 5 source code again than to deserialize a serialized optree.)
... converting Perl to C...
Whoa, "converting Perl to C" is a completely different thing altogether. You're not going to have a magical transliterator which turns:
... for any program more complex than that example, unless you allow yourself to link against a libperl5.so and emit one line of C code for every node in the tree, then compile that whole mess into an enormous binary. In that case, you're still not "converting Perl to C"; all you're doing is replacing the Perl 5 runloop (itself maybe six lines of C code) with generated C code to do the dispatch for you.
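Here's a toy sketch of the difference between a runloop and emitted dispatch code. Everything in it (`interp`, `op`, `op_load`, `op_add`, `build_demo`, `run_loop`, `run_generated`) is invented for illustration and is far simpler than the real thing. Each node carries a function pointer, the loop calls it and follows whatever it returns, and the "generated" version makes the same calls written out one per node.

```c
#include <stddef.h>

typedef struct interp { long acc; } interp;   /* hypothetical VM state */

typedef struct op op;
struct op {
    op *(*run)(interp *, op *);  /* do this node's work, return the next node */
    op  *next;
    long arg;
};

/* Two toy operations. All the real work lives in functions like these. */
op *op_load(interp *in, op *self) { in->acc  = self->arg; return self->next; }
op *op_add(interp *in, op *self)  { in->acc += self->arg; return self->next; }

/* Fill in a three-node program: load 2, add 3, add 4. */
void build_demo(op prog[3])
{
    prog[0].run = op_load; prog[0].next = &prog[1]; prog[0].arg = 2;
    prog[1].run = op_add;  prog[1].next = &prog[2]; prog[1].arg = 3;
    prog[2].run = op_add;  prog[2].next = NULL;     prog[2].arg = 4;
}

/* The runloop: a few lines which dispatch through each node in turn. */
void run_loop(interp *in, op *start)
{
    op *cur = start;
    while (cur != NULL)
        cur = cur->run(in, cur);
}

/* What a hypothetical emitter might produce for the same program:
 * one line per node, with the same functions doing the same work. */
void run_generated(interp *in, op prog[3])
{
    op_load(in, &prog[0]);
    op_add(in, &prog[1]);
    op_add(in, &prog[2]);
}
```

Both functions leave the same value in `acc`. The generated version only removes the loop and the indirect call; `op_load` and `op_add` still do all the work, which is why I'd call this replacing the dispatch rather than converting the program to C.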
With a really, really good optimizing C compiler, it's possible that you'll see some benefit, but my guess is that without some very clever emitter work, the transliterator will emit compilation units so large that no C compiler can compile them into anything efficient at all.
People don't do this because it's a huge amount of work, with a fair amount of risk, which won't pay off for several years. It has a good chance of making things worse in the short term, and the handful of people who know how to do it aren't all that interested in making it happen.