Perl 5 Compiler, Again!

by flexvault (Parson)
on Aug 13, 2012 at 17:39 UTC (#987170)
flexvault has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I started the day thinking of other things, and the first thing I read was Perl 5 Optimizing Compiler by Will_the_Chill and I stated my opinion. So instead of doing real work, I thought about this question/problem about a compiler for Perl.

But instead of going after the obfuscation angle, I'd like to ask the Monks that understand the internals of Perl, why this is so elusive. Just to be sure, I did read the references in the discussion, but I still have some fundamental questions:

  • If I type 'perl -cw script.plx' on the command line, I believe Perl compiles the script into bytecode, issues errors if any, and then exits. At that final point before exiting, is the bytecode tree different for every operating system? Every Perl version?
  • If Perl had a parameter to generate and save the bytecode tree, would that tree be able to be executed by Perl?
  • Could a subset of the Perl language be defined to allow the bytecode to be system independent?

Computers today are very fast, and I expect they will continue to get faster, so converting Perl to C may or may not be needed (others can decide/argue that point). On many of the computers I work on today, I get an immediate prompt whether I execute a 'C' program or a Perl script, so do we care whether it took 200ms or 300ms?

Whether 'Big Data' needs a Perl to C compiler I don't know. I think Perl is the best computer language I've found, and I also think that a built-in bytecode 'compiler' would enhance Perl greatly, though I realize that there may be many valid technical reasons why it can't be done.

Thank you...Ed

"Well done is better than well said." - Benjamin Franklin

Re: Perl 5 Compiler, Again!
by dave_the_m (Parson) on Aug 13, 2012 at 18:18 UTC
    perl doesn't compile into bytecode; it compiles into several complex internal data structures, such as OPs and stashes.

    An attempt was made to emit this as portable bytecode with B::Bytecode, but it was found, IIRC, to take longer to load the bytecode and convert it back into perl's internal structures than it did to just recompile the code.

    Dave

Re: Perl 5 Compiler, Again!
by chromatic (Archbishop) on Aug 13, 2012 at 18:23 UTC
    I believe Perl compiles the script into bytecode...

    It's a tree, or rather a graph. It's complicated. It has a lot of references to other things. If you want to execute a serialized version of this tree, you have to deserialize it; you can't merely begin to execute it linearly at a designated entry point.

    At that final point before exiting, is the bytecode tree different for every operating system? Every Perl version?

    I'm not sure how to answer that. The tree is different in memory because the C structures are different in memory; a 32-bit Perl 5 has a different memory representation from a 64-bit Perl 5. The layout of the structures (and the members of the structures) and the types of ops available for the tree differ between major releases of Perl 5.

    Could a subset of the Perl language be defined to allow the bytecode to be system independent?

    Perhaps, but you'd have to define a bytecode format independent of this tree for that to work. Then you'd have to change how the Perl 5 parser constructs the tree to construct bytecode instead. Then you'd have to change how Perl 5 loads programs to load bytecode instead. You'd either change how Perl 5 executes code to operate on bytecode instead (huge job) or deserialize the bytecode into the tree (also a huge job).

    You get to pick whether you spend a lot of time changing the entire execution model of the Perl 5 VM or spend a lot of time starting up all programs which use a serialized tree or some bytecode format because you have to deserialize it into the optree anyway.

    (The benchmarks I've seen show it's faster to parse Perl 5 source code again than to deserialize a serialized optree.)

    ... converting Perl to C...

    Whoa, "converting Perl to C" is a completely different thing altogether. You're not going to have a magical transliterator which turns:

    use 5.012; say "Hello, world!"; exit 0;

    Into:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) { printf( "Hello, world!\n" ); exit(0); }

    ... for any program more complex than that example, unless you allow yourself to link against a libperl5.so and emit one line of C code for every node in the tree, then compile that whole mess into an enormous binary. In that case, you're still not "converting Perl to C"; all you're doing is replacing the Perl 5 runloop (itself maybe six lines of C code) with generated C code to do the dispatch for you.

    With a really, really good optimizing C compiler, it's possible that you'll see some benefit, but my guess is that without some very clever emitter work, the transliterator will emit compilation units so large that no C compiler can compile them into anything efficient at all.

    People don't do this because it's a huge amount of work with a fair amount of risk that won't pay off for several years. It has a good chance of making things worse in the short term, and the handful of people who know how to do it aren't all that interested in making it happen.

      ...and the handful of people who know how to do it aren't all that interested in making it happen.

      Au contraire, mon frere!
      Nick Clark has been guiding me in my efforts to create an optimizing Perl 5 compiler.
      Ingy and Reini are both currently working on this and are both coming to visit Austin in the next few weeks to create a solid plan with me.

      chromatic, you may not be interested in making this happen, but others are.
      Thank you for your continued assistance in pointing out my logical and technical errors, it is very useful! :)
        chromatic, you may not be interested in making this happen...

        That's quite a presumption.

Re: Perl 5 Compiler, Again!
by rurban (Scribe) on Aug 13, 2012 at 18:27 UTC
    1. perl -cw script.plx does nothing.

    perlcc -B script.pl will compile to bytecode script.plc,

    perl script.plc will execute the compiled script.

    perl -c just stops after the CHECK phase.

    2. system independent bytecode is hard, because threaded optrees look different from non-threaded ones.

    perl5 opcodes change from release to release, there's no discipline. p5 devs do not want to have bytecode discipline.

    parrot was designed to have bytecode discipline originally (and be platform independent), but both of these goals were thrown away without discussion at v1.0. That's why I left the project in protest.

    parrot's platform independent pbc format goal is still a goal in theory, but the tests were also disabled against my will around 1.0, so new bugs were introduced since the tests were disabled. code-smell.

    3. perl had the -u switch to dump its code to a file, which could later be undumped to a single executable. Making this work again would take about half a year, but I have no time for it yet.

      ... both of these goals were thrown away without discussion at v1.0.

      Oh but they were discussed, my friend. I'm certainly no fan of the unnecessary breaking of bytecode compatibility willy-nilly with every release that goes on now, but the alternative of freezing PBC at Parrot 1.0 was certainly the worst of all of the options.

      PBC as it exists now is barely sufficient for Parrot, let alone Perl 6 or Perl 5.

        On Parrot:

        I agree that technically freezing the names of ops with 1.0 was a small problem. But the number was low and new names could be easily added.

        There was never a public discussion about this policy change. Allison just went ahead and removed it, without any discussion. With my vehement disagreement, yes. But with no agreement from anybody else.

        If PBC is still not sufficient after the third rewrite, what is missing? And why, when even the tests are disabled and the packfile examples do not work?

Re: Perl 5 Compiler, Again!
by flexvault (Parson) on Aug 14, 2012 at 12:17 UTC

    Dear Monks,

    Thanks to everyone for giving valid technical reasons for the lack of an integrated Perl 'compiler'. Actually I wrote a 'C' version of 'uxbasic', but abandoned it in favor of Perl. It worked, was fast, but lacked the feature rich capability of Perl. So I wrote an 'uxbasic' to Perl translator script that took many of my products and moved them to Perl. It was 99% correct, and with a minimum of debugging, I had a Perl version of my products that could continue to grow.

    It's a lot more fun working with Perl than either 'C' or 'uxbasic'.

    Thank you...Ed

    "Well done is better than well said." - Benjamin Franklin

      flexvault:

      There is an integrated compiler in perl5. And there are external compilers.

      You will not easily see what the integrated compiler in perl5 does, but you can see some of the results with B::Concise. All the ex-OPs shown there have already been optimized out.

      $ perl -MO=Concise -e'1+1'
      3  <@> leave[1 ref] vKP/REFC ->(end)
      1     <0> enter ->2
      2     <;> nextstate(main 1 -e:1) v:{ ->3
      -     <0> ex-const v ->3

      E.g. 1+1 was constant folded here.

      $ perl -MO=Concise -e'$a=1+$a'
      8  <@> leave[1 ref] vKP/REFC ->(end)
      1     <0> enter ->2
      2     <;> nextstate(main 1 -e:1) v:{ ->3
      7     <2> sassign vKS/2 ->8
      5        <2> add[t1] sK/2 ->6
      3           <$> const(IV 1) s ->4
      -           <1> ex-rv2sv sK/1 ->5
      4              <$> gvsv(*a) s ->5
      -        <1> ex-rv2sv sKRM*/1 ->7
      6           <$> gvsv(*a) s ->7

      All the rv2sv scalar references were optimized to direct gvsv accesses.

      perlcc is the frontend for 3 external compilers.
