PerlMonks  

Re^2: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

by chromatic (Archbishop)
on Aug 28, 2012 at 19:00 UTC ( #990308 )


in reply to Re: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
in thread Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

Full agreement. None of these are possible in a year without a small army of the right coders, and even then a year's a short time to produce something as reliable as Perl 5.


Comment on Re^2: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
Re^3: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by Will_the_Chill (Pilgrim) on Aug 28, 2012 at 21:07 UTC
    dave_the_m & chromatic,

    I think BrowserUK's 3-phase idea (pasted below) and Yuval's comments are both in the same vein. I think this general Perl5-to-LLVM concept is valid, and although it might very well take more than a year to fully implement, I believe we could have some kind of initial demo ready by YAPC::NA 2013 in Austin.

    dave_the_m, discounting only the amount of work needed to fully implement Perl5-to-LLVM, would you please tell me which part of BrowserUK's 3-phase idea is not producing the "I see how that might work!" epiphany in your mind?

    Thanks,
    ~ Will


    BrowserUK wrote:

1. There is the bit of perl that parses the source code and builds the AST from the Perl program's source. This would need to be left pretty much as is: the whole "perl defines Perl" thing. It would then need to be compiled (back to) C and linked into an executable.

2. Then there is the bit that converts the AST into byte code. I believe that this would need to be separated out and converted to produce IF (LLVM's intermediate form).

    3. Then there is the bit of perl that ordinarily runs the bytecode. Not just the runloop, but all the code the runloop dispatches to. I think that should be compiled to IF and built into an IF library (.bc) It would be "combined" with the IF from stage 2, at runtime, and given to the JIT to convert to machine code.
You know, you can cut out the italics and use <blockquote></blockquote>; it's much easier to read.
      discounting only the amount of work needed to fully implement Perl5-to-LLVM, would you please tell me which part of BrowserUK's 3-phase idea is not producing the "I see how that might work!" epiphany in your mind?
Well, since the current perl interpreter has neither an AST nor bytecode, I don't really understand the proposal.

      But my basic issue in this case is how do you get the 5X speedup rather than the 10%? I understand in great detail exactly what C-level code the perl interpreter executes while running an ops loop (and the ops within it). No-one has yet explained to me in a way I can understand how all the necessary stuff to carry out those actions will somehow go faster.

For example, to do a subroutine call in perl, you need to do a whole bunch of stuff like:

      • set up @_ and copy the stack items to it;
      • set up a context stack entry so that return etc know what to do;
      • create a new scope stack level;
• bump the reference count of the live CV so that it can't be freed in mid-execution;
      • make the appropriate pad the current one;
      • etc.
      All (or at least the vast majority) of this has still to be done, unless the perl interpreter is radically redesigned somehow. At the moment all this is done by a single, tuned C function. I don't see LLVM making it significantly faster.

      Dave

        Dave,

        If the current Perl 5 interpreter has neither an AST nor bytecode representation, then what is the intermediate form created by the parser and executed by the runtime backend?

I've heard the term "optree"; is that it?

        Are we not considering the B::Bytecode system?

        Thanks,
        ~ Will

When, in order to call this:

        void _24to32( SV* packed ) {
            IS_VARS;
            char *pp = SvPVX( packed );
            _4BY6 up;
            int i;
            IS_RESET;
            for( i = 0; i < 24; i += 3 ) {
                up.packed = _byteswap_ulong( *(unsigned long*)&pp[ i ] );
                IS_PUSHUV( up.u.a );
                IS_PUSHUV( up.u.b );
                IS_PUSHUV( up.u.c );
                IS_PUSHUV( up.u.d );
            }
            IS_DONE;
            return;
        }

        You have to go through this lot:

        void XS_main__24to32(register PerlInterpreter* my_perl, CV* cv);
        void XS_main__24to32(register PerlInterpreter* my_perl, CV* cv)
        {
            extern int Perl___notused;
            SV **sp = (*Perl_Istack_sp_ptr(((PerlInterpreter *)Perl_get_context())));
            I32 ax = (*(*Perl_Imarkstack_ptr_ptr(((PerlInterpreter *)Perl_get_context())))--);
            register SV **mark = (*Perl_Istack_base_ptr(((PerlInterpreter *)Perl_get_context()))) + ax++;
            I32 items = (I32)(sp - mark);
        #line 179 "_24to32.c"
            if (items != 1)
                Perl_croak_xs_usage(((PerlInterpreter *)Perl_get_context()), cv, "packed");
            ((void)ax);
            sp -= items;
            {
                SV * packed = (*Perl_Istack_base_ptr(((PerlInterpreter *)Perl_get_context())))[ax + (0)];
        #line 117 "_24to32.xs"
                I32* temp;
        #line 188 "_24to32.c"
        #line 119 "_24to32.xs"
                temp = (*Perl_Imarkstack_ptr_ptr(((PerlInterpreter *)Perl_get_context())))++;
                _24to32(packed);
                if ((*Perl_Imarkstack_ptr_ptr(((PerlInterpreter *)Perl_get_context()))) != temp) {
                    (*Perl_Imarkstack_ptr_ptr(((PerlInterpreter *)Perl_get_context()))) = temp;
                    do { do {
                        const IV tmpXSoff = (0);
                        (*Perl_Istack_sp_ptr(((PerlInterpreter *)Perl_get_context()))) =
                            (*Perl_Istack_base_ptr(((PerlInterpreter *)Perl_get_context()))) + ax + (tmpXSoff - 1);
                        return;
                    } while (0); } while (0);
                }
                return;
        #line 199 "_24to32.c"
                (*Perl_Istack_sp_ptr(((PerlInterpreter *)Perl_get_context()))) = sp;
                return;
            }
        }

        and this lot:

        void _24to32( SV* packed ) {
            SV **sp = (*Perl_Istack_sp_ptr(((PerlInterpreter *)Perl_get_context())));
            I32 ax = (*(*Perl_Imarkstack_ptr_ptr(((PerlInterpreter *)Perl_get_context())))--);
            register SV **mark = (*Perl_Istack_base_ptr(((PerlInterpreter *)Perl_get_context()))) + ax++;
            I32 items = (I32)(sp - mark);
            char *pp = ((packed)->sv_u.svu_pv);
            _4BY6 up;
            int i;
            sp = mark;
            for( i = 0; i < 24; i += 3 ) {
                up.packed = _byteswap_ulong( *(unsigned long*)&pp[ i ] );
                do {
                    do {
                        if ((*Perl_Istack_max_ptr(((PerlInterpreter *)Perl_get_context()))) - sp < (int)(1)) {
                            sp = Perl_stack_grow(((PerlInterpreter *)Perl_get_context()), sp, sp, (int)(1));
                        }
                    } while (0);
                    (*++sp = (Perl_sv_2mortal(((PerlInterpreter *)Perl_get_context()), Perl_newSVuv(((PerlInterpreter *)Perl_get_context()), up.u.a))));
                } while (0);
                do {
                    do {
                        if ((*Perl_Istack_max_ptr(((PerlInterpreter *)Perl_get_context()))) - sp < (int)(1)) {
                            sp = Perl_stack_grow(((PerlInterpreter *)Perl_get_context()), sp, sp, (int)(1));
                        }
                    } while (0);
                    (*++sp = (Perl_sv_2mortal(((PerlInterpreter *)Perl_get_context()), Perl_newSVuv(((PerlInterpreter *)Perl_get_context()), up.u.b))));
                } while (0);
                do {
                    do {
                        if ((*Perl_Istack_max_ptr(((PerlInterpreter *)Perl_get_context()))) - sp < (int)(1)) {
                            sp = Perl_stack_grow(((PerlInterpreter *)Perl_get_context()), sp, sp, (int)(1));
                        }
                    } while (0);
                    (*++sp = (Perl_sv_2mortal(((PerlInterpreter *)Perl_get_context()), Perl_newSVuv(((PerlInterpreter *)Perl_get_context()), up.u.c))));
                } while (0);
                do {
                    do {
                        if ((*Perl_Istack_max_ptr(((PerlInterpreter *)Perl_get_context()))) - sp < (int)(1)) {
                            sp = Perl_stack_grow(((PerlInterpreter *)Perl_get_context()), sp, sp, (int)(1));
                        }
                    } while (0);
                    (*++sp = (Perl_sv_2mortal(((PerlInterpreter *)Perl_get_context()), Perl_newSVuv(((PerlInterpreter *)Perl_get_context()), up.u.d))));
                } while (0);
            }
            (*Perl_Istack_sp_ptr(((PerlInterpreter *)Perl_get_context()))) = sp;
            return;
        }

        You really don't see any opportunities for some radical optimisations?

        And remember, that is positively lightweight compared to the unoptimised C code produced for all of Perl's internal opcodes. Existing C compilers may be able to optimise some of that lot away on a function-by-function basis, but how much?

        And now consider the possibilities of allowing the optimiser to look at *all* the internal functions and look for really radical optimisations?

        Consider the possibilities of LTO and whole program analysis to lift whole chunks of that boiler plate above up to the runloop level on a per interpreter basis?

        Then consider the possibilities of JIT noticing that the context doesn't (cannot) change across a whole swath of runtime code and reserving a register, say one of the unused segment registers, for it and using register relative addressing for each interpreter?

        And then consider that LLVM doesn't have to follow C rules. It can invent weird stuff that C compilers (and C programmers) wouldn't even think of -- see my earlier example of it converting (at the IF level) an array of 52 shorts into a single 832-bit integer.

        What might it do with the whole SVt_* type hierarchy? Imagine (for sake of totally speculative example) that it could reduce *all* the SV flag tests & sets to a single, simple bit manipulation at some calculated bit offset into a huge integer. Is that possible? Would it result in substantial savings if used uniformly throughout the code base?

        Does any of this intrigue you? Even if only so that you can say: I told you so?



      Perl 5 doesn't run slower than C because it lacks a JIT. Perl 5 runs slower than C because it's designed to be more flexible than C and that flexibility is not free. That flexibility includes but is not limited to:

      • reference counting
• automatic destruction at scope exit
      • flexible typing and coercion
      • eval
      • dynamic dispatch
      • method dispatch
      • overloading
      • autoloading
      • importing
      • pragmas
      • monkeypatching
      • metaprogramming
      • closures
      • symbolic references
      • exceptions
      • dynamic data structures

      Add to that the facts that LLVM was not designed for dynamic languages and that Perl 5 was not designed for static optimizations. Using LLVM from Perl 5 without changing the internals of the Perl 5 VM will probably speed up the code that's already written in C, but it certainly won't address most of the reasons why Perl 5 code can run slower than C.

        chromatic,

        I guess I'm a bit confused. Above you say:
        Perl 5 doesn't run slower than C because it lacks a JIT.
        But here you say:
        Add a tracing JIT. Iterate on that for a few years.
        So which is it? JIT good or JIT bad?

        ...

        Other than that potential logical inconsistency, I am in general agreement with your premises of Perl-runs-slow-because-it-is-dynamic and LLVM-was-not-designed-for-dynamic-languages. However, until a better backend target is determined, it looks like LLVM may be one of our best options.

        If we can clean up some of the Perl internals along the way to Perl5-on-LLVM, then we can either change the backend target to a more-dynamic-than-LLVM platform or just upgrade LLVM to support dynamic language features.

        Thanks,
        ~ Will
