http://www.perlmonks.org?node_id=990881


in reply to Re^3: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
in thread Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges

A first reaction (I'm getting punchy at this point in time.). If you use LLVM as just an alternative C compiler, then as a part of the process of compiling perl -- unchanged -- it will compile whatever code/functions/source file(s) that constitute the current "runloop" (ostensibly runops_standard in run.c).

One of the possible variations of using LLVM in this mode, is that it (clang) can output .bc (bitcode) files, that can later be linked together -- using the LLVM linker -- to produce a native executable. What's more, is that the LLVM linker is quite happy to accept some "object files" that are in .bc format, and some that are in the normal .obj/.o format, and link them together and produce a (normal) platform dependent executable.

It is also possible, to have clang produce LLVM IF in text form.

So, in theory, if we ran (something like; there's a lot of documentation) clang --emit-text run.c -O run.o and then inspected run.o in a text editor, it would tell us exactly what the IF looks like for that source file.

And that IF, would (in its binary form), be combinable -- with all the other normal object files produced using gcc or cl.exe -- using the LLVM linker, to produce a working, native compiled executable.

That is not a direct answer to your question, but the point is that (as a starting point), it is possible to build a working executable, by substituting any individual clang-compiled-to-bitcode-source-file, for the native compiled objct file from that source, and combine it with all the other GGC/CL produced object files, and the LLVM linker will happily combine them into into a native executable.

Thus, to see what the LLVM IF look like, for any given source file, you only need to use clang to compile that individual source file to its text representation. You don't gain any performance, but you do get to see what LLVM IF looks like.

I'll attempt to get back to you with a specific answer to your question, but given that my LLVM installation if 2 years old, and my primary perl installation about the same, It'll take a couple of days to get caught up.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

Replies are listed 'Best First'.
Re^5: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
by dave_the_m (Monsignor) on Aug 31, 2012 at 13:36 UTC
    Perhaps if I make my question a bit more specific, it will help clarify things? (I'll explain the perl runtime in detail for anyone following along at home).

    Consider the expression $x + $y * 3. This is compiled into an op tree, where (amongst other things) each op struct holds the info needed for that op, but also a pointer (op_next) to the next op in the execution sequence, and a pointer (op_ppaddr) to the C function that knows how to "execute" that op. So the op sequence for the above looks a bit like:

    1: OP_PADSV op_targ = 1 op_ppaddr = Perl_pp_padsv op_next = 2 2: OP_PADSV op_targ = 2 op_ppaddr = Perl_pp_padsv op_next = 3 3: OP_CONST op_sv = [an SV holding the value 3] op_ppaddr = Perl_pp_const op_next = 4 4: OP_MULTIPLY op_ppaddr = Perl_pp_multiply op_next = 5 5: OP_ADD op_ppaddr = Perl_pp_add op_next = 6

    The pp_functions themselves look a bit like the following (hugely over-simplified of course):

    OP * Perl_pp_padsv { *PL_stack_sp++ = PL_curpad[PL_op->op_targ]; return PL_op->op_next; } OP * Perl_pp_const { *PL_stack_sp++ = PL_op->op_sv; return PL_op->op_next; } OP * Perl_pp_multiply { SV *s1 = *--PL_stack_sp; SV *s2 = *--PL_stack_sp; SV *s3 = (a new or reused SV of some description); SvIVX(s3) = SvIVX(s1) * SvIVX(s2); *PL_stack_sp++ = s3; return PL_op->op_next; } OP * Perl_pp_add { SV *s1 = *--PL_stack_sp; SV *s2 = *--PL_stack_sp; SV *s3 = (a new or reused SV of some description); SvIVX(s3) = SvIVX(s1) + SvIVX(s2); *PL_stack_sp++ = s3; return PL_op->op_next; }

    And finally, the runops loop looks a bit like:

    Perl_runops_standard { PL_op = ...; while (PL_op) { PL_op = PL_op->op_ppaddr(); } }

    So the net effect is that perl is in a loop, calling various pp_* functions, whose job is to push and pull useful values onto the perl stack.

    Now, under your proposal, you'll all have the pp functions (and all the functions they depend upon, such the hash library) compiled into IR and available to you. What do you do with the op tree? Yuval's proposal is to effectively compile the following C into IR:

    PL_op = Perl_pp_padsv(PL_op); PL_op = Perl_pp_padsv(PL_op); PL_op = Perl_pp_const(PL_op); PL_op = Perl_pp_multiply(PL_op); PL_op = Perl_pp_add(PL_op);

    except that he would use modified versions of the pp functions that take and return their args directly, rather than getting them on and off the stack. Then LLVM has access to IR version of the unrolled runops loop and all the functions in IR, and can do its funky stuff with them.

    So, with that background, what would your IR generated from the op tree look like? Is it just an unrolled runops loop for a single sub, with lots of explicit calls to pp-ish functions, or would you try and unroll the pp functions themselves, or something completely different?

    Dave

      Is it just an unrolled runops loop for a single sub, with lots of explicit calls to pp-ish functions, or would you try and unroll the pp functions themselves, or something completely different?

      Bearing in mind that I don't (yet, fully) understand the affects of all the funky flags (eg. vKP/REFC) that Concise uses to (presumably) represent state information and/or state change requirements that is available to the pp_ish functions ...

      Essentially the first of those 3 options. But I don't understand what you mean by your inclusion of the phrase "for a single sub," in that option.

      The notion -- in as far as it goes -- is that (starting with the B::Deparse or B::Concise) optree traversal code, that we convert that optree (PCG) into an unrolled runopts loop for the "entire program (fragment)" that has just been compiled and is ready for passing to the runopts loop.

      Please don't pick over that description, I am aware it is inadequate!

      I *think* that your code block "describing" Yuval's proposal, omits a considerable amount of details -- understandably.

      I *think* that in order to capture the control flow -- annotated by the Concise output in the form:

      ?? ... -> 3 ?? ... 3 ...

      There would need to be conditions and labels and gotos in the generated IR. Except that those things do not happen in the runloop, but within the pp_* functions themselves with the control flow orchestrated by what they return.

      So the the question comes: can (or maybe it already is) the format of the PCG be described in LLVM terms, such that LLVM can do the unrolling for us?

      Maybe all that is needed is to allow LLVM to see the C structs, typedefs, constants etc. that describe the PCG and let it convert them to its bitcode description. Then hand it the PL_op that starts the ball rolling, and it can unwind the loop, by processing all the "inlined" pp_* functions used by this program (fragment) and thus optimise across the entire call graph for each particular program (fragment). Maybe it will need extra hints.

      I agree with you that simply unrolling the loop -- if that is even possible -- is unlikely to obtain big gains. As is, simply optimising individual pp_* functions. The only possibility of substantial gains is from getting LLVM to consider complete code graphs as single units and look for optimisation across the while kit&caboodle.

      Maybe that Concise snippet you posted needs to be (manually) translated into (something like):

      7: #pp_leave inlined here ... goto END; #pp_enter inlined here ... #pp_nextstate inlined here ... goto 3; #pp_sassign inlined here ... goto 7; 4: #pp_match inlined here ... goto 5; #pp_ex-rv2sv inlined here ... goto 4; 3: #pp_gvsv inlined here ... goto 4; #pp_ex-rv2sv inlined here ... goto 6; 5: #pp_gvsv inlined here ... goto 5; END:

      Maybe then it will find lots of unused (by this snippet) code paths that can be trimmed. Maybe it will see the same queries, checks and alerts being performed on the same data multiple times in different expansions. Maybe it will see frequent and expensive indirect accesses to fields in (sub)structures that can be lifted to SSAs. Maybe ... :)

      Will it find enough to make it worth while? I don't know. But I don't believe that anyone will until we try.

      All I have been seeking is the least effort approach to enabling that investigation. And seeking to inspire someone with the skills and knowledge -- even if not the time and energy -- to direct the process of enabling it.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      RIP Neil Armstrong