http://www.perlmonks.org?node_id=991006


in reply to Re^4: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
in thread Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges

Perhaps if I make my question a bit more specific, it will help clarify things? (I'll explain the perl runtime in detail for anyone following along at home).

Consider the expression $x + $y * 3. This is compiled into an op tree, where (amongst other things) each op struct holds the info needed for that op, but also a pointer (op_next) to the next op in the execution sequence, and a pointer (op_ppaddr) to the C function that knows how to "execute" that op. So the op sequence for the above looks a bit like:

    1: OP_PADSV     op_targ = 1                              op_ppaddr = Perl_pp_padsv     op_next = 2
    2: OP_PADSV     op_targ = 2                              op_ppaddr = Perl_pp_padsv     op_next = 3
    3: OP_CONST     op_sv = [an SV holding the value 3]      op_ppaddr = Perl_pp_const     op_next = 4
    4: OP_MULTIPLY                                           op_ppaddr = Perl_pp_multiply  op_next = 5
    5: OP_ADD                                                op_ppaddr = Perl_pp_add       op_next = 6
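
(If it helps to picture the data structure behind each of those entries, a heavily simplified sketch is below. This is illustrative only: the real struct op lives in op.h and carries many more fields (type, flags, private bits, sibling links, ...), and op_sv actually belongs to the SVOP variant rather than the base struct.)

    /* heavily simplified sketch -- see op.h for the real thing              */
    typedef struct sv SV;                   /* opaque here                    */
    typedef struct op OP;

    struct op {
        OP       *op_next;                  /* next op in execution order     */
        OP      *(*op_ppaddr)(void);        /* C function that runs this op   */
        unsigned  op_targ;                  /* pad slot, used by OP_PADSV     */
        SV       *op_sv;                    /* the constant, for OP_CONST     */
    };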

The pp_functions themselves look a bit like the following (hugely over-simplified of course):

    OP * Perl_pp_padsv {
        *PL_stack_sp++ = PL_curpad[PL_op->op_targ];
        return PL_op->op_next;
    }

    OP * Perl_pp_const {
        *PL_stack_sp++ = PL_op->op_sv;
        return PL_op->op_next;
    }

    OP * Perl_pp_multiply {
        SV *s1 = *--PL_stack_sp;
        SV *s2 = *--PL_stack_sp;
        SV *s3 = (a new or reused SV of some description);
        SvIVX(s3) = SvIVX(s1) * SvIVX(s2);
        *PL_stack_sp++ = s3;
        return PL_op->op_next;
    }

    OP * Perl_pp_add {
        SV *s1 = *--PL_stack_sp;
        SV *s2 = *--PL_stack_sp;
        SV *s3 = (a new or reused SV of some description);
        SvIVX(s3) = SvIVX(s1) + SvIVX(s2);
        *PL_stack_sp++ = s3;
        return PL_op->op_next;
    }

And finally, the runops loop looks a bit like:

    Perl_runops_standard {
        PL_op = ...;
        while (PL_op) {
            PL_op = PL_op->op_ppaddr();
        }
    }

So the net effect is that perl sits in a loop, calling various pp_* functions, whose job is to push useful values onto, and pull them off, the perl stack.
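
If it helps to see that loop run end to end, here's a tiny self-contained toy in plain C -- emphatically not perl's real code; plain ints stand in for SVs and a single made-up op_arg field does double duty as pad slot or constant -- that pushes $x + $y * 3 through a runops-style loop:

    #include <stdio.h>

    /* Toy model only: ints stand in for SVs, a global array for the pad,
     * and each "pp" function pushes/pops the value stack and returns the
     * next op, just as the real pp_* functions do with PL_op->op_next.    */
    typedef struct op OP;
    struct op {
        OP   *op_next;
        OP  *(*op_ppaddr)(const OP *o);
        int   op_arg;                      /* pad index, or the constant   */
    };

    static int pad[3] = { 0, 10, 20 };     /* pad[1] = $x, pad[2] = $y     */
    static int stack[16], *sp = stack;

    static OP *pp_padsv(const OP *o) { *sp++ = pad[o->op_arg]; return o->op_next; }
    static OP *pp_const(const OP *o) { *sp++ = o->op_arg;      return o->op_next; }

    static OP *pp_multiply(const OP *o) {
        int b = *--sp;
        int a = *--sp;
        *sp++ = a * b;
        return o->op_next;
    }

    static OP *pp_add(const OP *o) {
        int b = *--sp;
        int a = *--sp;
        *sp++ = a + b;
        return o->op_next;
    }

    int main(void)
    {
        /* the op chain for $x + $y * 3, wired up via op_next */
        OP ops[5] = {
            { &ops[1], pp_padsv,    1 },   /* push $x  (pad slot 1)        */
            { &ops[2], pp_padsv,    2 },   /* push $y  (pad slot 2)        */
            { &ops[3], pp_const,    3 },   /* push the constant 3          */
            { &ops[4], pp_multiply, 0 },   /* $y * 3                       */
            { NULL,    pp_add,      0 },   /* $x + ($y * 3)                */
        };

        /* the runops loop */
        const OP *op = &ops[0];
        while (op)
            op = op->op_ppaddr(op);

        printf("%d\n", sp[-1]);            /* prints 70                    */
        return 0;
    }

Compile and run it and it prints 70. The real thing differs in a hundred ways (SVs rather than ints, the mark and context stacks, magic, overloading, ...), but the dispatch shape -- a loop blindly calling whatever the current op's op_ppaddr points at -- is the same.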

Now, under your proposal, you'll have all the pp functions (and all the functions they depend upon, such as the hash library) compiled into IR and available to you. What do you do with the op tree? Yuval's proposal is, effectively, to compile the following C into IR:

    PL_op = Perl_pp_padsv(PL_op);
    PL_op = Perl_pp_padsv(PL_op);
    PL_op = Perl_pp_const(PL_op);
    PL_op = Perl_pp_multiply(PL_op);
    PL_op = Perl_pp_add(PL_op);

except that he would use modified versions of the pp functions that take and return their args directly, rather than getting them on and off the stack. Then LLVM has access to an IR version of the unrolled runops loop plus all the functions in IR, and can do its funky stuff with them.
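
Very roughly -- and this is just my sketch of the shape, not anything Yuval has actually written; the _direct names and signatures are made up purely for illustration -- the multiply op might become a function that takes and returns SVs directly, and the fragment above would then thread each result straight into the next call instead of going via PL_stack_sp:

    SV * Perl_pp_multiply_direct(SV *s1, SV *s2) {
        SV *s3 = (a new or reused SV of some description);
        SvIVX(s3) = SvIVX(s1) * SvIVX(s2);
        return s3;
    }

    /* the unrolled fragment becomes explicit dataflow: */
    SV *t1 = Perl_pp_padsv_direct(1);              /* $x        */
    SV *t2 = Perl_pp_padsv_direct(2);              /* $y        */
    SV *t3 = Perl_pp_const_direct(the const SV);   /* 3         */
    SV *t4 = Perl_pp_multiply_direct(t2, t3);      /* $y * 3    */
    SV *t5 = Perl_pp_add_direct(t1, t4);           /* $x + ...  */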

So, with that background, what would your IR generated from the op tree look like? Is it just an unrolled runops loop for a single sub, with lots of explicit calls to pp-ish functions, or would you try and unroll the pp functions themselves, or something completely different?

Dave

Re^6: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
by BrowserUk (Patriarch) on Aug 31, 2012 at 15:03 UTC
    Is it just an unrolled runops loop for a single sub, with lots of explicit calls to pp-ish functions, or would you try and unroll the pp functions themselves, or something completely different?

    Bearing in mind that I don't (yet, fully) understand the effects of all the funky flags (eg. vKP/REFC) that Concise uses to (presumably) represent state information and/or state-change requirements that are available to the pp-ish functions ...

    Essentially the first of those 3 options. But I don't understand what you mean by your inclusion of the phrase "for a single sub," in that option.

    The notion -- in as far as it goes -- is that, starting with the B::Deparse or B::Concise optree traversal code, we convert that optree (PCG) into an unrolled runops loop for the "entire program (fragment)" that has just been compiled and is ready to be passed to the runops loop.

    Please don't pick over that description, I am aware it is inadequate!

    I *think* that your code block "describing" Yuval's proposal omits a considerable amount of detail -- understandably.

    I *think* that in order to capture the control flow -- annotated by the Concise output in the form:

        ??    ...    -> 3
        ??    ...
        3     ...

    There would need to be conditions, labels and gotos in the generated IR. Except that those things do not happen in the runloop, but within the pp_* functions themselves, with the control flow orchestrated by what they return.
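
    To make that concrete -- this is only my sketch of the shape, in the same hugely over-simplified style Dave used above, with made-up names -- a conditional op today decides which op pointer to return, and an unrolled version would have to turn that returned pointer into an explicit branch:

        /* today: the branch lives inside the pp_* function ...              */
        OP * Perl_pp_some_conditional {
            if (the condition is true)
                return cLOGOP->op_other;    /* jump elsewhere in the graph   */
            else
                return PL_op->op_next;      /* fall through                  */
        }

        /* ... whereas in an unrolled loop that decision has to become:      */
        if (the condition is true)
            goto label_for_op_other;
        else
            goto label_for_op_next;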

    So the question becomes: can the format of the PCG be described (or maybe it already is) in LLVM terms, such that LLVM can do the unrolling for us?

    Maybe all that is needed is to allow LLVM to see the C structs, typedefs, constants etc. that describe the PCG, and let it convert them to its bitcode description. Then hand it the PL_op that starts the ball rolling, and it can unwind the loop by processing all the "inlined" pp_* functions used by this program (fragment), and thus optimise across the entire call graph for each particular program (fragment). Maybe it will need extra hints.

    I agree with you that simply unrolling the loop -- if that is even possible -- is unlikely to obtain big gains. As is simply optimising individual pp_* functions. The only possibility of substantial gains is from getting LLVM to consider complete code graphs as single units and look for optimisations across the whole kit & caboodle.

    Maybe that Concise snippet you posted needs to be (manually) translated into (something like):

        7:  #pp_leave inlined here
            ...
            goto END;
            #pp_enter inlined here
            ...
            #pp_nextstate inlined here
            ...
            goto 3;
            #pp_sassign inlined here
            ...
            goto 7;
        4:  #pp_match inlined here
            ...
            goto 5;
            #pp_ex-rv2sv inlined here
            ...
            goto 4;
        3:  #pp_gvsv inlined here
            ...
            goto 4;
            #pp_ex-rv2sv inlined here
            ...
            goto 6;
        5:  #pp_gvsv inlined here
            ...
            goto 5;
        END:

    Maybe then it will find lots of unused (by this snippet) code paths that can be trimmed. Maybe it will see the same queries, checks and alerts being performed on the same data multiple times in different expansions. Maybe it will see frequent and expensive indirect accesses to fields in (sub)structures that can be lifted to SSAs. Maybe ... :)
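
    To illustrate just that last point with a made-up example (nothing to do with perl's actual structures): the hope is that once the whole fragment is visible as one unit, the optimiser can prove the pointer chain invariant and hoist it.

        /* made-up example: before -- the flag is re-fetched through two
         * pointers on every iteration                                       */
        struct inner { int flags; };
        struct outer { struct inner *in; };

        int before(struct outer *o, const int *vals, int n) {
            int sum = 0;
            for (int i = 0; i < n; i++)
                if (o->in->flags & 1)
                    sum += vals[i];
            return sum;
        }

        /* after -- once the optimiser can see that nothing in the loop
         * writes through o or o->in, the invariant load is lifted out       */
        int after(struct outer *o, const int *vals, int n) {
            const int flagged = o->in->flags & 1;
            int sum = 0;
            for (int i = 0; i < n; i++)
                if (flagged)
                    sum += vals[i];
            return sum;
        }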

    Will it find enough to make it worthwhile? I don't know. But I don't believe that anyone will until we try.

    All I have been seeking is the least effort approach to enabling that investigation. And seeking to inspire someone with the skills and knowledge -- even if not the time and energy -- to direct the process of enabling it.

