Is it just an unrolled runops loop for a single sub, with lots of explicit calls to pp-ish functions, or would you try and unroll the pp functions themselves, or something completely different?
Bearing in mind that I don't (yet, fully) understand the effects of all the funky flags (eg. vKP/REFC) that Concise uses to (presumably) represent state information and/or state change requirements that is available to the pp_ish functions ...
Essentially the first of those 3 options. But I don't understand what you mean by your inclusion of the phrase "for a single sub," in that option.
The notion -- in as far as it goes -- is that, starting with the B::Deparse or B::Concise optree traversal code, we convert that optree (PCG) into an unrolled runops loop for the "entire program (fragment)" that has just been compiled and is ready to be passed to the runops loop.
Please don't pick over that description, I am aware it is inadequate!
I *think* that your code block "describing" Yuval's proposal omits a considerable amount of detail -- understandably.
I *think* that in order to capture the control flow -- annotated by the Concise output in the form:
There would need to be conditions and labels and gotos in the generated IR. Except that those things do not happen in the runloop, but within the pp_* functions themselves with the control flow orchestrated by what they return.
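To make that last point concrete, here is a minimal toy sketch in C of a runloop in which the loop itself contains no branching logic; each handler decides what runs next by what it returns. Everything here (toy_op, toy_interp, pp_const, pp_cond, toy_runops, toy_eval_ternary) is a hypothetical stand-in of mine, only loosely modelled on the op_ppaddr/op_next/op_other idea. It is not perl's real code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT perl's real structures. */
typedef struct toy_op toy_op;

typedef struct toy_interp {
    long stack[16];
    int  sp;                     /* number of values currently on the stack */
} toy_interp;

struct toy_op {
    toy_op *(*ppaddr)(toy_interp *, toy_op *);  /* handler for this op */
    toy_op *next;                /* op to run when falling through */
    toy_op *other;               /* alternative target for branching ops */
    long    value;               /* constant payload, used by pp_const */
};

/* Push a constant, then continue with the next op. */
static toy_op *pp_const(toy_interp *in, toy_op *op) {
    assert(in->sp < 16);
    in->stack[in->sp++] = op->value;
    return op->next;
}

/* Pop a value and choose the successor: the control flow lives here. */
static toy_op *pp_cond(toy_interp *in, toy_op *op) {
    assert(in->sp > 0);
    long test = in->stack[--in->sp];
    return test ? op->next : op->other;
}

/* The runloop only follows whatever each handler returns. */
static void toy_runops(toy_interp *in, toy_op *start) {
    toy_op *op = start;
    while (op != NULL)
        op = op->ppaddr(in, op);
}

/* Builds the op graph for (cond ? 10 : 20) and runs it. */
long toy_eval_ternary(long cond) {
    toy_op then_op = { pp_const, NULL, NULL, 10 };
    toy_op else_op = { pp_const, NULL, NULL, 20 };
    toy_op branch  = { pp_cond, &then_op, &else_op, 0 };
    toy_op test    = { pp_const, &branch, NULL, cond };
    toy_interp in  = { {0}, 0 };

    toy_runops(&in, &test);
    return in.stack[in.sp - 1];
}
```

Calling toy_eval_ternary with a true value runs the then_op path and with 0 the else_op path, yet nothing in toy_runops knows a branch happened. That is why an unrolled version has to recover the branches from the ops themselves.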
So the question becomes: can the format of the PCG be described in LLVM terms (or maybe it already is), such that LLVM can do the unrolling for us?
Maybe all that is needed is to allow LLVM to see the C structs, typedefs, constants etc. that describe the PCG and let it convert them to its bitcode description. Then hand it the PL_op that starts the ball rolling, and it can unwind the loop, by processing all the "inlined" pp_* functions used by this program (fragment) and thus optimise across the entire call graph for each particular program (fragment). Maybe it will need extra hints.
I agree with you that simply unrolling the loop -- if that is even possible -- is unlikely to obtain big gains. As is simply optimising individual pp_* functions. The only possibility of substantial gains is from getting LLVM to consider complete code graphs as single units and look for optimisation across the whole kit and caboodle.
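As a rough illustration of the difference between "unrolled" and "optimised as a single unit", here is a toy sketch in C. The names (mini_interp, mini_push, mini_pop, mini_add, mini_unrolled, mini_folded) are hypothetical and the handlers are trivial compared with real pp_* functions. mini_unrolled is what a straight unrolling of a tiny op sequence might look like, with the branch turned into labels and gotos. mini_folded is roughly what one would hope an optimiser could reduce it to once it can see through every call.

```c
#include <assert.h>

/* Toy value stack, NOT perl's real stack. */
typedef struct { long stack[8]; int sp; } mini_interp;

static void mini_push(mini_interp *in, long v) {
    assert(in->sp < 8);
    in->stack[in->sp++] = v;
}

static long mini_pop(mini_interp *in) {
    assert(in->sp > 0);
    return in->stack[--in->sp];
}

/* Pop two operands and push their sum. */
static void mini_add(mini_interp *in) {
    long right = mini_pop(in);
    long left  = mini_pop(in);
    mini_push(in, left + right);
}

/* Unrolled form of: push a; push b; add; branch; push 1 or push 0. */
long mini_unrolled(long a, long b) {
    mini_interp in = { {0}, 0 };

    mini_push(&in, a);
    mini_push(&in, b);
    mini_add(&in);
    if (mini_pop(&in)) goto then_branch;
    goto else_branch;
then_branch:
    mini_push(&in, 1);
    goto done;
else_branch:
    mini_push(&in, 0);
done:
    return mini_pop(&in);
}

/* What the same sequence amounts to once the stack traffic is seen through. */
long mini_folded(long a, long b) {
    return (a + b) ? 1 : 0;
}
```

The unrolling alone removes only the dispatch. The stack pushes and pops are still there until something inlines the handlers and analyses the whole function. Whether LLVM gets anywhere near the folded form on the real pp_* functions, with all their flags and side effects, is exactly the open question.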
Maybe that Concise snippet you posted needs to be (manually) translated into (something like):
Maybe then it will find lots of unused (by this snippet) code paths that can be trimmed. Maybe it will see the same queries, checks and alerts being performed on the same data multiple times in different expansions. Maybe it will see frequent and expensive indirect accesses to fields in (sub)structures that can be lifted to SSAs. Maybe ... :)
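The "indirect accesses lifted to SSAs" hope can be sketched in isolation. This is a toy in C with hypothetical types (body_t, value_t) and function names of my own, not perl's structures. In double_indirect the store through data could, as far as the compiler can tell from this function alone, alias v->body->len, so the length has to be fetched through two pointers on every iteration. double_hoisted loads it once into a local, which is the kind of rewrite an optimiser can do for itself only once it can prove that nothing in the loop changes the field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nested structures standing in for a value and its body. */
typedef struct { long flags; long len; } body_t;
typedef struct { body_t *body; } value_t;

/* Re-reads v->body->len each time round: data[i] might alias it. */
void double_indirect(value_t *v, long *data) {
    for (long i = 0; i < v->body->len; i++)
        data[i] *= 2;
}

/* Same loop with the field loaded once into a local. */
void double_hoisted(value_t *v, long *data) {
    const long len = v->body->len;
    for (long i = 0; i < len; i++)
        data[i] *= 2;
}
```

Both functions give the same result whenever data does not overlap the body, which is the assumption the hand-hoisted version bakes in.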
Will it find enough to make it worthwhile? I don't know. But I don't believe that anyone will know until we try.
All I have been seeking is the least effort approach to enabling that investigation. And seeking to inspire someone with the skills and knowledge -- even if not the time and energy -- to direct the process of enabling it.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.