in reply to Re: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
in thread Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges

The third thing to do with LLVM, (which is what I think BrowserUK is advocating, but I may be wrong), is the wholesale replacement of perl's current runtime, making perl's parser convert the op tree to LLVM IR "bytecode", and thus making a perl a first-class Perl-to-LLVM compiler, in the same way that clang is a first-class C-to-LLVM compiler. This would be a massive undertaking, involving understanding all the subtleties and intricacies of 25,000 lines of pp_* functions, and writing code that will emit equivalent IR - even assuming that you kept the rest of perl the same (i.e. still used SVs, HVs, the context stack, pads etc).

You're about 1/3rd of the way to what I was trying to suggest as a possibility. I'm going to try again. I hope you have the patience to read it. I'm going to start with an unrealistic scenario for simplicity and try to fill in the gaps later.

Starting with a syntactically correct perl source that is entirely self-contained -- uses no modules or pragmas; no evals; no runtime code generation of any kind -- there are (notionally, no matter how hard it is linguistically to describe them; or practically to separate them ), three parts involved in the running of that program:

  1. The parsing of the source code and construction of the perl internal form -- call it a tree or graph; bytecode or opcodes -- for want of a term and some short-hand, the Perl Code Graph (PCG).

    Part 1 cannot be changed. it *is* Perl. So segregate it (I know; I know) out into a separately compiled and linked, native code unit.

    A dll (loaded by the minimal perl.exe much as perl5.x.dll is today), that reads the source file and builds exactly whatever it builds now, and then gets the hell out of dodge, leaving the PCG behind in memory.

  2. The interpreter proper -- the runloop -- that processes the PCG and dispatches to the Perl runtime (PRT).

    Moved below, because I need you to understand the context above and below, before the description of this middle bit will make sense.

  3. The Perl runtime -- the functions that do the actual work.

    Part 3 is very hard to re-code, as much of the behavioral semantics of perl is encapsulated entirely within it.

    So, give the whole kit&caboodle -- all the pp_* source code and dependencies -- to LLVM using its C front end, to process into LLVM intermediate form (IF), and then pass that through the various IF optimising stages until it can do no more, and then write it in its optimised IF form to a file (PRT.bc).

    This process is done once (for each release) by "the devs". The optimised PRT.bc file is platform independant and can be distributed as part of the build -- at the risk of the hackles it will raise including mine -- a bit like MSCRT.dll, but platform independent.

    This single binary file contains all the 'dispatched to' functions and their dependencies, pre-optimised as far as that can go, but still in portable IF form.

Part 2. The only new code that needs to be written. But even this already exists in the form of -MO-Deparse.

New code is adapted from Deparse. It processes the PCG in the normal way, but instead of (re)generating the Perl source code, it generates a LLVM IF "copy" of the PCG it is given. Let's call that the LLCG.

The LLCG is now the program we started with, but in a platform independent, optimisible (also platform independent) form that can be

I hope that is clearer than my previous attempts at description.

I'm fully aware that perl frequently reinvokes the parser and runloop in the process of compiling the source of a program, in order to deal with used modules and pragmas and BEGIN/CHECK/UNITCHECK/INIT/END blocks. Effectively, each alternation or recursion would be processed the same way as the above standalone program. If the module has previously been save in .bc form, the parsing and PCG->LLCG conversion can be skipped.

The first step, and perhaps the hardest part getting started, would be the re-engineering the existing build process -- and a little tweaking of the source files -- to break apart the code needed for parts 1 & 2 from part 3, so they can be built into separate dlls -- and the latter into PRT.bc. This process may result in some duplication as perl tends to use internally some of the same stuff that it provides to Perl programs as runtime.

These modifications to the build process and splitting out of the parser/PCG generation from the runtime could be done and used by the next release (or the one after that) of the existing Perl distribution. without compromising it.

It would not be trivial and it would require some one with excellent knowledge of both the internals and the build process -- ie. YOU! -- but it wouldn't be a huge job, and it needn't be a throwaway if all the rest failed or went nowhere. It might even benefit the existing code base and build system in its own right.

I'm done. If that fails to clarify or persuade, so be it. I'll respond to direct questions should there be any, but no more attempts to change anyones mind :)

In the unlikely event you read to here, thank you for your time and courtesy.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong