Re^4: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

by chromatic (Archbishop)
on Aug 28, 2012 at 00:00 UTC


in reply to Re^3: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
in thread Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

You're half right. hv.c does demonstrate one of the big problems in optimizing Perl 5, but no amount of static optimization fairy dust magic will help.

The biggest problem with regard to SVs and unoptimizability is that the responsibility for determining how to access what's in the SV is in every op. That's why every op that does a read has to check for read magic and every op that does a write has to check for write magic. That's why ops are fat in Perl 5. (That's why ops are fat in Parrot, which still has a design that would make it a much faster VM for Perl 5.6.)

Moving magic into SVs from ops would help, as would porting the Perl 5 VM to C++ which can optimize for the type of static dispatch this would enable.
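
To make the contrast concrete, here is a minimal sketch in C. Nothing in it is Perl's real API: `ToySV`, `TOY_MAGICAL`, `fat_add`, `thin_add` and the vtable layout are hypothetical names invented for illustration. The "fat" op tests for read magic itself on every operand; the "thin" op asks the value how to read itself.

```c
#include <stddef.h>

/* Hypothetical toy value type; NOT Perl's real SV layout. */
typedef struct ToySV ToySV;

/* Per-kind access functions: the "magic lives in the SV" design. */
typedef struct {
    long (*get_iv)(ToySV *sv);
} ToyVtable;

struct ToySV {
    const ToyVtable *vt;           /* consulted only by the thin op */
    unsigned flags;                /* consulted only by the fat op */
    long iv;
    long (*magic_get)(ToySV *sv);  /* read hook, used when TOY_MAGICAL is set */
};

#define TOY_MAGICAL 0x1u

static long plain_get(ToySV *sv)    { return sv->iv; }
static long doubling_get(ToySV *sv) { return sv->iv * 2; } /* stand-in for read magic */

const ToyVtable toy_plain_vt = { plain_get };
const ToyVtable toy_magic_vt = { doubling_get };

ToySV toy_plain(long v) {
    ToySV s = { &toy_plain_vt, 0, v, NULL };
    return s;
}

ToySV toy_magical(long v) {
    ToySV s = { &toy_magic_vt, TOY_MAGICAL, v, doubling_get };
    return s;
}

/* Fat op: the op body checks every operand for read magic. */
long fat_add(ToySV *a, ToySV *b) {
    long x = (a->flags & TOY_MAGICAL) ? a->magic_get(a) : a->iv;
    long y = (b->flags & TOY_MAGICAL) ? b->magic_get(b) : b->iv;
    return x + y;
}

/* Thin op: no flag tests here; each value knows how to read itself. */
long thin_add(ToySV *a, ToySV *b) {
    return a->vt->get_iv(a) + b->vt->get_iv(b);
}
```

Both ops give the same answer for `toy_plain(2)` and `toy_magical(5)`. The difference is where the knowledge about magic lives. In C the thin version still pays for an indirect call, which is why the static dispatch mentioned above matters if this is to be a speed win rather than just a cleanup.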

LLVM would only really help if you could compile all of Perl 5 and the XS you want for any given program to LLVM IR and let LLVM optimize across the whole program there, but even then you still have to move magic into the SVs themselves (or spend a lot of time and memory tracing types and program flow at runtime) to be able to optimize down to a handful of processor ops.

I suspect no one's going to do that for a 10% performance improvement at the cost of 10x memory use.

With that said, I must disagree with:

Perl's particular brand of preprocessor macro-based, Virtual Machine was innovative and way ahead of its time when it was first written.

Not if you look at a good Forth implementation or a decent Smalltalk implementation, both of which you could find back in 1993.

Perl's current memory allocator has so many layers to it that it is nigh impossible to switch in something modern, tried and tested, like the Boehm allocator.

I don't see how Boehm would help. The two biggest memory problems I've measured are that everything must be an SV (even a simple value like an integer) and that there's no sense of heap versus stack allocation. Yes, there's the TARG optimization, and that helps a lot, but if you want an order of magnitude speed improvement, you have to avoid allocating memory where you don't need it.

Someone decided that rather than use the hardware optimised (and constantly re-optimised with each new generation) hardware stack for parameter passing, it was a good idea to emulate the (failed) hardware-based, register-renaming architecture of (the now almost obsolete) RISC processors, in software.

You're overlooking two things. First, you can't do anything interesting with continuations if you're tied to the hardware stack. Second, several research papers have shown that a good implementation of a register machine (I know the Dis VM for Inferno has a great paper on this, and the Lua 5.0 implementation paper has a small discussion) is faster than the equivalent stack machine. I think there's a paper somewhere about a variant of the JVM which saw a measurable speed improvement by going to a register machine too. (Found it in the bibliography of the Lua 5.0 paper: B. Davis, A. Beatty, K. Casey, D. Gregg, and J. Waldron. The case for virtual register machines.)

... but with all that said, Parrot's lousy calling convention system is not a good example of a register machine. A good register machine lets you go faster by avoiding moving memory around. Parrot's calling conventions move way too much memory around to go fast.
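
A toy illustration of the "moving memory around" point, in C. Both interpreters below are hypothetical sketches written for this example (they are not Parrot, Lua, Dis or the JVM). They compute `c = a + b` and report how many instructions were dispatched. The stack machine needs two loads, an add and a store; the register machine names its operands directly.

```c
#include <stddef.h>

enum { S_LOAD, S_ADD, S_STORE, S_HALT };   /* stack VM opcodes */
enum { R_ADD, R_HALT };                    /* register VM opcodes */

typedef struct { int op; int arg; } StackInsn;
typedef struct { int op; int dst, src1, src2; } RegInsn;

/* Stack VM: operands are implicit, so values are copied to and from a
 * temporary stack. No bounds checks; this is a sketch only.
 * Returns the number of instructions dispatched, or -1 on a bad opcode. */
int run_stack(const StackInsn *code, long *locals) {
    long stack[16];
    int sp = 0, dispatched = 0;
    for (const StackInsn *pc = code; ; pc++) {
        dispatched++;
        switch (pc->op) {
        case S_LOAD:  stack[sp++] = locals[pc->arg]; break;
        case S_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case S_STORE: locals[pc->arg] = stack[--sp]; break;
        case S_HALT:  return dispatched;
        default:      return -1;
        }
    }
}

/* Register VM: each instruction names its operands, so there is no
 * separate operand stack to shuffle values through. */
int run_reg(const RegInsn *code, long *regs) {
    int dispatched = 0;
    for (const RegInsn *pc = code; ; pc++) {
        dispatched++;
        switch (pc->op) {
        case R_ADD:  regs[pc->dst] = regs[pc->src1] + regs[pc->src2]; break;
        case R_HALT: return dispatched;
        default:     return -1;
        }
    }
}

/* c = a + b, with a, b, c in slots 0, 1, 2. */
const StackInsn stack_prog[] = {
    { S_LOAD, 0 }, { S_LOAD, 1 }, { S_ADD, 0 }, { S_STORE, 2 }, { S_HALT, 0 }
};
const RegInsn reg_prog[] = {
    { R_ADD, 2, 0, 1 }, { R_HALT, 0, 0, 0 }
};
```

Counting dispatches proves nothing about wall-clock speed on its own, since register instructions are wider and cost more to decode. It only shows the shape of the argument: fewer dispatches and fewer copies per unit of work.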


Re^5: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by BrowserUk (Pope) on Aug 28, 2012 at 07:46 UTC

    Up front. When you nay-say the OP's discussion, I, like many others I suspect, read each sentence twice, consider it thrice, and then stay stum. You have the knowledge and experience to contribute to the OP's endeavors, even when you do so with negative energy. You can save the OP from many blind alleys.

    When sundial "contributes" his 'stop energy' (skip directly to 33:24) there is no knowledge, no experience, nothing but the negative energy of his groundless suppositions.

    LLVM would only really help if you could compile all of Perl 5 and the XS you want for any given program to LLVM IR and let LLVM optimize across the whole program there, but even then you still have to move magic into the SVs themselves (or spend a lot of time and memory tracing types and program flow at runtime) to be able to optimize down to a handful of processor ops.

    Are you 100% sure there would be no gains?

    Just for a minute suspend your disbelief and imagine that all of perl5.x.x.dll/.so was compiled (otherwise unmodified wherever possible) to LLVM's IF. And then when that .dll/.so is linked, all the macros have been expanded and in-lined, all the do{...}while(0) blocks are in-situ; all the external dependencies of all the compile-time scopes are available.

    Are you 100% certain that, under those circumstances, the link-time optimiser isn't going to find substantial gains from its supra compile-unit view of that code?

    Now suspend your disbelief a little further and imagine that someone had the energy and time to use LLVM's amazingly flexible, platform-independent, language-independent type system (it can do 33-bit integers or 91-bit floats if you see the need for them), to re-cast Perl's internal struct-based type inheritance mechanism into a concrete type-inheritance hierarchy.

    What optimisation might it find then?

    C treats structs as opaque lumps of storage, and has no mechanisms for objects, inheritance or any extensions of its storage-based types. But (for example) C++ has these concepts, and as you say:

    porting the Perl 5 VM to C++ which can optimize for the type of static dispatch

    if you could port Perl's type hierarchy to C++, then its compilers should be able to do more by way of optimising them.

    But porting perl to C++ would be a monumental task because it would require re-writing everything to be proper, standards-compliant, C++. Properly OO with all that entails.

    LLVM doesn't impose any particular HLL's view of the world on the code. LL stands for low-level. It doesn't impose any particular type mechanism on the code, it will happily allow you to define a virtual machine (VM) that uses 9-bit words and 3-word registers.

    Isn't it just possible that it might allow the Perl VM to be modeled directly, such that -- with the overview that link-time optimisation has -- it can produce some substantial runtime benefits?

    And just maybe allow the pick-up-sticks nature of the Perl internals to be cleaned up along the way?

    And finally, there is the possibility that its JIT capabilities may be able to recognise (at runtime) when a hash(ref) is 'just a hash', and optimise away all the tests for magic, stashes, globs and other variations, and so fast path critical sections of code at runtime.

    What percentage of Perl's opcode usage actually uses those alternate paths? 10%? 5%? Doesn't that leave a substantial amount of Perl code as potentially JITable to good effect?

    Whether LLVM JIT is up to the task is a different question -- one that would be answered if we could try it.
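
    The kind of specialisation meant here can be sketched by hand in C. This is an assumption-laden toy, not LLVM's JIT and not Perl's data structures: `ToyArray`, `TOY_MAGIC`, `sum_generic` and `sum_guarded` are hypothetical names, and a flat array stands in for a hash to keep the example short. The generic path tests for magic on every access; the guarded path tests once, then runs a loop with no per-element checks.

```c
#include <stddef.h>

#define TOY_MAGIC 0x1u   /* hypothetical "this container has magic" flag */

typedef struct {
    unsigned flags;
    size_t len;
    const long *vals;
    long (*magic_fetch)(const long *vals, size_t i);  /* used only if TOY_MAGIC */
} ToyArray;

/* Example magic hook for demonstration: reads come back negated. */
long toy_negate_fetch(const long *vals, size_t i) { return -vals[i]; }

/* Generic path: re-tests for magic on every element. */
long sum_generic(const ToyArray *a) {
    long total = 0;
    for (size_t i = 0; i < a->len; i++)
        total += (a->flags & TOY_MAGIC) ? a->magic_fetch(a->vals, i)
                                        : a->vals[i];
    return total;
}

/* Guarded path: one test up front. If the container is "just an array",
 * run a loop with no magic checks; otherwise fall back to the generic code. */
long sum_guarded(const ToyArray *a) {
    if (a->flags & TOY_MAGIC)
        return sum_generic(a);       /* guard failed: take the slow path */
    long total = 0;
    for (size_t i = 0; i < a->len; i++)
        total += a->vals[i];
    return total;
}
```

    A JIT would emit the guarded version at runtime after observing that the guard nearly always holds. Whether the guard stays valid across arbitrary Perl code (ties, local, XS callbacks) is exactly the open question.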

    Not if you look at a good Forth implementation or a decent Smalltalk implementation, both of which you could find back in 1993.

    I was using Digitalk's Smalltalk/V PM at around that time, and it was dog slow.

    Forth compilers were making strides using their interlaced opcodes technology (called threaded interpreted code back then, but that has different connotations these days), but a) those interpreters were in large part hand-coded in assembler; b) you had to write your programs in Forth. Like Haskell, it's a different mindset, largely out-of-reach of the sysadmins, shell & casual programmers that Perl targeted.

    Defining a language that targets a VM defined in (back then) lightweight C pre-processor macros, and throwing it at the C compilers to optimise, was very innovative.

    The problem is that the many heavy-handed additions, extensions and overzealous "correctness" drives, have turned those once lightweight opcode macros into huge, heavyweight, scope-layered, condition-ridden lumps of unoptimisable boiler-plate. Most of which very few people have ever even taken the time to expand out and look at. Basically, no one really knows what the Perl sources actually look like.

    Too many heavy-hands on the tiller pulling it every which way as the latest greatest fads come and go, have left us with an opaque morass of nearly untouchable code. (That is in no way to belittle the mighty efforts of the current (and past) maintainers; but rather to acknowledge the enormity of their chosen task!)

    I don't see how Boehm would help.

    I'm not sure that it would either, but the main problem is that it would be nigh impossible to try it. Somewhere here (at PM), I documented my attempts to track through the myriad #definitions and redefinitions that make up Perl's memory manager -- it ran to (from memory, literally) hundreds of *alloc/*free names. Impossible to fathom.

    On Windows, as built by default (and AS; and to my knowledge Strawberry), the allocator that gets used can quite easily (using pretty standard perl code), be flipped into a pathological mode where almost every scalar allocation or reallocation results in a page fault. Documented here two (seemed like longer) years ago, it is still there, despite my posting a 1-line patch to fix it.

    Much of my knowledge of using Perl in a memory-efficient manner has come about simply as a result of finding ways to avoid that pathological behaviour.

    Another big part of the memory problem is the mixing of different allocation sizes within a single heap. Whilst the allocator uses buckets for different sized entities, mixing fixed-sized entities -- scalars, rvs, ints, floats etc. -- and variable sized entities -- strings, AVs etc. -- in the same heap means that you inevitably end up with creeping fragmentation.

    Imagine an allocator that used different heaps for each fixed-sized allocation; and another two heaps for variable-sized allocations that it flip-flops between when it needs to expand the variable-sized heap. Instead of reallocing in-place, it grabs a fresh chunk of VM from the OS and copies the existing strings over to the new heap and discards the old one thereby automatically reclaiming fragmentation.
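
    A minimal sketch of the variable-sized half of that idea, in C. Everything here is hypothetical (`StrHeap`, `strheap_put` and friends are invented for illustration, and `malloc` stands in for grabbing fresh VM from the OS). The per-size heaps for fixed-sized entities are left out. Strings are reached through slot handles, so that the heap is free to move them when it flips to a new space.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_STRS 64

typedef struct {
    char  *space;            /* the current variable-sized heap */
    size_t cap, used;
    size_t off[MAX_STRS];    /* offset of each live string within space */
    size_t len[MAX_STRS];    /* length including NUL; 0 marks a free slot */
} StrHeap;

int strheap_init(StrHeap *h, size_t cap) {
    memset(h, 0, sizeof *h);
    h->space = malloc(cap);
    h->cap = cap;
    return h->space ? 0 : -1;
}

/* Flip: copy only the live strings into a fresh space, packed end to end,
 * then discard the old space. Holes left by dropped strings vanish. */
static int strheap_flip(StrHeap *h, size_t new_cap) {
    char *fresh = malloc(new_cap);
    if (!fresh) return -1;
    size_t used = 0;
    for (int i = 0; i < MAX_STRS; i++) {
        if (h->len[i] == 0) continue;
        memcpy(fresh + used, h->space + h->off[i], h->len[i]);
        h->off[i] = used;
        used += h->len[i];
    }
    free(h->space);
    h->space = fresh;
    h->cap = new_cap;
    h->used = used;
    return 0;
}

/* Store a copy of s; returns a slot handle, or -1 on failure. */
int strheap_put(StrHeap *h, const char *s) {
    size_t n = strlen(s) + 1;
    int slot = -1;
    for (int i = 0; i < MAX_STRS; i++)
        if (h->len[i] == 0) { slot = i; break; }
    if (slot < 0) return -1;
    /* Out of room: flip to a larger space instead of growing in place. */
    if (h->used + n > h->cap && strheap_flip(h, 2 * (h->used + n)) != 0)
        return -1;
    memcpy(h->space + h->used, s, n);
    h->off[slot] = h->used;
    h->len[slot] = n;
    h->used += n;
    return slot;
}

/* Dropping leaves a hole that is reclaimed at the next flip. */
void strheap_drop(StrHeap *h, int slot) { h->len[slot] = 0; }

/* The returned pointer is only valid until the next strheap_put. */
const char *strheap_get(const StrHeap *h, int slot) {
    return h->space + h->off[slot];
}

void strheap_free(StrHeap *h) { free(h->space); h->space = NULL; }
```

    The price of compaction is that raw pointers cannot be held across an allocation, which is one of the details that makes retrofitting something like this under existing code so hard.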

    Don't argue the case here, I've omitted much detail. But the point is that as-is, it is simply too hard to try bolting a different allocator underneath Perl, because what is there is so intertwined.

    You're overlooking two things. First, you can't do anything interesting with continuations if you're tied to the hardware stack.

    Are continuations a necessary part of a Perl-targeted VM? Or just a theoretically interesting research topic-du-jour?

    From my viewpoint, the fundamental issue with the Parrot VM was and is the notion that it should be all things to all men. Every theoretical nice-to-have and every cool research topic of the day, was to be incorporated in order to support the plethora of languages that were going to magically inter-operate atop it.

    Cool stuff if you have Master's level researchers on research budgets and academia's open-ended time frames to play with. But as a solution to the (original) primary goal of supporting Perl6 ...

    Second, several research papers have shown that a good implementation of a register machine (I know the Dis VM for Inferno has a great paper on this, and the Lua 5.0 implementation paper has a small discussion) is faster than the equivalent stack machine.

    Research papers often have a very particular notion of equivalence.

    Often as not, such comparisons are done using custom interpreters that assume unlimited memory (no garbage collection required), supporting integer-only baby-languages running contrived benchmarks for strictly limited periods on otherwise quiescent machines that are simply switched off when memory starts to exhaust.

    So unrepresentative of running real languages on real workloads on real-world hardware environments, that their notion of equivalence has to be taken very much in the light of the research they are conducting.

    Is there a single, major real-world language that uses continuations?

    Is there a single, real-world, production use VM that emulates a register machine in software?

    Why have RISC architectures failed to take over the world?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      You can save the OP from many blind alleys.

      I don't think he's listening. If this stuff were easy, it would be more than a pipe dream by now.

      I hope no one's taking my criticism as stop energy. My intent is to encourage people to solve the real problems and not pin their hopes on quick and dirty stopgaps that probably won't work.

      Are you 100% certain that, under those circumstances, the link-time optimiser isn't going to find substantial gains from its supra compile-unit view of that code?

      I expect it will find some gains, but keep in mind two things. First, you have to keep around all of the IR for everything you want to optimize across. That includes things like the XS in DBI as well as the Perl 5 core. Second, LLVM tends to expect that the languages it compiles have static type systems. The last time I looked at its JIT, it didn't do any sort of tracing at runtime, so either you add that yourself, or you do without. (I stand by the assumption that the best opportunity for optimization from a JIT is rewriting and replacing basic blocks with straight line code that takes advantage of known types.)

      With that said, compiling all of Perl 5 with a compiler that knows how to do link time optimization does offer a benefit, even if you can use LTO only on the core itself. This won't be an order of magnitude improvement. If you get 5% performance improvement, be happy.

      Defining a language that targets a VM defined in (back then) lightweight C pre-processor macros, and throwing it at the C compilers to optimise, was very innovative.

      Maybe so as far as that goes, but the implementation of Perl 5 was, from the start, flawed. Even something as obvious as keeping the reference count in the SV itself has huge problems. See, for example, the way memory pages go unshared really really fast even when reading values between COW processes.
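
      The mechanism is easy to show with a toy in C. The types below are hypothetical (`InlineSV` and `SideTable` are not Perl's structures), and the side table is just one possible alternative, assumed here for contrast rather than taken from any proposal. With the count stored inline, merely taking a reference to a value writes to the memory holding the value, which is what dirties a page shared copy-on-write after a fork.

```c
#include <stddef.h>

/* Count stored inside the value, as a stand-in for refcount-in-SV. */
typedef struct {
    unsigned refcnt;
    long iv;
} InlineSV;

/* Taking a reference in order to read stores into the value's own memory.
 * After fork(), that store forces the kernel to copy the whole page. */
long take_ref_inline(InlineSV *sv) {
    sv->refcnt++;
    return sv->iv;
}

/* Alternative layout (an assumption for contrast): counts live apart
 * from the values they describe. */
typedef struct {
    const long *values;   /* never written after setup */
    unsigned *refcnts;    /* all the writes land here instead */
} SideTable;

long take_ref_side(SideTable *t, size_t i) {
    t->refcnts[i]++;      /* the value's memory is untouched */
    return t->values[i];
}
```

      In the second layout the pages holding values can stay shared between processes, and only the much smaller count table gets copied.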

      The problem is that the many heavy-handed additions, extensions and overzealous "correctness" drives, have turned those once lightweight opcode macros into huge, heavyweight, scope-layered, condition-ridden lumps of unoptimisable boiler-plate.

      I think we're talking about different things. Macros or functions are irrelevant to my criticisms of the Perl 5 core design. My biggest objection is the fat opcode design that puts the responsibility for accessing values from SVs in the opcode bodies rather than using some sort of polymorphism (and it doesn't have to be OO!) in the SVs themselves.

      Are continuations a necessary part of a Perl-targeted VM?

      They were a must-have from Perl 6 back then. They simplify a lot of user-visible language constructs, and they make things like resumable exceptions possible. If implemented well, you can get a lot of great features reasonably cheaply from CPS as your control flow mechanism.

      Lua uses them, and Lua uses a register architecture.

      Why have RISC architectures failed to take over the world?

      Windows, I suspect.

        With that said, compiling all of Perl 5 with a compiler that knows how to do link time optimization does offer a benefit, even if you can use LTO only on the core itself. This won't be an order of magnitude improvement. If you get 5% performance improvement, be happy.

        If all you do is use LLVM's C front-end to compile the complete existing code base to IF, optimise, and then back to C to compile, I think you are probably in the ball park, if a little pessimistic. I'd have said low double digit percentage improvements.

        But, if you break out the runtime components from the compile-time components -- ie. everything before the (say*) ${^GLOBALPHASE} = INIT -- and compile the before to C and link it. But then convert the (for want of a better term) "bytecode" to IF and pass it to the LLVM JIT engine, what then?

        And what if the IF generated could be saved and then reloaded, thus avoiding the perl compilation phase for second and subsequent runs (until edits)?

        And how about combining the IF generated from the bytecode with the IF form of the core and linking it to build standalone executables?

        Is any of this possible? There is only one way to find out.

        (*) I appreciate that you would probably need to intercept, repeatedly, at the ${^GLOBALPHASE} = 'CHECK' or 'UNITCHECK' stages in reality.


      where almost every scalar allocation or reallocation results in a page fault. Documented here two (seemed like longer) years ago, it is still there, despite my posting a 1-line patch to fix it.
      We (the perl5 committers) can easily overlook things. The best approach is to create an RT ticket, then if we ignore that, prod us from time to time by updating the ticket by replying to it.

      Dave.
