PerlMonks
Re: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

by davido (Cardinal)
on Aug 27, 2012 at 18:14 UTC ( id://990033 )


in reply to Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

The link you posted and quoted from is interesting, but it seems like a lead that has gone cold. Way back in 2009, Nicholas Clark commented on the concept here: http://www.nntp.perl.org/group/perl.perl5.porters/2009/04/msg145436.html, where he explained that it might be an interesting project for an intern, but would probably require too much of a time commitment from the intern's mentor (paraphrasing).

But the reason I say it's gone cold is that the GitHub repo is gone, and the original blog author's own URL at the bottom of the page is now dead. Without code, you're starting over again. But more importantly, it would be good to discuss with the original author why he believes it fizzled out. Maybe his wife had a baby and he got too busy to work on it. Maybe he didn't have enough experience or understanding. Maybe he wasn't able to show enough of a benefit to convince others to join the project and share the enthusiasm. Or maybe it was just a dead end. But it would be worth finding out before investing too much time starting over again.


Dave

Replies are listed 'Best First'.
Re^2: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by Will_the_Chill (Pilgrim) on Aug 27, 2012 at 21:20 UTC
    Dave,

    Please see my reply to the main thread for info directly from Yuval, the author of the Perl5-to-LLVM article from 2009.

    Long story short, we're still seriously considering this as a viable development path.

    Thanks,
    ~ Will
Re^2: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by sundialsvc4 (Abbot) on Aug 27, 2012 at 20:25 UTC

    A particular quote from that article that you cited:

    Furthermore, IMHO most apps are not even bound by those limitations, but rather by programmer laziness/sloppiness. If better performance is a low hanging fruit that remains unpicked, JIT is really not going to make much of a difference. Even if it runs 2x faster, it doesn’t mean it’ll be 2x cheaper to run. This is the “you are not google” axiom of scalability. Even if people want this performance, for most of them it won’t actually help their business.

    I remain unconvinced that an “optimizing compiler” would actually deliver useful results as expected, i.e. enough to justify the project. And here’s why... Consider a program which did 100,000 hash-table insertions followed by 10 million lookups. How much of the time will be spent in the Perl-guts, versus the amount of time spent getting to those guts? I suspect that the performance of this hypothetical application would be determined almost exclusively by the performance of the hash-table code within the “guts,” such that time spent generating the parse-trees and then iterating through them would be negligible. If an edge-case situation were proffered as “justification” for this effort, I would feel obliged as a program manager to require proof that the edge-case was usefully large ... and that there existed a “project in great pain” with a substantial existing source-code base in Perl.

    Otherwise, it would be an academic exercise. “See? I did it! (But so what?)”

      I suspect that the performance of this hypothetical application would be determined almost exclusively by the performance of the hash-table code within the “guts,” such that time spent generating the parse-trees and then iterating through them would be negligible.

      And I suspect that, once again, you haven't a clue what you are talking about. Have you ever bothered to look into hv.c?

      Each Perl "opcode" has to deal with a complex variety of different possibilities.

      • Is the (hash) tied?
      • Are the hash keys unicode?
      • Does it have shared keys?
      • Is it a stash?
      • Is it a glob?
      • Does it have magic attached?
      • More ...

      With runtime compilation (or JIT), it would be possible for 'simple hash' accesses/inserts/updates to bypass all of the myriad checks and balances that are required for the general case, which could yield significant gains in hash-heavy code. Ditto for arrays. Ditto for strings. Ditto for numbers. (Do a super search for "use integer" to see some of the possibilities that can yield.)
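      A toy sketch of that idea (invented names and types, not the real perl API): a general-purpose op must re-test every special case on each access, while a specialised body, chosen once a runtime guard has proved the hash is plain, skips the checks entirely:

```c
#include <assert.h>

/* Toy model of a value with flag bits that a general op must test. */
enum { F_TIED = 1, F_UTF8_KEYS = 2, F_MAGIC = 4 };

struct toy_hash { unsigned flags; int slots[4]; };

/* General path: re-checks every special case on each access, as the
 * pp_helem-style ops must in the interpreter. */
static int fetch_general(struct toy_hash *h, int key) {
    if (h->flags & F_TIED)      { /* would call tie handlers here */ }
    if (h->flags & F_MAGIC)     { /* would invoke get-magic here */ }
    if (h->flags & F_UTF8_KEYS) { /* would normalise the key here */ }
    return h->slots[key & 3];
}

/* Specialised path a JIT could emit once a guard has proved, at run
 * time, that the hash is plain: no checks, just the lookup. */
static int fetch_plain(struct toy_hash *h, int key) {
    return h->slots[key & 3];
}

/* The JIT's job in miniature: pick the fast body when the guard holds. */
typedef int (*fetch_fn)(struct toy_hash *, int);
static fetch_fn specialise(struct toy_hash *h) {
    return h->flags == 0 ? fetch_plain : fetch_general;
}

static int demo(void) {
    struct toy_hash h = { 0, { 10, 20, 30, 40 } };
    fetch_fn f = specialise(&h);   /* plain hash: fast path chosen */
    return f(&h, 2);
}
```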

      Then there is the simple fact that perl's subroutines/methods are -- even by interpreter standards -- very slow. (See: 488791 for a few salient facts about Perl's subcall performance.)

      Much of this stems from the fact that the way the perl sources are structured, C compilers cannot easily optimise across compilation unit boundaries, because they mostly(*) do compile-time optimisations. However, there are a whole class of optimisations that can be done at either link-time or runtime, that would hugely benefit Perl code.

      (*) MS compilers have the ability to do some link-time optimisations, and it would surprise me greatly if gcc doesn't have similar features. It would also surprise me if these have ever been enabled for the compilation of Perl. They would need to be specifically tested on so many platforms that it would be very hard to do.

      But something like LLVM can do link-time and runtime optimisations, because it targets not a specific processor but a virtual processor (a "VM"), which allows its optimiser to operate in that virtual environment. Only once the VM code has been optimised is it finally translated into processor-specific machine code. That means you only need to test each optimisation (to the VM) once; and, independently, the translation to each processor.

      Not the combinatorial product of all optimisations on all processors.

      What would these gains be worth? It is very hard to say, but suppose it gave 50% of the difference between interpreted, non-JITed Java and perl running a recursive algorithm (Ackermann) that does a few simple additions (11 million times):

      So 83.6 seconds for Perl, and 1.031 seconds for Java!

      Perl's productivity and (1/2) Java's performance!
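      For reference, the Ackermann function behind figures like those is tiny (a sketch in C; the old language-shootout benchmark typically ran ack(3, n)). It does nothing but additions and deeply nested recursive calls, so it measures subroutine-call overhead almost exclusively, which is exactly where perl is weakest:

```c
#include <assert.h>

/* Two-argument Ackermann function: almost pure call overhead. */
static int ack(int m, int n) {
    if (m == 0) return n + 1;
    if (n == 0) return ack(m - 1, 1);
    return ack(m - 1, ack(m, n - 1));
}
```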

      That would be something worth having for a vast array of genomicists, physicists, data miners et al.

      Heck. It might even mean that hacks like mod_perl might become redundant; making a whole bunch of web monkeys happy. Moose might even become usable for interactive applications. Parse::RecDescent might be able to process documents in real time rather than geological time. DateTime might be able to calculate deltas as they happen rather than historically.

      There are three fundamental limitations on an interpreters performance:

      1. Subroutine/method call performance.
      2. Memory allocation/deallocation performance.
      3. Parameter passing performance.

      Whilst Perl is faster than (native code) Python & Ruby, it sucks badly when compared to Java, Lua etc. And the reasons are:

      1. Perl's particular brand of preprocessor macro-based, Virtual Machine was innovative and way ahead of its time when it was first written.

        But many different hands have been laid upon that tiller in the interim, without (in many cases) understanding the virtues of simplicity that it had for in-lining, and the optimisations that came from that.

        The result is that now, many of the macros are so complex, and so heavily nested and asserted, that the best compiler in the world cannot optimise the twisted morass of code that results, from what looks like a few simple lines, prior to the preprocessor doing its thing.

        The result is that each line of the perl source is so heavily macroised that very few people realise that the five or six lines of code required to construct a minimal subroutine at the perl source level expand to dozens -- even hundreds -- of lines once the preprocessor has run. And those expanded lines are so heavily blocked and nested that they blow the optimiser stack of pretty much every C compiler available.

      2. Perl's current memory allocator has so many layers to it that it is nigh impossible to switch in something modern, tried and tested, like the Boehm allocator.

        Much less enable the recognition that the future is 64-bit, and utilise the 8TB of virtual address space to allow each class (of fundamental and user) object to use a dedicated array of exact-sized objects with a simple free list chain.

      3. It eschews the hardware-optimised push/pop of parameters to the processor (C) stack (1 opcode each) in favour of the (literally) hundreds of opcodes required by each push and pop of its software emulation of those hardware instructions using multiple heap-based stacks.

        Parrot suffers from a similar problem. Someone decided that rather than use the hardware optimised (and constantly re-optimised with each new generation) hardware stack for parameter passing, it was a good idea to emulate the (failed) hardware-based, register-renaming architecture of (the now almost obsolete) RISC processors, in software.
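      A hypothetical miniature of the macro problem described in point 1 (these names are invented, not perl's real macros): one innocent-looking "line" hides nested conditionals behind an accessor macro, and each additional layer of nesting multiplies the expanded code the optimiser must chew through:

```c
#include <assert.h>

/* Invented illustration of heavily nested accessor macros. */
struct val { unsigned flags; int iv; };

static int slow_path(struct val *v)   { return v->iv - 1; }  /* stand-in */
static int upgrade_get(struct val *v) { return v->iv + 1; }  /* stand-in */

/* A "checked get", in the style of nested SvIV-like macros. */
#define GET_CHECKED(v) \
    ((v) == 0          ? 0              \
    : ((v)->flags & 1) ? slow_path(v)   \
    : ((v)->flags & 2) ? upgrade_get(v) \
    : (v)->iv)

/* One "simple line" at the source level... */
#define SUM_CHECKED(a, b) (GET_CHECKED(a) + GET_CHECKED(b))

/* ...but after the preprocessor, SUM_CHECKED is already eight branches;
 * nest a few more macro layers and the expansion grows geometrically. */
static int demo_sum(void) {
    struct val a = { 0, 40 }, b = { 0, 2 };
    return SUM_CHECKED(&a, &b);
}
```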

      In a nutshell, your "suspicions" are so out of touch with reality, and so founded upon little more than supposition, that they are valueless.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      RIP Neil Armstrong

        You're half right. hv.c does demonstrate one of the big problems in optimizing Perl 5, but no amount of static optimization fairy dust magic will help.

        The biggest problem with regard to SVs and unoptimizability is that the responsibility for determining how to access what's in the SV is in every op. That's why every op that does a read has to check for read magic and every op that does a write has to check for write magic. That's why ops are fat in Perl 5. (That's why ops are fat in Parrot, which still has a design that would make it a much faster VM for Perl 5.6.)

        Moving magic into SVs from ops would help, as would porting the Perl 5 VM to C++ which can optimize for the type of static dispatch this would enable.

        LLVM would only really help if you could compile all of Perl 5 and the XS you want for any given program to LLVM IR and let LLVM optimize across the whole program there, but even then you still have to move magic into the SVs themselves (or spend a lot of time and memory tracing types and program flow at runtime) to be able to optimize down to a handful of processor ops.

        I suspect no one's going to do that for a 10% performance improvement at the cost of 10x memory use.
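        The "move magic into SVs" idea can be sketched as replacing per-op flag tests with per-value dispatch (invented types, not perl's): the op calls through a function pointer installed once on the value, so the plain case pays one indirect call instead of repeated magic checks in every op:

```c
#include <assert.h>

/* Invented miniature of "magic lives in the value, not the op". */
struct sv;
typedef int (*get_fn)(struct sv *);

struct sv { get_fn get; int iv; };

static int get_plain(struct sv *v) { return v->iv; }
static int get_magic(struct sv *v) { /* run get-magic hooks here */ return v->iv; }

/* The op no longer tests flags; it just dispatches through the value,
 * which static or profile-guided optimisation can then devirtualise. */
static int op_read(struct sv *v) { return v->get(v); }

static int demo_read(void) {
    struct sv plain = { get_plain, 7 };
    return op_read(&plain);
}
```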

        With that said, I must disagree with:

        Perl's particular brand of preprocessor macro-based, Virtual Machine was innovative and way ahead of its time when it was first written.

        Not if you look at a good Forth implementation or a decent Smalltalk implementation, both of which you could find back in 1993.

        Perl's current memory allocator has so many layers to it that it is nigh impossible to switch in something modern, tried and tested, like the Boehm allocator.

        I don't see how Boehm would help. The two biggest memory problems I've measured are that everything must be an SV (even a simple value like an integer) and that there's no sense of heap versus stack allocation. Yes, there's the TARG optimization, and that helps a lot, but if you want an order of magnitude speed improvement, you have to avoid allocating memory where you don't need it.

        Someone decided that rather than use the hardware optimised (and constantly re-optimised with each new generation) hardware stack for parameter passing, it was a good idea to emulate the (failed) hardware-based, register-renaming architecture of (the now almost obsolete) RISC processors, in software.

        You're overlooking two things. First, you can't do anything interesting with continuations if you're tied to the hardware stack. Second, several research papers have shown that a good implementation of a register machine (I know the Dis VM for Inferno has a great paper on this, and the Lua 5.0 implementation paper has a small discussion) is faster than the equivalent stack machine. I think there's a paper somewhere about a variant of the JVM which saw a measurable speed improvement by going to a register machine too. (Found it in the bibliography of the Lua 5.0 paper: B. Davis, A. Beatty, K. Casey, D. Gregg, and J. Waldron. The case for virtual register machines.)

        ... but with all that said, Parrot's lousy calling convention system is not a good example of a register machine. A good register machine lets you go faster by avoiding moving memory around. Parrot's calling conventions move way too much memory around to go fast.

        Much of this stems from the fact that the way the perl sources are structured, C compilers cannot easily optimise across compilation unit boundaries, because they mostly(*) do compile-time optimisations. However, there are a whole class of optimisations that can be done at either link-time or runtime, that would hugely benefit Perl code.

        (*) MS compilers have the ability to do some link-time optimisations, and it would surprise me greatly if gcc doesn't have similar features. It would also surprise me if these have ever been enabled for the compilation of Perl. They would need to be specifically tested on so many platforms that it would be very hard to do.

        But something like LLVM can do link-time and runtime optimisations, because it targets not a specific processor but a virtual processor (a "VM"), which allows its optimiser to operate in that virtual environment. Only once the VM code has been optimised is it finally translated into processor-specific machine code. That means you only need to test each optimisation (to the VM) once; and, independently, the translation to each processor.

        64-bit VC builds have been LTCG from basically day 1: http://perl5.git.perl.org/perl.git/commit/d921a5fbd57e5a5e78de0c6f237dd9ef3d71323c?f=win32/Makefile. A couple of months ago I compiled a Perl with LTCG in 32-bit mode; unfortunately I don't remember the VC version, whether it was my 2003 or my 2008. The DLL got slightly fatter from inlining (I don't remember how many KB), but the inlined functions still existed as separate function calls, the assembly looked the same everywhere, and I didn't find anything (looking around randomly by hand) that got a non-standard calling convention except what was already static functions. I wrote it off as useless. 2003 vs 2008 for 32-bit code might make all the difference though. I decided it wasn't worth writing a patch up and submitting it to P5P to change the makefile.

        With Will's LLVM proposal, I believe nothing will come of it unless some or all of the pp_ opcode functions, along with runops, are rewritten in "not C", or perl opcodes are statically analyzed and converted to native machine data types with SVs being gone. All the "inter-procedure optimizations" mentioned in this thread are gone the moment you create a function pointer; that is simply the rules of C and C's ABI on that OS: http://msdn.microsoft.com/en-us/library/xbf3tbeh%28v=vs.80%29.aspx.

        I went searching through perl's pre and post preprocessor headers. I found some interesting things which prove that automatic IPO, on Perl, in C with any compiler is simply impossible.
        /* Enable variables which are pointers to functions */
        typedef void (*peep_t)(pTHX_ OP* o);
        typedef regexp* (*regcomp_t) (pTHX_ char* exp, char* xend, PMOP* pm);
        typedef I32 (*regexec_t) (pTHX_ regexp* prog, char* stringarg,
                char* strend, char* strbeg, I32 minend, SV* screamer,
                void* data, U32 flags);
        typedef char* (*re_intuit_start_t) (pTHX_ regexp *prog, SV *sv,
                char *strpos, char *strend, U32 flags,
                re_scream_pos_data *d);
        typedef SV* (*re_intuit_string_t) (pTHX_ regexp *prog);
        typedef void (*regfree_t) (pTHX_ struct regexp* r);
        typedef regexp* (*regdupe_t) (pTHX_ const regexp* r, CLONE_PARAMS *param);
        typedef I32 (*re_fold_t)(const char *, char const *, I32);
        typedef void (*DESTRUCTORFUNC_NOCONTEXT_t) (void*);
        typedef void (*DESTRUCTORFUNC_t) (pTHX_ void*);
        typedef void (*SVFUNC_t) (pTHX_ SV* const);
        typedef I32 (*SVCOMPARE_t) (pTHX_ SV* const, SV* const);
        typedef void (*XSINIT_t) (pTHX);
        typedef void (*ATEXIT_t) (pTHX_ void*);
        typedef void (*XSUBADDR_t) (pTHX_ CV *);
        typedef OP* (*Perl_ppaddr_t)(pTHX);
        typedef OP* (*Perl_check_t) (pTHX_ OP*);
        typedef void(*Perl_ophook_t)(pTHX_ OP*);
        typedef int (*Perl_keyword_plugin_t)(pTHX_ char*, STRLEN, OP**);
        typedef void(*Perl_cpeep_t)(pTHX_ OP *, OP *);
        typedef void(*globhook_t)(pTHX);

        //////////////////////////////////////////

        /* dummy variables that hold pointers to both runops functions, thus forcing
         * them *both* to get linked in (useful for Peek.xs, debugging etc) */
        EXTCONST runops_proc_t PL_runops_std INIT(Perl_runops_standard);
        EXTCONST runops_proc_t PL_runops_dbg INIT(Perl_runops_debug);

        ////////////////////////////////////////////

        START_EXTERN_C
        #ifdef PERL_GLOBAL_STRUCT_INIT
        #  define PERL_PPADDR_INITED
        static const Perl_ppaddr_t Gppaddr[]
        #else
        #  ifndef PERL_GLOBAL_STRUCT
        #    define PERL_PPADDR_INITED
        EXT Perl_ppaddr_t PL_ppaddr[] /* or perlvars.h */
        #  endif
        #endif /* PERL_GLOBAL_STRUCT */
        #if (defined(DOINIT) && !defined(PERL_GLOBAL_STRUCT)) || defined(PERL_GLOBAL_STRUCT_INIT)
        #  define PERL_PPADDR_INITED
        = {
            Perl_pp_null,
            Perl_pp_stub,
            Perl_pp_scalar,     /* implemented by Perl_pp_null */
            Perl_pp_pushmark,
            Perl_pp_wantarray,
            Perl_pp_const,

        ////////////////////////////////////////////
        Now in C++, in theory, calling conventions don't exist unless you explicitly force one. The compiler is free to choose how it wants to implement vtables/etc. MS's Visual C does do some pretty good "random" calling conventions for static C functions on 32-bit x86, IMHO. For 64-bit x86, Visual C never deviated from the one and only calling convention. The question is: are there any compilers daring enough to create a whole DLL/SO which contains exactly one function call in C?

        Not any professional compiler. On some OSes (x64 Windows), the ABI is enforced through OS parsing of assembly code (x64 MS SEH; technically not true -- if you are careful, the OS will never have a reason to parse your ASM). And on some CPUs (SPARC), calling conventions are enforced in hardware.

        Another danger: there is a fine line between inlining/loop unrolling and making your L1 and L2 caches useless. Blindly inlining away all function calls will produce a multi-MB object file per Perl script and won't solve anything.

        It is the fundamental nature of a programming language like Perl that the opcodes can be presented with many different situations, as you described. And it knows how to deal with them, so that the programmer does not have to.

        In order to avoid having the interpreter do all of these things, you must introduce strong typing into the language. Which Perl emphatically does not have. You must restrict the type of parameters that can be passed into a given subroutine, so that the compiler can make the correct determination(s) statically. You must also be able to prove that the operation of the compiler, and therefore of the generated code, is correct: that your statically-determined checks are both complete and correct; that no other program behavior is possible.

        I argue that the Perl language does not possess the necessary semantics, and it was purposely designed not to require them. As a language, it is a product of its intended implementation method; of DWIM and all of that. And I argue that these characteristics impose that implementation method at the exclusion of all others.

        If you want strong typing, use any one of many languages that provide it. Those languages provide the semantic detail that your compiler will require. Without them, you will find that you can’t do it. The Perl language does not possess them and it never did. And I think that this is what the professor was saying, when he said it would be a good project for an intern where you could always stop at any time and say you won.

        As I have politely said before, each of us has different core competencies, and languages/compilers/interpreters happen to be one of mine.

      I've got the significant edge-case Perl code base, and I am the one "in great pain", thus my personal interest in running down this particular dream.
        What exactly is your edge case? Maybe there is a better way to solve your problem, like fixing a particular poorly performing opcode in perl, or a couple XS functions, than taking the most difficult choice.

        And what are the resources you can supply for your idea (time, $, PhDs, patents, etc)?

        If you are going to do it yourself, just do it. You don't need anyone's permission. Perl is open source. Once you have a working prototype with before-and-after benchmarks, you will start to attract volunteers and corporate users, and the rest is history. Not to mention Perl does have hooks for optimizers and "custom" opcodes. You can always JIT an optimizable Perl sub into what perl thinks is an XSUB; just change the right fields in the CV struct.

        If I were you, I would hire a couple of programmers with knowledge of compiler design/HLA/interpreter VMs, and select a couple of Perl subs, no more than one screenful each, off of CPAN. The selection doesn't need to be random -- there can be a bias -- but it must have enough ";"s. Have the programmers try and see if they can compile the Perl subs from A. human-written Perl or B. bytecode Perl to one or more popular bytecode interpreters: web-grade JavaScript, LLVM, .NET CIL, Java bytecode, really anything. Measure the benchmarks before and after. If it's faster, great, publish it; you don't need anyone's permission. If it's slower, try a different VM target. Your real work is how to convert Perl to something closer to the hardware. Anyone can write a Perl interpreter that targets any other programming language that is Turing complete, but the speed would be much worse than today. I don't know what Perlito's current speed is against perl. Someone should research that.

        Maybe the question you really want to ask is, not whether Perl 5 can use LLVM, but what is the future of LLVM by itself?

        I will guess the future is excellent: it is OS X's primary compiler. Apple seems to be the primary sponsor and financier of LLVM. It is not going away anytime soon.

        Update: there are other people who have ideas on JIT-to-machine-code in perl: http://www.nntp.perl.org/group/perl.xs/2012/07/msg2709.html
