Re^3: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges

by dave_the_m (Monsignor)
on Aug 30, 2012 at 22:06 UTC


in reply to Re^2: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
in thread Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges

(Damn, I said I wouldn't be responding further...)

New code is adapted from Deparse. It processes the PCG in the normal way, but instead of (re)generating the Perl source code, it generates an LLVM IR "copy" of the PCG it is given. Let's call that the LLCG.
This is the bit I don't currently get. Take a piece of code like:
$m = ($s =~ /foo/);

Which is compiled to an optree that looks like:

7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 p:3) v:{ ->3
6     <2> sassign vKS/2 ->7
4        </> match(/"foo"/) sKPS/RTIME ->5
-           <1> ex-rv2sv sK/1 ->4
3              <#> gvsv[*s] s ->4
-        <1> ex-rv2sv sKRM*/1 ->6
5           <#> gvsv[*m] s ->6
In general terms, what would the IR look like that you would convert that into?

Dave.

Re^4: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
by BrowserUk (Patriarch) on Aug 31, 2012 at 00:00 UTC

    A first reaction (I'm getting punchy at this point). If you use LLVM as just an alternative C compiler, then, as part of the process of compiling perl -- unchanged -- it will compile whatever code/functions/source file(s) constitute the current "runloop" (ostensibly runops_standard in run.c).

    One of the possible variations of using LLVM in this mode is that it (clang) can output .bc (bitcode) files, which can later be linked together -- using the LLVM linker -- to produce a native executable. What's more, the LLVM linker is quite happy to accept some "object files" in .bc format and some in the normal .obj/.o format, link them together, and produce a (normal) platform-dependent executable.

    It is also possible to have clang produce LLVM IR in text form.

    So, in theory, if we ran something like clang -S -emit-llvm run.c -o run.ll (the exact invocation is in the documentation) and then inspected run.ll in a text editor, it would show us exactly what the IR looks like for that source file.

    And that IR would, in its binary (bitcode) form, be combinable -- with all the other normal object files produced using gcc or cl.exe -- using the LLVM linker, to produce a working, natively compiled executable.

    That is not a direct answer to your question, but the point is that, as a starting point, it is possible to build a working executable by substituting any individual clang-compiled-to-bitcode source file for the natively compiled object file from that source, combining it with all the other GCC/CL-produced object files, and the LLVM linker will happily link them into a native executable.

    Thus, to see what the LLVM IR looks like for any given source file, you only need to use clang to compile that individual source file to its text representation. You don't gain any performance, but you do get to see what LLVM IR looks like.

    I'll attempt to get back to you with a specific answer to your question, but given that my LLVM installation is 2 years old, and my primary perl installation about the same, it'll take a couple of days to get caught up.



      Perhaps if I make my question a bit more specific, it will help clarify things? (I'll explain the perl runtime in detail for anyone following along at home).

      Consider the expression $x + $y * 3. This is compiled into an op tree, where (amongst other things) each op struct holds the info needed for that op, but also a pointer (op_next) to the next op in the execution sequence, and a pointer (op_ppaddr) to the C function that knows how to "execute" that op. So the op sequence for the above looks a bit like:

      1: OP_PADSV     op_targ = 1                          op_ppaddr = Perl_pp_padsv     op_next = 2
      2: OP_PADSV     op_targ = 2                          op_ppaddr = Perl_pp_padsv     op_next = 3
      3: OP_CONST     op_sv = [an SV holding the value 3]  op_ppaddr = Perl_pp_const     op_next = 4
      4: OP_MULTIPLY                                       op_ppaddr = Perl_pp_multiply  op_next = 5
      5: OP_ADD                                            op_ppaddr = Perl_pp_add       op_next = 6

      The pp_functions themselves look a bit like the following (hugely over-simplified of course):

      OP *
      Perl_pp_padsv {
          *PL_stack_sp++ = PL_curpad[PL_op->op_targ];
          return PL_op->op_next;
      }

      OP *
      Perl_pp_const {
          *PL_stack_sp++ = PL_op->op_sv;
          return PL_op->op_next;
      }

      OP *
      Perl_pp_multiply {
          SV *s1 = *--PL_stack_sp;
          SV *s2 = *--PL_stack_sp;
          SV *s3 = (a new or reused SV of some description);
          SvIVX(s3) = SvIVX(s1) * SvIVX(s2);
          *PL_stack_sp++ = s3;
          return PL_op->op_next;
      }

      OP *
      Perl_pp_add {
          SV *s1 = *--PL_stack_sp;
          SV *s2 = *--PL_stack_sp;
          SV *s3 = (a new or reused SV of some description);
          SvIVX(s3) = SvIVX(s1) + SvIVX(s2);
          *PL_stack_sp++ = s3;
          return PL_op->op_next;
      }

      And finally, the runops loop looks a bit like:

      Perl_runops_standard {
          PL_op = ...;
          while (PL_op) {
              PL_op = PL_op->op_ppaddr();
          }
      }

      So the net effect is that perl is in a loop, calling various pp_* functions, whose job is to push and pull useful values onto and off the perl stack.

      Now, under your proposal, you'll have all the pp functions (and all the functions they depend upon, such as the hash library) compiled into IR and available to you. What do you do with the op tree? Yuval's proposal is to effectively compile the following C into IR:

      PL_op = Perl_pp_padsv(PL_op);
      PL_op = Perl_pp_padsv(PL_op);
      PL_op = Perl_pp_const(PL_op);
      PL_op = Perl_pp_multiply(PL_op);
      PL_op = Perl_pp_add(PL_op);

      except that he would use modified versions of the pp functions that take and return their args directly, rather than getting them on and off the stack. Then LLVM has access to an IR version of the unrolled runops loop, plus IR for all the functions it calls, and can do its funky stuff with them.
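      To make that concrete, here is a hand-waved sketch of what such direct-argument variants might look like for the $x + $y * 3 example (the *_direct names and signatures are purely illustrative; nothing like them exists in the perl source, and the SV bookkeeping is hand-waved just as above):

      SV *
      pp_padsv_direct(PADOFFSET targ) {
          return PL_curpad[targ];                  /* fetch the lexical straight from the pad */
      }

      SV *
      pp_multiply_direct(SV *left, SV *right) {
          /* "a new or reused SV of some description", as above */
          return sv_2mortal(newSViv(SvIVX(left) * SvIVX(right)));
      }

      SV *
      pp_add_direct(SV *left, SV *right) {
          return sv_2mortal(newSViv(SvIVX(left) + SvIVX(right)));
      }

      /* the unrolled sequence for $x + $y * 3 then becomes straight-line code;
         const_sv_3 stands in for the SV holding the constant 3 */
      SV *result = pp_add_direct(pp_padsv_direct(1),
                                 pp_multiply_direct(pp_padsv_direct(2), const_sv_3));

      The operands now flow through C arguments and return values rather than through PL_stack_sp, so they are at least visible to LLVM's optimiser once the whole lot is in IR.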

      So, with that background, what would your IR generated from the op tree look like? Is it just an unrolled runops loop for a single sub, with lots of explicit calls to pp-ish functions, or would you try and unroll the pp functions themselves, or something completely different?

      Dave

        Is it just an unrolled runops loop for a single sub, with lots of explicit calls to pp-ish functions, or would you try and unroll the pp functions themselves, or something completely different?

        Bearing in mind that I don't (yet, fully) understand the effects of all the funky flags (e.g. vKP/REFC) that Concise uses to (presumably) represent state information and/or state-change requirements that are available to the pp-ish functions ...

        Essentially the first of those 3 options. But I don't understand what you mean by your inclusion of the phrase "for a single sub," in that option.

        The notion -- in as far as it goes -- is that, starting with the B::Deparse or B::Concise optree traversal code, we convert that optree (PCG) into an unrolled runops loop for the "entire program (fragment)" that has just been compiled and is ready for passing to the runops loop.

        Please don't pick over that description, I am aware it is inadequate!

        I *think* that your code block "describing" Yuval's proposal omits a considerable amount of detail -- understandably.

        I *think* that in order to capture the control flow -- annotated by the Concise output in the form:

        ??    ...    ->3
        ??    ...
        3     ...

        There would need to be conditions and labels and gotos in the generated IR. Except that those things do not happen in the runloop, but within the pp_* functions themselves, with the control flow orchestrated by what they return.
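        For instance (in the same hugely over-simplified style as your sketches above), a conditional op such as OP_AND decides its successor inside the pp function, by returning either op_next or op_other:

        OP *
        Perl_pp_and {
            /* over-simplified: the branch lives here, not in the runloop */
            if (SvTRUE(*PL_stack_sp)) {
                PL_stack_sp--;             /* discard the LHS; go and evaluate the RHS */
                return cLOGOP->op_other;
            }
            return PL_op->op_next;         /* false: keep the LHS as the result */
        }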

        So the question becomes: can the format of the PCG be described in LLVM terms (or maybe it already is), such that LLVM can do the unrolling for us?

        Maybe all that is needed is to allow LLVM to see the C structs, typedefs, constants etc. that describe the PCG, and let it convert them to its bitcode description. Then hand it the PL_op that starts the ball rolling, and it can unwind the loop by processing all the "inlined" pp_* functions used by this program (fragment), and thus optimise across the entire call graph of each particular program (fragment). Maybe it will need extra hints.

        I agree with you that simply unrolling the loop -- if that is even possible -- is unlikely to obtain big gains. As is simply optimising individual pp_* functions. The only possibility of substantial gains is from getting LLVM to consider complete code graphs as single units and look for optimisations across the whole kit & caboodle.

        Maybe that Concise snippet you posted needs to be (manually) translated into (something like):

        7:   #pp_leave inlined here
             ...
             goto END;
             #pp_enter inlined here
             ...
             #pp_nextstate inlined here
             ...
             goto 3;
             #pp_sassign inlined here
             ...
             goto 7;
        4:   #pp_match inlined here
             ...
             goto 5;
             #pp_ex-rv2sv inlined here
             ...
             goto 4;
        3:   #pp_gvsv inlined here
             ...
             goto 4;
             #pp_ex-rv2sv inlined here
             ...
             goto 6;
        5:   #pp_gvsv inlined here
             ...
             goto 5;
        END:
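        Rendered as (pseudo-)C, with hypothetical inline-able wrappers standing in for the real pp functions, that might come out something like the following sketch (the pp_*_inline names, and the idea of baking each op's data into its call site, are pure supposition on my part):

        /* Hypothetical: one generated function per compiled program (fragment).
           Every call site is direct, so once the pp bodies are available as IR,
           LLVM is free to inline them and optimise across the whole graph,
           rather than dispatching through op_ppaddr in a generic loop.
           Execution order follows the op_next chain in the Concise dump:
           enter -> nextstate -> gvsv *s -> match -> gvsv *m -> sassign -> leave. */
        void
        compiled_fragment(pTHX)
        {
            pp_enter_inline(aTHX);        /* 1  enter                         */
            pp_nextstate_inline(aTHX);    /* 2  nextstate                     */
            pp_gvsv_inline(aTHX);         /* 3  push *s onto the perl stack   */
            pp_match_inline(aTHX);        /* 4  $s =~ /foo/, push the result  */
            pp_gvsv_inline(aTHX);         /* 5  push *m                       */
            pp_sassign_inline(aTHX);      /* 6  pop value and target, assign  */
            pp_leave_inline(aTHX);        /* 7  leave                         */
        }
        /* Ops that can transfer control elsewhere (conditionals, loops, etc.)
           would become explicit labels and gotos at their op_other targets. */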

        Maybe then it will find lots of unused (by this snippet) code paths that can be trimmed. Maybe it will see the same queries, checks and alerts being performed on the same data multiple times in different expansions. Maybe it will see frequent and expensive indirect accesses to fields in (sub)structures that can be lifted to SSAs. Maybe ... :)

        Will it find enough to make it worthwhile? I don't know. But I don't believe that anyone will know until we try.

        All I have been seeking is the least effort approach to enabling that investigation. And seeking to inspire someone with the skills and knowledge -- even if not the time and energy -- to direct the process of enabling it.



Re^4: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
by BrowserUk (Patriarch) on Aug 31, 2012 at 16:27 UTC

    Okay. I compiled run.c -> run.e using cl /E. It produced 30,000 lines of post-preprocessed C, most of which is unrelated to Perl, having been pulled in from a crapload of OS header files. (I won't post it here; it's too big and entirely uninteresting anyway.)

    I then threw that file at clang -- all 30,000 lines of it -- and asked it to convert it to LLVM assembler with no optimisation. It took most of the day cleaning up stuff that clang is really pedantic about -- it doesn't allow duplicate typedefs, even if they are identical except for whitespace; it doesn't like MSC source-code annotations or accept most of their pragmas; and it doesn't like prototypes without a ; on the end ... and there were 1000s of them produced from the perl headers -- but 9 hours' work and I got there.

    It produced these 572 lines:

    I then asked it to optimise it. This time it produced just these 363 lines:

    Now looking at that, I can see that it has built data descriptions for a crapload of Windows-internal data structures, so I've manually (and conservatively) removed anything that I don't think is used by Perl. (I could have done this at the /E stage, but it was *much* easier reading 700 lines than 30,000 lines :)

    What I've ended up with is these 112 lines of IR:

    ; ModuleID = '/tmp/webcompile/_9304_0.bc'
    target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
    target triple = "x86_64-unknown-linux-gnu"
    %struct._TP_CALLBACK_ENVIRON = type { i64, %struct._TP_POOL*, %struct._TP_CLEANUP_GROUP*, void (i8*, i8*)*, i8*, %struct._ACTIVATION_CONTEXT*, void (%struct._TP_CALLBACK_INSTANCE*, i8*)*, %union.anon }
    %struct._TP_POOL = type opaque
    %struct._TP_CLEANUP_GROUP = type opaque
    %struct._ACTIVATION_CONTEXT = type opaque
    %struct._TP_CALLBACK_INSTANCE = type opaque
    %union.anon = type { i64 }
    %struct.interpreter = type { %struct.sv**, %struct.op*, %struct.sv**, %struct.sv**, %struct.sv**, i64*, i8**, i64, i64, %union.any*, i64, i64, %struct.sv**, i64, i64, i64, i64, i64*, i64*, i64*, %struct.sv*, %struct.xpv*, i64, %struct._stat64, %struct._stat64, %struct.gv*, %struct.sv*, %struct.tms, %struct.pmop*, %struct.sv*, %struct.gv*, %struct.gv*, %struct.gv*, i8*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.hv*, %struct.hv*, %struct.op*, %struct.jmpenv*, %struct.cop*, %struct.av*, %struct.stackinfo*, %struct.av*, %struct.jmpenv*, %struct.jmpenv, %struct.sv*, %struct.he*, %struct.op*, %struct.op*, %struct.hv*, %struct.gv*, %struct.gv*, i8*, i64, i64*, i64*, %struct.sv*, %struct.re_save_state, %struct.regnode, i16, i8, i8, [6 x i8*], void (%struct.interpreter*, %struct.op*)*, void (%struct.interpreter*, %struct.op*)*, void (%struct.interpreter*, %struct.op*)*, i64, i64, i8**, i8*, %struct.regmatch_slab*, %struct.regmatch_state*, i16, i8, i8, i8, i8, i32, i8, i64, i32, i8**, %struct.gv*, %struct.gv*, %struct.gv*, i8*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, i8**, i8*, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8*, %struct.sv*, i64, %struct.sv*, i64, i64, i64, i32, i32*, %struct.gv*, %struct.gv*, %struct.gv*, %struct.gv*, %struct.gv*, %struct.av*, %struct.gv*, %struct.gv*, %struct.gv*, %struct.gv*, %struct.gv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.av*, %struct.hv*, %struct.hv*, %struct.sv*, %struct.av*, %struct.av*, %struct.av*, %struct.av*, %struct.av*, %struct.hv*, i64, i32, i64, i64, %struct.sv*, %struct.sv*, %struct.av*, i8*, %struct.cv*, %struct.op*, %struct.op*, %struct.op*, %struct.op*, %struct.cop*, i32, i32, i8*, i8**, i8*, %struct.av*, %struct.sv*, %struct.sv*, i64, i8, i8, i16, i32, i64, %struct.exitlistentry*, %struct.hv*, i64*, %struct.cop, %struct.cv*, %struct.av*, %struct.av*, i64, i64, %struct.interp_intern, %struct.cv*, i32, i8, i8, i8, i8, i64, i64, i64, i64, i64, i64, i64, i64, i8**, i8*, void (i32)*, [16 x i8*], i64, i32, {}*, %struct.sv, %struct.sv, %struct.sv, %struct.sv*, i64, i64, i64, i64, i64, i64, i64, i64, i64, i8*, i64, i64, i64, i8, i8, i8, i8, i8*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, %struct.hv*, i8*, i64, [10 x i8], i8, i8, i32, %struct.yy_parser*, %struct.sv**, %struct.sv**, %struct.ptr_tbl*, %struct.av*, i8*, %struct.sv*, %struct.sv**, %struct.av*, %struct.REENTR*, %struct.hv*, %struct.hv*, %struct._PerlIO*, %struct.PerlIO_list_s*, %struct.PerlIO_list_s*, %struct.sv*, %struct.perl_debug_pad, %struct.sv*, %struct.sv*, %struct.sv*, %struct.sv*, i64 (%struct.interpreter*, %struct.sv*, %struct.sv*)*, %struct.av*, %struct.av*, i64, i64, i32, %struct.hv*, void (%struct.interpreter*, %struct.sv*)*, void (%struct.interpreter*, %struct.sv*)*, void (%struct.interpreter*, %struct.sv*)*, {}*, void (%struct.interpreter*)*, i64, i64, %struct.hv*, i32, i8**, i8 (%struct.interpreter*, %struct.sv*)*, %struct.hv*, %struct.av*, %struct.hv*, %struct.hv*, %struct.hv* }
    %struct.sv = type { i8*, i64, i64, %union.anon.0 }
    %union.anon.0 = type { i8* }
    %struct.op = type { %struct.op*, %struct.op*, %struct.op* (%struct.interpreter*)*, i64, [2 x i8], i8, i8 }
    %union.any = type { i8* }
    %struct.xpv = type { %struct.hv*, %union._xmgu, i64, i64 }
    %struct.hv = type { %struct.xpvhv*, i64, i64, %union.anon.3 }
    %struct.xpvhv = type { %struct.hv*, %union._xmgu, i64, i64 }
    %union._xmgu = type { %struct.magic* }
    %struct.magic = type { %struct.magic*, %struct.mgvtbl*, i16, i8, i8, i64, %struct.sv*, i8* }
    %struct.mgvtbl = type { i32 (%struct.interpreter*, %struct.sv*, %struct.magic*)*, i32 (%struct.interpreter*, %struct.sv*, %struct.magic*)*, i64 (%struct.interpreter*, %struct.sv*, %struct.magic*)*, i32 (%struct.interpreter*, %struct.sv*, %struct.magic*)*, i32 (%struct.interpreter*, %struct.sv*, %struct.magic*)*, i32 (%struct.interpreter*, %struct.sv*, %struct.magic*, %struct.sv*, i8*, i64)*, i32 (%struct.interpreter*, %struct.magic*, %struct.clone_params*)*, i32 (%struct.interpreter*, %struct.sv*, %struct.magic*)* }
    %struct.clone_params = type { %struct.av*, i64, %struct.interpreter*, %struct.interpreter*, %struct.av* }
    %struct.av = type { %struct.xpvav*, i64, i64, %union.anon.2 }
    %struct.xpvav = type { %struct.hv*, %union._xmgu, i64, i64, %struct.sv** }
    %union.anon.2 = type { i8* }
    %union.anon.3 = type { i8* }
    %struct._stat64 = type { i32, i16, i16, i16, i16, i16, i32, i64, i64, i64, i64 }
    %struct.gv = type { %struct.xpvgv*, i64, i64, %union.anon.7 }
    %struct.xpvgv = type { %struct.hv*, %union._xmgu, i64, i64, %union._xivu, %union._xnvu }
    %union._xivu = type { i64 }
    %union._xnvu = type { %struct.anon.5 }
    %struct.anon.5 = type { i64, i64 }
    %union.anon.7 = type { i8* }
    %struct.tms = type { i64, i64, i64, i64 }
    %struct.pmop = type { %struct.op*, %struct.op*, %struct.op* (%struct.interpreter*)*, i64, [2 x i8], i8, i8, %struct.op*, %struct.op*, i64, i64, %union.anon.12, %union.anon.13 }
    %union.anon.12 = type { %struct.op* }
    %union.anon.13 = type { %struct.op* }
    %struct.jmpenv = type { %struct.jmpenv*, [16 x i32], i32, i8 }
    %struct.cop = type { %struct.op*, %struct.op*, %struct.op* (%struct.interpreter*)*, i64, [2 x i8], i8, i8, i64, i8*, i8*, i64, i64, i64*, %struct.refcounted_he* }
    %struct.refcounted_he = type opaque
    %struct.stackinfo = type { %struct.av*, %struct.context*, %struct.stackinfo*, %struct.stackinfo*, i64, i64, i64, i64 }
    %struct.context = type { %union.anon.14 }
    %union.anon.14 = type { %struct.block }
    %struct.block = type { i8, i8, i16, i64, %struct.cop*, i64, i64, %struct.pmop*, %union.anon.15 }
    %union.anon.15 = type { %struct.block_sub }
    %struct.block_sub = type { %struct.op*, %struct.cv*, %struct.av*, %struct.av*, i64, %struct.av* }
    %struct.cv = type { %struct.xpvcv*, i64, i64, %union.anon.11 }
    %struct.xpvcv = type { %struct.hv*, %union._xmgu, i64, i64, %struct.hv*, %union.anon.9, %union.anon.10, %struct.gv*, i8*, %struct.av*, %struct.cv*, i64, i16, i64 }
    %union.anon.9 = type { %struct.op* }
    %union.anon.10 = type { %struct.op* }
    %union.anon.11 = type { i8* }
    %struct.he = type { %struct.he*, %struct.hek*, %union.anon.1 }
    %struct.hek = type { i64, i64, [1 x i8] }
    %union.anon.1 = type { %struct.sv* }
    %struct.re_save_state = type { i64, i64, i64, i8, i8*, i8*, i8*, %struct.regexp_paren_pair*, i64*, i64*, i8**, %struct.magic*, %struct.pmop*, %struct.pmop*, i8*, i64, i64, i64, i64, i64, i64, i8*, i8* }
    %struct.regexp_paren_pair = type { i64, i64 }
    %struct.regnode = type { i8, i8, i16 }
    %struct.regmatch_slab = type { [42 x %struct.regmatch_state], %struct.regmatch_slab*, %struct.regmatch_slab* }
    %struct.regmatch_state = type { i32, i8*, %union.anon.22 }
    %union.anon.22 = type { %struct.anon.26 }
    %struct.anon.26 = type { %struct.regmatch_state*, i64, i64, i64, i16*, %struct.regnode*, %struct.regnode*, i8*, i64, i16, i16, i8 }
    %struct.exitlistentry = type { void (%struct.interpreter*, i8*)*, i8* }
    %struct.interp_intern = type { i8*, i8**, i64, %struct.av*, %struct.child_tab*, i64, %struct.pseudo_child_tab*, i8*, %struct.thread_intern, %struct.HWND__*, i32, i32, [27 x void (i32)*] }
    %struct.child_tab = type { i64, [64 x i64], [64 x i8*] }
    %struct.pseudo_child_tab = type { i64, [64 x i64], [64 x i8*], [64 x %struct.HWND__*], [64 x i8] }
    %struct.HWND__ = type { i32 }
    %struct.thread_intern = type { [512 x i8], %struct.servent, [128 x i8], i32, [30 x i8], i32, i16 }
    %struct.servent = type { i8*, i8**, i16, i8* }
    %struct.yy_parser = type { %struct.yy_parser*, %union.YYSTYPE, i32, i32, i32, i32, %struct.yy_stack_frame*, %struct.yy_stack_frame*, i64, i64, i8*, i8*, i8, i8, i8, i8, i64, %struct.op*, %struct.op*, %struct.sv*, i16, i16, i64, %struct.sv*, i64, i64, i8, i8, i8, i8, i64, %struct._sublex_info, %struct.sv*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i64, i16, i8, i8, %struct.hv*, %struct._PerlIO**, %struct.av*, [5 x %union.YYSTYPE], [5 x i64], i64, %struct.cop*, [256 x i8], i8, i8 }
    %union.YYSTYPE = type { i64 }
    %struct.yy_stack_frame = type { %union.YYSTYPE, i16, i64, %struct.cv* }
    %struct._sublex_info = type { i8, i16, %struct.op*, i8*, i8* }
    %struct._PerlIO = type opaque
    %struct.ptr_tbl = type { %struct.ptr_tbl_ent**, i64, i64, %struct.ptr_tbl_arena*, %struct.ptr_tbl_ent*, %struct.ptr_tbl_ent* }
    %struct.ptr_tbl_ent = type { %struct.ptr_tbl_ent*, i8*, i8* }
    %struct.ptr_tbl_arena = type opaque
    %struct.REENTR = type { i32 }
    %struct.PerlIO_list_s = type opaque
    %struct.perl_debug_pad = type { [3 x %struct.sv] }

    define i64 @HEAP_MAKE_TAG_FLAGS(i64 %TagBase, i64 %Tag) nounwind uwtable readnone {
      %1 = shl i64 %Tag, 18
      %2 = add i64 %1, %TagBase
      ret i64 %2
    }

    define i32 @Perl_runops_standard(%struct.interpreter* %my_perl) nounwind uwtable {
      %1 = getelementptr inbounds %struct.interpreter* %my_perl, i64 0, i32 1
      %2 = load %struct.op** %1, align 8, !tbaa !3
      br label %3

    ; <label>:3                                       ; preds = %3, %0
      %op.0 = phi %struct.op* [ %2, %0 ], [ %6, %3 ]
      %4 = getelementptr inbounds %struct.op* %op.0, i64 0, i32 2
      %5 = load %struct.op* (%struct.interpreter*)** %4, align 8, !tbaa !3
      %6 = tail call %struct.op* %5(%struct.interpreter* %my_perl) nounwind
      store %struct.op* %6, %struct.op** %1, align 8, !tbaa !3
      %7 = icmp eq %struct.op* %6, null
      br i1 %7, label %8, label %3

    ; <label>:8                                       ; preds = %3
      %9 = getelementptr inbounds %struct.interpreter* %my_perl, i64 0, i32 78
      store i8 0, i8* %9, align 1, !tbaa !0
      ret i32 0
    }

    !0 = metadata !{metadata !"omnipotent char", metadata !1}
    !1 = metadata !{metadata !"Simple C/C++ TBAA", null}
    !2 = metadata !{metadata !"long", metadata !0}
    !3 = metadata !{metadata !"any pointer", metadata !0}
    !4 = metadata !{metadata !"long long", metadata !0}

    Take a close look at the data definitions for Interpreter, SV_any, HEK, COP etc. Isn't that the most concise (and thorough) description of the entire perl internals you've ever seen?

    Doesn't it make you wonder (just a little) what it could do with all that information?



      Take a close look at the data definitions for Interpreter, SV_any, HEK, COP etc. Isn't that the most concise (and thorough) description of the entire perl internals you've ever seen? Doesn't it make you wonder (just a little) what it could do with all that information?
      No, not in the slightest. I think this is the fundamental impedance mismatch between you and me on this subject.

      You keep asserting that if LLVM can only be given a full picture of the program, it will be able to do (unspecified) wonderful things with it. I come from the viewpoint that for perl to perform a particular action, e.g. my $x = $h{$key}, there are a certain basic number of things the underlying hardware is going to have to do at some point, such as calculate a hash index, index into the hash buckets, scan the bucket for a matching string, retrieve the associated SV, then copy the relevant part of the SV (e.g. its integer value, or its string buffer) into the SV stored in a pad somewhere that's associated with the name $x.
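      As a toy, self-contained illustration of that irreducible work (this is nothing like perl's actual hv.c; the hash function and the types are stand-ins), a hash fetch boils down to roughly:

      #include <stdint.h>
      #include <string.h>

      struct entry {
          struct entry *next;       /* bucket chain            */
          const char   *key;
          uint32_t      hash;       /* cached hash of the key  */
          int           value;      /* stands in for the SV*   */
      };

      struct table {
          struct entry **buckets;
          uint32_t       mask;      /* number of buckets - 1 (power of two) */
      };

      static uint32_t toy_hash(const char *key, size_t len) {     /* 1. hash the key */
          uint32_t h = 5381;
          while (len--)
              h = (h * 33) ^ (uint8_t)*key++;
          return h;
      }

      int *toy_fetch(struct table *t, const char *key) {
          uint32_t hash = toy_hash(key, strlen(key));
          struct entry *e = t->buckets[hash & t->mask];            /* 2. index the bucket    */
          for (; e; e = e->next)                                   /* 3. scan the chain      */
              if (e->hash == hash && strcmp(e->key, key) == 0)
                  return &e->value;                                /* 4. hand back the value */
          return NULL;
      }

      Whoever compiles it, those steps -- hashing the key, indexing the buckets, walking the chain, comparing the strings -- still have to execute.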

      Now at the moment there's a certain amount of overhead associated with doing that via the perl stack and the perl runops loop, but there's still a basic minimum that needs doing. I earlier showed that the overhead is probably less than 20%. To get better than this, LLVM has got to, in some magical way, cut into basic underlying operations like hash lookup. And I really don't think it's going to do that.

      Until someone provides me with a single actual concrete example of how LLVM might do anything better than just cut out a bit of that overhead, I'm not going to believe it.

      PS I completely fail to understand the point of your showing how run.c gets converted to IR. run.c just contains a single trivial C function, operating on a few args and variables of particular types, and the IR knows the types of those args. So what?

      Dave

        No, not in the slightest. I think this is the fundamental impedance mismatch between you and me on this subject.

        Having spent the last 9 hours producing what I posted, having you dismiss it in under 10 minutes -- barely enough time to read the post, never mind look at the code with any attention to detail -- is ... well ... let's just say disappointing, and leave it at that.

        You keep asserting that if LLVM can only be given a full picture of the program, it will be able to do (unspecified) wonderful things with it.

        Firstly, I said may, not "will". It is "(unspecified)" because -- as I've been at pains to state often & clearly -- no one yet knows. I've provided a long list of possibilities and cited documents with examples of them. But I am not a computer, and cannot possibly be expected to mentally run hundreds of thousands of lines of Perl sources through a dozen or more optimisation techniques and pick out salient examples to satisfy your demands for the instant gratification of "a concrete example".

        All I've sought from you is your knowledge and expertise of and with the existing codebase, to enable the investigations to get a quick, clean start.

        PS I completely fail to understand the point of your showing how run.c gets converted to IR. run.c just contains a single trivial C function, operating on a few args and variables of particular types, and the IR knows the types of those args. So what?

        The "what" is, that if it can encapsulate and annotate the entire internal (data) structure of Perl, in 100 lines of language independent, platform independent, eminently human readable, metadata, then it can also rewrite those structures and the function trees that use them in ways that neither a C compiler, nor a C programmer -- no matter how experienced -- could ever imagine doing.



Re^4: Perl 5 Optimizing Compiler, Part 5: A Vague Outline Emerges
by BrowserUk (Patriarch) on Aug 31, 2012 at 16:54 UTC

    Oh. And finally, I asked it to convert the optimised form and output it as C++. The interesting bits are at the top and the very bottom; there is a load of Windows API stuff in between that I can't be bothered to delete. It seems that at 3,000+ lines it is too big to post here, but man, is it ever interesting.

    Not that I'd want to maintain it in that form; but can you imagine converting all the Perl sources into C++, loading it up into a new repository, annotating it, and then using that to construct a properly object-oriented Perl5 source tree?

    Proper inheritance of the SVt_* types; a full cleanup of all the cruft that has built up over the years; starting anew from a clean, compilable and working -- if weirdly structured and named -- codebase, and going forward from there?



      No, I think you're talking utter nonsense. Sorry, I've been trying so, so hard to be open-minded throughout these discussions, but I've finally given up. No more time-wasting for me.

      Dave.

        Sorry, I've been trying so, so hard to be open-minded throughout these discussions, but I've finally given up. No more time-wasting for me.

        Hm. Threatened are we?

        I saw little evidence of an open mind. You nailed your colours to the mast very early on, and now that you've seen signs that you might have been wrong, you're taking your keeper's gloves and going home.

        But the ball isn't yours and it stays on the field. If the other players want to continue the game, one of us is going to have to get cold and muddy hands.


