Re^10: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

by dave_the_m (Parson)
on Aug 29, 2012 at 17:27 UTC ( #990512 )


in reply to Re^9: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
in thread Perl 5 Optimizing Compiler, Part 4: LLVM Backend?

(I'll read the pdf later and come back and comment on it further, but for now):

Imagine if -- manually for now -- each SV was born marked (or better, unmarked) as carrying magic or not. Say a single bit in the (unused) least-significant 3 or 4 bits of the SV's address was used to flag when an SV had magic attached to it.

But magic is a dynamic, runtime thing, not something that can be determined at the time an SV is created. For example:

use Devel::Peek;
my $x = "abc";
Dump $x;     # here $x is not magic
$x =~ /./g;
Dump $x;     # now it is

So I don't see how that could work.

Also, in most binary perl ops, there is a macro call at the top of each function, tryAMAGICbin_MG(), that ORs the flags fields of the top two SVs on the stack together, then checks to see if the combined flags word indicates that either SV is magic or is overloaded, and if so, calls a function which handles this. The rest of the pp_ function can continue while assuming that its args aren't special:

#define tryAMAGICbin_MG(method, flags) STMT_START { \
        if ( ((SvFLAGS(TOPm1s)|SvFLAGS(TOPs)) & (SVf_ROK|SVs_GMG)) \
                && Perl_try_amagic_bin(aTHX_ method, flags)) \
            return NORMAL; \
    } STMT_END

Consider the following trivial code:

my $x = 1;
my $y = 2;
my $z;
for (1..1000) {
    $z = $x + $y;
}

Cachegrind shows that the tryAMAGICbin_MG() line at the top of pp_add() does 7 instruction reads, 3 data reads and 1 branch per call (with no significant cache or branch-predictor misses), while the pp_add() function as a whole does 85 instruction reads. So even if there were some way to know in advance to call a non-magic / non-overload version of pp_add (and I don't think there is), the best improvement you could theoretically achieve is 9%.

So you've shown me an example that (a) can't work, and (b) even if it did, doesn't show me that much could be gained.

Don't get me wrong, I want to be inspired; but I just haven't seen anything yet that indicates we could get more than that 10%.

Dave


Re^11: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by BrowserUk (Pope) on Aug 29, 2012 at 18:12 UTC
    But magic is a dynamic, runtime thing, not something that can be determined at the time an SV is created.

    I realise that -- but don't get hung up on my speculations. They are not proposals to be refuted. I'm making them up as I go, based upon the types of things LLVM can do, and my extremely limited knowledge of the internals.

    That's exactly why it is important to get someone like you, with your depth of knowledge, to look at LLVM in detail. Once you understand the possibilities, your speculations would be so much more targeted than mine.

    Cachegrind shows that the tryAMAGICbin_MG() line at the top of pp_add() does 7 instruction reads, 3 data reads and 1 branch per call (with no significant cache or branch-predictor misses), while the pp_add() function as a whole does 85 instruction reads. So even if there were some way to know in advance to call a non-magic / non-overload version of pp_add (and I don't think there is), the best improvement you could theoretically achieve is 9%.

    So you've shown me an example that (a) can't work, and (b) even if it did, doesn't show me that much could be gained.

    Yes. But you've only considered one level deep in a simple opcode. In something like pp_substr there is so much conditional branching, and testing for magic and UTF-8 (and other stuff) on multiple parameters, and then calls to upgrades and ... you know what's in there far better than I do.

    It's not just the single opcodes, but all the calls made from within them that make these types of checks. It is the wide view that whole-program analysis, LTO, and freedom from C language rules can bring to the table that could be substantial, not beating the human programmer on individual routines.

    Programmers reputedly can remember 5 or 7 or 8 things at any given moment; programs can build entire graphs and find patterns spanning those graphs. They don't forget things.

    I want to be inspired; but I just haven't seen anything yet that indicates we could get more than that 10%.

    I think that's because you haven't yet seen the full potential of LLVM, and I'm failing to describe it well enough and my attempts to speculate are clumsy. Please do read the pdf.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

Re^11: Perl 5 Optimizing Compiler, Part 4: LLVM Backend?
by Will_the_Chill (Pilgrim) on Aug 29, 2012 at 21:21 UTC
    I'm excited to see what Dave thinks after he reads the LLVM docs! :)
