Optimizing into the Weird Zone

by dws (Chancellor)
on Aug 12, 2003 at 10:04 UTC

I learned some kinky things this morning while chatting with an acquaintance who does research in code generation for a local, well-known chip manufacturer. He and his colleagues have been struggling with the collision between traditional ideas of code optimization and the reality of modern, heavily pipelined processors. They've been working on how to characterize optimizations, but have been stymied by seemingly bizarre behavior in modern chips. They've isolated one situation where a single loop can exhibit one of four distinct performance profiles depending on the sequence of instructions that gets executed before the loop is entered. The way the processor pipeline is filled on the way into the loop sets up one of four stable states, each of which performs differently. They've found another case where adding an instruction to a loop improves performance by 20%.

What does this mean? It suggests that once we've optimized far enough, once we've dealt with high-level issues and are down into micro-optimizations, we're increasingly likely to encounter strange and perhaps counter-intuitive differences between sequences of nearly identical code. This might not be as much of an issue with Perl, at least until Perl6, but the possibility for bizarre low-level performance behavior is still there, as is the possibility that code that performs one way in a test harness will perform differently in the wild.
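
You can't see pipeline state from Perl, but you can watch timings wobble between nearly identical snippets. A rough sketch using the core Benchmark module (the loop bodies and data size below are made up purely for illustration):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Two nearly identical ways of summing a list. Run this a few times;
    # any differences you see may owe as much to cache and pipeline state
    # as to the Perl code itself.
    my @data = map { rand } 1 .. 10_000;

    cmpthese( -3, {
        postfix_for => sub {
            my $sum = 0;
            $sum += $_ for @data;
            return $sum;
        },
        c_style_for => sub {
            my $sum = 0;
            for ( my $i = 0; $i < @data; $i++ ) {
                $sum += $data[$i];
            }
            return $sum;
        },
    } );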

Replies are listed 'Best First'.
Re: Optimizing into the Weird Zone
by BrowserUk (Patriarch) on Aug 12, 2003 at 14:14 UTC

    As noted, any optimisations applied at the level of perl (5 or 6) source are rarely, if ever, going to be affected by the phenomena described.

    Almost every clause of every line of an HLL like perl results in

    • myriad calls, with attendant stack frame building and collapsing;
    • numerous branch points;
    • several levels of pointer indirection;

    each of which will, in almost every case, break the pipeline and/or the cache (the sketch below gives a feel for how much machinery even a trivial statement involves). Hence, the kinds of instruction-level optimisation failures described are unlikely to ever manifest themselves in any noticeable way at the perl source code level.
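
    If you want to see just how much machinery a single Perl statement drags in, the core B::Concise module will dump its op tree (a rough sketch; the sub below is an arbitrary example, and the one-liner in the comment does much the same job from the shell):

        use strict;
        use warnings;
        use B::Concise ();

        # Roughly equivalent to:  perl -MO=Concise,-exec -e '$x = $y + $z'
        # Every OP in the dump is at least one indirect call through perl's
        # runloop before any of the "real" work (the addition) happens.
        my $walker = B::Concise::compile( '-exec',
            sub { my ( $x, $y ) = @_; return $x + $y } );
        $walker->();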

    As for "What does it mean?": it's an interesting phenomenon, and one which means that a lot of clever people are going to spend a lot of time analysing the effects, categorising the edge cases and developing heuristics for use in compilers targeted at pipelined processors, to ensure that the optimisations applied by those compilers 'do no harm'.

    Individually, the benefits of these pipeline- and cache-dependent optimisations are almost too small to measure on today's processors. It is only when running large-scale, CPU-intensive applications compiled with these optimisations enabled that the benefits really come to fruition. If your derivatives regression analyser runs on a spreadsheet compiled by a compiler that correctly optimises for the cache and pipelining of the processor you are running on, and you get the answer to "is this a short-term inflection?" a few minutes or even seconds before your competitors, you'll be grateful for the efforts of those clever people.

    Processor manufacturers are almost duty bound to discover, analyse and develop the heuristics so as to ensure that compiler manufacturers have the wherewithal to show their processors off to the best effect. People buying servers rarely consider buying a slower one because it makes the job of compiler source code maintenance easier. They want throughput.

    I for one am very glad that a lot of very clever people have spent an inordinate amount of time optimising the perl source code. What is the big deal about the perl 5 regex engine? In a word, speed. I benefit from their efforts every time I write a script. As does every other person that uses it, including all those that eschew the benefits of optimisation. Anyone who has looked at the perl source has to realise that it isn't the way it is for the benefit of ease of maintenance.

    And therein lies the rub. If your code will only ever be used for one purpose, and that purpose is satisfied by the code you write, you have done a good job. However, if you're writing re-usable code (eg. anything that goes on CPAN), then you have to consider the uses that your code may be put to in the future as well as the use you are writing it for. As it is impossible to predict the operational requirements, including the performance criteria, of every application it might be put to, the only way you can do justice to those that will use your code in the future is to do your best to make it optimal for the purpose for which it is designed. "Optimal" does not only cover performance, but it doesn't exclude it either.

    Obviously, optimising your code to the point that you introduce bugs does no one any favours. It's a trade-off in which correct code wins every time--but to use more memory than you need to, when a slight restructuring of your code would use less, is just profligate. We could use a bubble sort for everything. It's very easy to write, understand and maintain. We don't. We use complex, recursive, often highly optimised algorithms, to the detriment of ease of coding and maintenance, for one purpose. Performance.
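
    For a throwaway illustration of that trade-off, here is a naive bubble sort benchmarked against perl's built-in sort (the data size and the three CPU seconds of benchmarking are arbitrary choices for the sketch):

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Naive bubble sort: trivial to write, read and maintain,
        # but quadratic in the worst case.
        sub bubble_sort {
            my @a = @_;
            my $swapped = 1;
            while ($swapped) {
                $swapped = 0;
                for my $i ( 0 .. $#a - 1 ) {
                    if ( $a[$i] > $a[ $i + 1 ] ) {
                        @a[ $i, $i + 1 ] = @a[ $i + 1, $i ];
                        $swapped = 1;
                    }
                }
            }
            return @a;
        }

        my @data = map { int rand 10_000 } 1 .. 500;

        cmpthese( -3, {
            bubble  => sub { my @sorted = bubble_sort(@data) },
            builtin => sub { my @sorted = sort { $a <=> $b } @data },
        } );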

    Any buyers for a version of perl that is restructured for the sole criterion of ease of maintenance--everything neatly indented, and properly abstracted--if it runs half as fast? What if it's 5 times slower?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Optimizing into the Weird Zone
by thor (Priest) on Aug 12, 2003 at 12:51 UTC
    For a long time, I've been of the opinion that after a certain point, optimisation becomes counter-productive. It happens on two fronts. On one hand, you are spending exponential time for linear (if you're lucky) gains in performance. On the other hand, you are making your system overly sensitive to perturbations in the initial assumptions. And all for what? So your system will run in 5 seconds instead of 10. For me, I'll take the less optimized, more stable solution every time. It makes my life easier, but more importantly, makes others' lives easier when they have to maintain anything that I've written.

    thor

      Companies that spend millions of dollars on transaction systems that sit at the core of revenue-generating processes DO CARE that a job takes 5 seconds instead of 10. And that's why chip people will go to great lengths to optimize chips: because they sell better than chips that aren't optimized.

      As for programming optimizations ... Maybe you spend 40 work hours squeezing a 3% performance gain. Now if your system is being used by a lot of people, and for something "important", you're going to leverage that 3% with every user who saves that time. Given a decent user base, you're going to get that 40 hours back pretty quick.

      People who are trying to make more money by using your software are very concerned that your program does its task correctly in the shortest time possible.
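
      The back-of-the-envelope arithmetic is easy enough to run; every figure below is hypothetical, purely to show where the break-even point lands:

          use strict;
          use warnings;

          # Hypothetical numbers: a 10-minute job, a 3% speed-up,
          # 4 runs per day, 200 users, 40 hours of developer time.
          my $task_seconds = 600;
          my $speedup      = 0.03;
          my $runs_per_day = 4;
          my $users        = 200;
          my $dev_hours    = 40;

          my $saved_hours_per_day =
              $task_seconds * $speedup * $runs_per_day * $users / 3600;
          my $break_even_days = $dev_hours / $saved_hours_per_day;

          printf "saved %.1f hours/day; break even after %.1f days\n",
              $saved_hours_per_day, $break_even_days;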

        You spend countless hours and untold amounts of money on optimisations, only to find out later that the users go out to have a cup of coffee during that long job and their computers happily spend gazillions of fully optimised CPU-cycles waiting for the users to return and hit the 'ENTER' key.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        As I implied before, there's optimizing, and there's optimizing. I'm all for making code run fast. However, there comes a point where you have to consider cost utility. I mean, there's a reason why we don't write everything in assembly code, no? :)

        thor

        I have to agree with husker on one point. Take a bank at the end of the year: 24 hours are short if you have long-running batch jobs that take several hours. They have to be finished before the regular jobs start again. A million times a tenth of a second makes a real difference. And, after all, CPU time costs money on a mainframe.

        But as I said another time in Structure is more important than speed, that's just a small part of the development business. Most of the time it is not worth spending even a single minute on micro-optimization.

        And even for a bank it is more important to end up with correct values than with fast ones. They do not like optimized code where nobody can fix the special case that occurs for the first time.

        And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
        (Terry Pratchett, Small Gods)

Re: Optimizing into the Weird Zone
by PodMaster (Abbot) on Aug 12, 2003 at 10:53 UTC
    I am reminded of what James said at that one http://sv.pm.org meeting we had at the Dana Street Coffee Shop: "You can't do that, guy ... processors aren't that simple these days" (whereas the guy could on his old CPU, because he had intimate knowledge of it *pause*aah* assembler).

    PS - for those who don't know the svpm organizer James, he has a mustache and a very deep voice and is strong (for lack of a better word), whereas guy has a leather jacket and a full beard (long hair too).

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

Re: Optimizing into the Weird Zone
by dragonchild (Archbishop) on Aug 12, 2003 at 12:56 UTC
    Perhaps it suggests that as processors approach quantum levels of computing, chaos theory starts to come into play. Right now, things are very predictable, except for the tiny points where they're not. In 10 years, who knows?

    I agree with the other respondents - optimize for the programmer, not the compiler. If I can get in, do my thing, and get out in under 8 hours, and be guaranteed there's no action at a distance ... that's optimized. Anything less is poor.

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      Perhaps it suggests that as processors approach quantum levels of computing, that chaos theory starts to come into play.

      I don't think you mean what you just said. One outstanding property of quantum-level physics is that things are no longer deterministic. In chaos theory, everything is still very deterministic. It's just that very tiny disturbances in the start configuration can lead to very different outcomes. But each input leads to a very determined outcome, and repeatedly so.
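
      A toy way to see the difference (the logistic map; the constants are arbitrary and have nothing to do with CPUs): two runs that start almost identically diverge wildly, yet each individual run is perfectly repeatable.

          use strict;
          use warnings;

          # Logistic map x' = r * x * (1 - x): completely deterministic,
          # yet exquisitely sensitive to the starting value.
          my $r = 3.9;
          my ( $x, $y ) = ( 0.500000, 0.500001 );

          for ( 1 .. 40 ) {
              $x = $r * $x * ( 1 - $x );
              $y = $r * $y * ( 1 - $y );
          }
          printf "after 40 steps: %.6f vs %.6f\n", $x, $y;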

      I agree with the other respondents - optimize for the programmer, not the compiler. If I can get in, do my thing, and get out in under 8 hours, and be guaranteed there's no action at a distance ... that's optimized.

      That depends. If you write a program that calculates how much flap an airplane should use during landing, and you write the program in 8 hours, but the resulting program takes 15 minutes to calculate the flap setting, then that's not optimized. Programmer time and run time are trade-offs, and the best trade-off isn't always "minimize programmer time".

      Abigail

      I don't think approaching "quantum levels of computing" has anything to do with it. Yes, processors are getting smaller, but I don't think that's the problem the original poster is hitting.

      It's not just that CPU manufacturers are approaching the size of a single atom; it's that the way the processor works is no longer easily determined. IIRC, Intel gave up publishing op code timings back with the PII 400--the timings fluctuated so much between runs that there was no point.

      Branch prediction, pipelining, and now hyperthreading basically make processors into run-time code optimizers. I've studied some hand-optimized ASM compared to the output of GCC, and while the hand-optimized version can shave off an instruction or two, it will suffer more if the processor makes a branch misprediction.

      So all in all, it's better if programmers leave that stuff alone if we can get away with it.

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      Note: All code is untested, unless otherwise stated

        I've studied some hand-optimized ASM compared to the output of GCC, and while the hand-optimized version can shave off an instruction or two, it will suffer more if the processor makes a branch misprediction.

        That just means the guys that optimised the code generators of the GCC compiler did an excellent job.

        So all in all, it's better if programmers leave that stuff alone if we can get away with it.

        I bet everyone who uses the GCC compiler is glad that its programmers didn't take a hands-off attitude.

        I guess it comes down to where in the food chain your code sits. If you know it will always be the top predator--consuming memory and cycles--then you can afford to set no bounds, nor waste effort trying to curtail its appetite. However, if your code needs to live in a competitive environment sharing limited resources, and especially if your code lives only part way up the food chain (ie. libraries and modules), then your efforts to limit its consumption will have knock-on benefits for every run of every top predator written to use it.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
        If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Optimizing into the Weird Zone
by hagen (Friar) on Aug 13, 2003 at 00:24 UTC

    Was it Michael Jackson (no, not that one!) who said

    "There are two rules of optimisation...

    a) don't do it

    b) don't do it yet"

    However, take a look at this month's Communications of the ACM - the main topic is "Program Compaction" with five articles on just this topic - compression for size and compression for speed.

    The Guest editors in their introduction say "Component-based object-oriented programming focuses on handling the complexity of large applications... not to produce efficient applications. And that's the way we want it to be... But is this really the case today? For those working in the embedded domain the answer is often a firm `No'".

Re: Optimizing into the Weird Zone
by zentara (Archbishop) on Aug 12, 2003 at 16:27 UTC
    Speaking of the weird zone, some of you may be interested in High-Level-Assembly. It is quite easy to understand and can output nasm code, which can be used with Inline::Assembly. There is a nice book there too, called Art of Assembly, for free. HLA
