CPU cycles DO NOT MATTER!

by dragonchild (Archbishop)
on Apr 17, 2008 at 12:47 UTC ( #681136=perlmeditation )

Every so often, Perlmonks gets a SoPW asking "Which is faster, A or B?" Invariably, a whole bunch of benchmarks are done and, 99% of the time, the difference, if any, averages to about 3 microseconds. That's 3 millionths of a second. In other words, if you did the slower operation 330,000 times, it might take an extra second. might.

I cringe every time I read one of these nodes. In 99% of the cases, CPU cycles simply don't matter, and here's why:

  • You're using Perl because it's at that sweet spot between simple and expressive. Which means that you spend less time coding and more time . . . well, you spend less time coding.
  • More importantly, changes to your Perl code will, in general, take less time than with any other language and your changes will be safer.
  • CPUs double in speed every 18 months while the cost drops by half (Moore's Law).
  • The ability to write code, unfortunately, is subject to an inverse Moore's Law.
In other words, an hour of programmer time costs an employer between $80 and $100. This includes salary, benefits, legal costs, space, power, machines, etc. A new server costs, roughly, $8,000-$10,000 after you factor in power, sysadmin time, and cost. So, a new machine costs, roughly, 100 hours of programmer time.

Let that sink in. A new machine costs about two weeks of your time. Or, put another way, you are worth about 50 machines per year.

Anything you do needs to be measured in that light by the person paying you. Is it worth having my person spend 3 days here and 2 days there squeezing out performance? Or is it worth just buying another machine and having my person actually spend time advancing my business? There's an additional opportunity cost to squeezing performance, because you're not doing something else that actually makes money.

So, to recap - CPU cycles don't matter because the limiter isn't the CPU anymore - it's the programmer. Especially in a language like Perl.


My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re: CPU cycles DO NOT MATTER! (EXPLETIVE!)
by BrowserUk (Pope) on Apr 17, 2008 at 13:06 UTC
      If they wrote their stuff in Perl, then I would say yes. But, the Google architecture is written in order to maximize CPU efficiency. In 20 years, even that will be moot. 10 years ago, what I'm saying would've been false. Right now, in Perl, CPU cycles almost always do not matter.

        Right now, in Perl, CPU cycles almost always do not matter.

        In 1973, oil prices jumped from ~$3/barrel to $12/barrel. The world's economy went into a downward spiral that lasted ~7 years. Inflation was rampant throughout the world. Unemployment went through the roof in most Western economies. Food prices rose at unprecedented rates.

        In 1980, oil prices jumped from $15/barrel to $40/barrel. The world's economies went into downward spirals.

        But still the world car industry, especially in the US, kept churning out 5 litre "shopping cars" returning ~10 mpg. Imagine the savings in raw oil stocks if the lessons had been heeded and no new cars were produced from 1974 onwards that returned less than, say, 40 mpg. How many billions of barrels of oil would that have saved?

        Today, oil prices have recently peaked at $115/barrel. The price of rice, wheat and other basic foodstuffs has tripled in some places. Bread, milk, rice and all the foodstuffs produced from them, including meat, are rising weekly--some daily. The world's banks are suffering meltdown due to bad debts. Inflation is threatening. Unemployment rising.

        Some figures put the energy used by commercial computer and related equipment as 13% of the total energy consumption world wide. Wasted cycles == wasted energy. Both in the direct energy usage, and the indirect cost of dealing with the heat generated. Throwing hardware at performance problems without considering the possibility of optimisations (either better algorithms or better code), is exactly the same as building low-efficiency gas-guzzling car engines because they are, in the short term, "cheaper" to produce.

        The lessons of history are free to learn for those that will.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Not everything is a map-reduce job.

        Of course it isn't. But how many 1000s of extra machines would they have required to do the what? 80%, 90%, 95%? of their total daily cpu cycle expenditure if they had used clustered RDBMSs for their core operational work?

        You obviously expend the greatest efforts optimising where it will do most good. Not everything you write.


      The reality is that most people aren't Google. Most people want to write an application that handles 10 users.
Re: CPU cycles DO NOT MATTER!
by moritz (Cardinal) on Apr 17, 2008 at 13:09 UTC
    One minor nit:
    CPUs double in speed every 18 months while the cost drops by half (Moore's Law).

    Moore's Law actually states that the number of transistors on a chip doubles every 18 months, not the execution speed.

    I agree that most micro optimizations are moot, but when it comes to choosing an efficient algorithm you can really shoot yourself in the foot by choosing a slow one. You might not notice that when you test it with only 1k test data, but you will when that big database rolls over your script ;-)

      moritz,
      I also don't remember it saying the cost is cut in half.

      Cheers - L~R

        I don't think the original does. That follows, though.

        If you're fitting twice as many transistors into the same space, then the same chip design using the new process will cost about half as much because you can produce twice as many per wafer.

        Yet usually that's not how it works, because there are new processor designs that use more transistors by the time the process change comes about.

      Sure. Algorithms can matter. But, they only do so now when dealing with large datasets. As systems get larger, the size of datasets where algorithm choice doesn't matter also gets larger proportionately. So, heapsort vs. bubble-sort matters a lot when working with randomized arrays of 100_000 elements. It doesn't matter at all when dealing with arrays of 1_000 elements. In 3 years, 100_000 elements will be moot.
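To put rough numbers on how the gap between a quadratic and an n log n sort widens with input size, here is a minimal sketch that counts comparisons instead of timing anything. It is not from the thread: the helper names `bubble_sort_count` and `builtin_sort_count` are made up for illustration, and Perl's built-in `sort` (a mergesort) stands in for the heapsort mentioned above.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Plain bubble sort that also counts element comparisons.
sub bubble_sort_count {
    my @items = @_;
    my $comparisons = 0;
    for my $end (reverse 1 .. $#items) {
        for my $i (0 .. $end - 1) {
            $comparisons++;
            @items[$i, $i + 1] = @items[$i + 1, $i]
                if $items[$i] > $items[$i + 1];
        }
    }
    return ($comparisons, \@items);
}

# Perl's built-in sort, with a counter inside the comparison block.
sub builtin_sort_count {
    my @items = @_;
    my $comparisons = 0;
    my @sorted = sort { $comparisons++; $a <=> $b } @items;
    return ($comparisons, \@sorted);
}

# Sizes kept small on purpose so the quadratic version finishes quickly.
for my $n (100, 1_000) {
    my @data = shuffle(1 .. $n);
    my ($bubble)  = bubble_sort_count(@data);
    my ($builtin) = builtin_sort_count(@data);
    printf "n=%5d  bubble: %7d comparisons  built-in: %6d comparisons\n",
        $n, $bubble, $builtin;
}
```

The bubble sort always performs n*(n-1)/2 comparisons, so a tenfold increase in n costs roughly a hundredfold more work, while the built-in sort grows only slightly faster than linearly.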

        Sure. Algorithms can matter. But, they only do so now when dealing with large datasets

        But "large" depends strongly on the algorithm.

        I remember reading a node here (can't find it, sorry) about a regex taking exponential time matching some piece of HTML.

        For an algorithm that needs exponential time a line of 80 chars can be long. Really. In 10 years that limit might be 100. Or 120.

        A typical example is the knapsack problem, which appears in real-world applications over and over again and takes exponential time when solved by brute force. Approximation algorithms can solve it much faster, even if a bit inaccurately.

        I'm sure even you would argue that you should optimize when O(2**n) code hits you.
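As an illustration of that trade-off, the following is a rough sketch of a 0/1 knapsack solved two ways: exhaustively over all 2**n subsets, and with a simple greedy heuristic that takes items in order of value per unit weight. The function names and the sample items are invented for this example, the greedy version is only one possible approximation, and it assumes every weight is greater than zero.

```perl
use strict;
use warnings;

# Exact answer: try every subset. The loop runs 2**n times for n items.
sub knapsack_brute_force {
    my ($capacity, @items) = @_;          # each item is [weight, value]
    my $best = 0;
    for my $mask (0 .. 2 ** scalar(@items) - 1) {
        my ($weight, $value) = (0, 0);
        for my $i (0 .. $#items) {
            next unless $mask & (1 << $i);
            $weight += $items[$i][0];
            $value  += $items[$i][1];
        }
        $best = $value if $weight <= $capacity && $value > $best;
    }
    return $best;
}

# Heuristic: take items by descending value/weight ratio while they fit.
# One sort plus one pass, but the result may be below the optimum.
sub knapsack_greedy {
    my ($capacity, @items) = @_;
    my ($weight, $value) = (0, 0);
    for my $item (sort { $b->[1] / $b->[0] <=> $a->[1] / $a->[0] } @items) {
        next if $weight + $item->[0] > $capacity;
        $weight += $item->[0];
        $value  += $item->[1];
    }
    return $value;
}

my @items = ([10, 60], [20, 100], [30, 120]);
printf "brute force: %d, greedy: %d\n",
    knapsack_brute_force(50, @items), knapsack_greedy(50, @items);
```

With these three items the exhaustive search finds 220 and the greedy pass settles for 160. Adding one item doubles the work of the first function and barely changes the second.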

        But if you normally don't optimize, you have no feeling for what is slow and what isn't, don't know about profiling etc. Which is why I do optimize my applications from time to time.

        BTW a real world project that has been hit by missing optimizations recently is the KindaPerl6 compiler, which was so slow during bootstrap that it just wasn't practicable anymore. It took the fun out of the development process, and now I haven't seen a single kp6 commit since... lemme check... 2008-03-16. (Surely this wasn't the only problem, but IMHO it was the one with largest impact).

      In the late 90's Moore revised his prophecy to say that he expected the number of transistors to double every 2 years.

      In a 2000 interview with U.S. News & World Report, Moore said that he expected this rate to hold for another 10 to 15 years.

Re: CPU cycles DO NOT MATTER! (usually)
by kyle (Abbot) on Apr 17, 2008 at 13:25 UTC

    Once in a while, CPU cycles do matter. Once in a while you have a daily report that takes 25 hours to generate. Once in a while, you have a speed problem that's only going to get worse, so throwing hardware at it won't work long. Once in a while, programmer time is free (hobby projects, volunteer time), but hardware is expensive.

    Most of the time, I agree with you. When a "speed my code" question comes up, I think it's important to question whether it's the right question. Once in a while, it is.

Re: CPU cycles DO NOT MATTER!
by amarquis (Curate) on Apr 17, 2008 at 13:37 UTC

    My thoughts on this mostly echo Kyle's, but there is another point to bring up: significance.

    Benchmarks with tiny differences often aren't even significant. That is to say, that 3 millionths of a second you gain in your example might not just be tiny, but not actually reliably exist. I'm searching now for, but cannot find, an old node here that showed that trivial and seemingly unrelated changes to the source bumped the results around by a few percent.
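A quick way to see the point about significance is to time the same candidates several times and compare the run-to-run spread with the gap between candidates. This is only a sketch under stated assumptions: the two subs `join_builtin` and `join_loop` are hypothetical stand-ins for "A" and "B", and the iteration count is arbitrary. It uses `timeit` and `timestr` from the core Benchmark module.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Two hypothetical candidates that build the same comma-separated string.
sub join_builtin { return join ',', @_ }
sub join_loop {
    my ($str, @rest) = @_;
    $str .= ",$_" for @rest;
    return $str;
}

my @words = ('a' .. 'z');
my %candidates = (
    join_builtin => sub { join_builtin(@words) },
    join_loop    => sub { join_loop(@words) },
);

# Time every candidate in several rounds. If one candidate's own numbers
# vary from round to round by as much as the gap between candidates,
# the gap is noise rather than a result.
for my $round (1 .. 3) {
    for my $name (sort keys %candidates) {
        my $t = timeit(100_000, $candidates{$name});
        printf "round %d  %-12s %s\n", $round, $name, timestr($t);
    }
}
```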

    As far as the original point goes, CPU cycles almost never matter. But there do exist cases where swapping one algorithm for another can offer you a big speed difference in a place where it actually matters.

    I think my own optimization decision making gets summed up pretty well by the following:

    1. Is it slow?
    2. Should I care that it is slow?
    3. Is there an obvious fix? (I.e. am I accidentally iterating over a whole data set when I could drop out on the first success or failure, etc.)
    4. Will new, faster hardware be in place by the time I finish this thing, anyway?
    5. Can I buy the problem away by throwing hardware at it?
    6. Okay, guess it is time to optimize.
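Step 3 in the list above (dropping out on the first success) might look like the sketch below. The sub names and the log lines are invented for illustration; `first` comes from the core List::Util module and stops at the first match, whereas `grep` always walks the whole list.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Scans every line even though one match is enough to answer the question.
sub scan_all {
    my ($lines) = @_;
    my $examined = 0;
    my $hits = grep { $examined++; /ERROR/ } @$lines;
    return ($hits > 0, $examined);
}

# Stops as soon as the first matching line is found.
sub scan_until_first {
    my ($lines) = @_;
    my $examined = 0;
    my $hit = first { $examined++; /ERROR/ } @$lines;
    return (defined $hit, $examined);
}

my @log = ('ok', 'ERROR: disk full', ('ok') x 1_000);
my (undef, $all)   = scan_all(\@log);
my (undef, $early) = scan_until_first(\@log);
print "grep examined $all lines, first examined $early\n";
```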

      I'm searching now for, but cannot find, an old node here that showed that trivial and seemingly unrelated changes to the source bumped the results around by a few percent.

      You may be looking for No More Meaningless Benchmarks! (an excellent node for all users of Benchmark).

        That is indeed the node, thank you very much.

      0. Does it work?

      My take on "premature optimization is the root of all evil" is simple: first, make sure everything works correctly (e.g. passes all tests).

      Then, and only then, should you look at the remainder of that list.

      Also, between 3 and 4, a good question is "will any fix make it fast enough to be worth my time to implement?"

      <radiant.matrix>
      Ramblings and references
      The Code that can be seen is not the true Code
      I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: CPU cycles DO NOT MATTER!
by hardburn (Abbot) on Apr 17, 2008 at 14:10 UTC

    There are some cases where you can't just throw hardware at the problem. For instance, a hardware XML parser could speed up many operations on a web server. However, if you have a cluster of 20 servers, and each one would need its own hardware accelerator, the cost could be prohibitive.


    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: CPU cycles DO NOT MATTER!
by Limbic~Region (Chancellor) on Apr 17, 2008 at 14:19 UTC
    dragonchild,
    I agree with your sentiment but not the conclusions some might draw from it. It could be argued that what you are saying is tantamount to

    It does not matter how inefficient the code you write is. It is not necessary to understand the complexities of algorithms, data structures, profiling, or benchmarking. It is only important to write code as fast as possible that does the job because CPU time is less expensive.

    In my opinion, it is much better to teach someone to think for themselves in determining when and where to spend their time (optimizing or not) than to give them more dogma to follow blindly and mantras to chant to bring more into the fold.

    Cheers - L~R

    Update: s/sloppy/inefficient/ as sloppy has meanings other than I intended. I asked in the CB if the tone of this message seemed inflammatory, which was not my intention. A couple said no and one said yes. I apologize if this is offensive.
      It does not matter how inefficient the code you write is. It is not necessary to understand the complexities of algorithms, data structures, profiling, or benchmarking. It is only important to write code as fast as possible that does the job because CPU time is less expensive.

      Yes, that is exactly what I'm saying as a rule of thumb.

      My mother-in-law writes small programs in VBA for Word and Excel as well as for Crystal Reports. This is part of her job as a purchasing agent. Which is more important for her - to write code that runs efficiently enough or for her to understand all the algorithms that we, as professional developers, need to know? I would argue that she just needs to do her job. If the code she writes would strike one of us blind, but it runs fast enough on the machines she is given, then what's wrong with that?

      The idea that code needs to be perfect is, itself, dogma. We, as the professionals, need to remember that the majority of all code is now not written by a professional programmer nor is it being written in an enterprise-grade language. It's being written in VBA, Lotus's macro language, and similar tools by business analysts, bankers, and stay-at-home moms. For them, algorithms would just get in the way.

      Why is this important for us? Well, I know that roughly half of all my work tends to be stuff that can take 10x as long to run and still be fine. For that stuff, runtime efficiency is inefficient.


        dragonchild,
        Hrm - let me try again then.

        It never matters how inefficient the code anyone writes is. It is never necessary to understand the complexities of algorithms, data structures, profiling, or benchmarking. All anyone needs to do is write code as fast as possible that functions correctly. It is never important to ensure your code scales because CPUs are getting faster than everyone's data is getting larger.

        I have already agreed with your sentiment and am not arguing that you are wrong. I am pointing out that what you have said may not come across as a rule of thumb to everyone. In fact, the way you have bolded it, upper cased it, and followed it with an exclamation point - you have made it seem like there is no room for discussion. If you take your argument to one possible logical conclusion, it becomes ridiculous.

        "You're using Perl because it's at that sweet spot between simple and expressive

        That is a bad starting assumption. Why don't you come hang out in #perl on freenode some night? There is no end to the number of people using perl (and PHP) because they do not seem to be competent enough to learn any other language. The task at hand may already be borderline at best for being done in perl. Throw in inexperience and lack of knowledge and it will never accomplish the goal - no matter how long you let the CPU spin.

        Cheers - L~R

        Unfortunately, this “rule of thumb” of yours only applies when the task in question is on-the-order-of what your mother-in-law (i.e. a human being) deals with.

        The fundamental fallacy of your argument rests in the fact that nearly all of the “hard” problems that a business-oriented computer must deal with are I/O-bound, not CPU-bound. Therefore, the justification that “CPU time is less expensive” becomes entirely specious since “the CPU” is barely even used.

        The ruling constraint (to the hardware) is ... always is ... I/O and nothing else. Any strategy must necessarily focus upon the reasonably-efficient utilization of that resource, at least insofar as the entire computer-system is concerned.

        The goal therefore must be to devote “only a reasonable amount of frighteningly-expensive human time” toward a strategy that efficiently avoids the hardware-constraint of I/O, even at the expense of (generally quite throw-away...) CPU resources.

      Right on... using an O(n^2) algorithm when a well-known O(log n) algorithm is available on serious amounts of data can easily make CPU cycles matter. However, that could be construed as just knowing what you're doing.

      As Donald Knuth said (paraphrasing Hoare),

      "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

      So, dragonchild is mostly correct.

      sas
Re: CPU cycles DO NOT MATTER!
by citromatik (Curate) on Apr 17, 2008 at 15:25 UTC

    Obviously you work as a professional programmer, and obviously, you are talking from the "productivity" point of view of a company. When I code at home, I prefer to spend some time thinking about how to make my code more efficient than going to buy another computer if I need to run it faster :) .

    I think that knowing how to make your code more efficient is an important step when learning how to program. If you learn how to make your code faster, at the same time you will learn whether the goal is worth the effort, and surely you will be a better programmer.

    citromatik

Re: CPU cycles DO NOT MATTER!
by samtregar (Abbot) on Apr 17, 2008 at 16:09 UTC
    Preach it, man, preach it! I'd go further - performance matters much less than 1% of the time in my experience. Maybe 0.1% would be more accurate. Even taking an edge case like Google, the performance sensitive stuff they do can't be more than a few percent of their total effort. Their best idea - map-reduce on a gigantic cluster - seems to me to be mostly a great way to speed up slow code with no programmer effort!

    It's a damn shame though - optimization is such fun. HTML::Template::JIT was a blast to write, but it's worth noting that despite being the absolute hands-down fastest templating engine available it's virtually never been used. I couldn't even justify using it in a mass-mailer with very tight performance requirements, templating just wasn't the bottleneck (MIME::Lite was).

    -sam

Re: CPU cycles DO NOT MATTER!
by mr_mischief (Prior) on Apr 17, 2008 at 16:10 UTC
    What if the code is running on far more than 50 machines? If I save 10% on 500 machines or 1% on 5,000 machines, is that worthwhile?

    I think Kernighan and Pike came up with the best metric for when to optimize in The Practice of Programming. They say that the time spent optimizing should never outweigh the run time saved through optimization unless you're against a hard deadline the software can't meet.

    You might want to weight this for the price of computer time vs. programmer time, so if your programmer's time is worth $50k a year and the computer time spent running the software is $5k a year, don't let them spend more than a man month optimizing. Since the computer will probably get upgraded at some point, make it 2 man weeks. This way, your $2k investment in your programmers pays off the first year and to some extent every year after.

    For a program that runs on one machine once in a while, throw hardware at it. For something that's the sole application of a farm of hundreds of servers, you might save tens of thousands of dollars every year by spending a man week doing the simple, obvious optimizations.
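The weighting described in this post can be turned into a small break-even calculation. This is a sketch, not a formula from Kernighan and Pike: the function name `break_even_weeks`, the 52-week year, and the assumption that each server in a farm costs the same $5k a year to run are all illustrative.

```perl
use strict;
use warnings;

# Weeks of programmer time that cost as much as one year of the compute
# spending an optimisation is expected to save.
sub break_even_weeks {
    my ($salary_per_year, $compute_per_year, $fraction_saved) = @_;
    my $cost_per_week = $salary_per_year / 52;    # assumes 52 paid weeks
    return ($compute_per_year * $fraction_saved) / $cost_per_week;
}

# The $50k programmer and $5k of computer time from the post, with the
# whole compute bill treated as recoverable (an upper bound).
printf "one machine:            %.1f weeks\n",
    break_even_weeks(50_000, 5_000, 1.0);

# Hypothetical farm: 500 machines at $5k each, 10% of cycles saved.
printf "500 machines, 10%% saved: %.1f weeks\n",
    break_even_weeks(50_000, 500 * 5_000, 0.10);
```

The first case comes out at about five weeks, which is roughly the "man month" ceiling given above before it is halved to allow for hardware upgrades.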

Re: CPU cycles DO NOT MATTER!
by swampyankee (Parson) on Apr 17, 2008 at 16:44 UTC

    There are still the odd applications for which a few CPU cycles per computation can add up to a rather significant difference, but these tend to be written in C, C++, or Fortran, not Perl. Interestingly, these are largely written by people whose college diplomas have words like "physics" or "aerospace engineering" on them, and are in application domains such as numerical relativity, n-body problems where n >> 10^6, computational fluid mechanics, 4d climate models, etc, where the problems have to be radically simplified to run on teraflop machines.

    But, yes, for the vast bulk of applications, it is foolish to shave a few CPU cycles.


    Information about American English usage here and here. Floating point issues? Please read this before posting. — emc

      There are still the odd applications for which a few CPU cycles per computation can add up to a rather significant difference, but these tend to be written in C, C++, or Fortran, not Perl
      And there is a reason for that: if you need that kind of tiny optimization, you've already switched that part of the code to C or C++ or some other high-performance language weeks or months ago - those languages can easily give you a 1000% performance boost over perl with no changes in the algorithm whatsoever. Just compare the cost of a perl method call to one in C++.

      But since perl integrates pretty nicely with C and C++ and is so much easier to code in, in general it's still much better to start out with pure perl, and only re-write the stuff that really does need to be fast in C/C++.

      Update: that's to say that I generally agree with the OP, but I'm also working on a project that's already spending about as much on hardware as on programmers. Spending another man-month or two to decimate (literally) the hardware cost can certainly be worth it.

        and only re-write the stuff that really does need to be fast in C/C++.

        So, you are optimising your project, but "generally agree with the OP," when he says: "CPU cycles DO NOT MATTER!"...?


Re: CPU cycles DO NOT MATTER!
by starbolin (Hermit) on Apr 17, 2008 at 18:35 UTC

    I don't know about what percentage of programmers use perl or what percentage work on web apps. The only experience I have to base my decisions on is my own, and my own experience says that cycles and architectures 'do' matter.

    I've spent my life counting cycles. In my field cycle counts often determine whether an idea is producible or a non-starter. Yes, processors are always getting faster, but many applications have other constraints on processor selection, and products have to ship with the processor available 'today'; the market won't wait for 'next year'. Once your processor selection gets outside of the commodity processors, Moore's law often operates in fits and spurts. I often design with processors that won't be available for 9-16 months. Waiting for generation II would put production out beyond realistic business planning cycles.

    Yes, historically such cycle limited code has been written in C or assembly but the same forces that have made Web apps independent of language efficiency have made dynamic languages like Perl available to the hardware designers and embedded designers. These designers 'do' care about cycle counts.


    s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
Re: CPU cycles DO NOT MATTER!
by lima1 (Curate) on Apr 17, 2008 at 20:12 UTC
    A "Which is faster?" question might be stupid in 99% of the cases seen from an economic point of view, but the threads here are typically the more interesting ones. I remember the discussion about using match variables.

    And of course, in many cases, people ask here because CPU cycles matter, simply because their script is too slow.

    Btw: it is simple (at least in theory) to calculate whether a particular optimization is economically a good thing. just run the profiler and apply http://en.wikipedia.org/wiki/Amdahl's_law.
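Amdahl's law itself is a one-liner: if a fraction p of the run time is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch, with the function name `amdahl_speedup` and the sample profiler fractions chosen purely for illustration:

```perl
use strict;
use warnings;

# Overall speedup when a fraction $p of the run time (0..1) is made
# $s times faster and the rest is left untouched.
sub amdahl_speedup {
    my ($p, $s) = @_;
    die "fraction must be between 0 and 1\n" if $p < 0 || $p > 1;
    die "local speedup must be positive\n"   if $s <= 0;
    return 1 / ((1 - $p) + $p / $s);
}

# Hypothetical profiler readings: the routine under discussion accounts
# for 5% or for 80% of the total run time.
for my $case ([0.05, 10], [0.80, 10]) {
    my ($p, $s) = @$case;
    printf "%2.0f%% of run time made %dx faster: overall %.2fx\n",
        $p * 100, $s, amdahl_speedup($p, $s);
}
```

Making a routine ten times faster gains about 1.05x overall when it accounts for 5% of the run time, and about 3.57x when it accounts for 80%, which is why the profiler output has to come first.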

Re: CPU cycles DO NOT MATTER!
by ruzam (Curate) on Apr 17, 2008 at 21:31 UTC
    I grew up on mainframes. Not by choice, that's just where the jobs were when I graduated. Your words have never been truer on mainframes.

    Seems every 'optimization' I could ever remember coding in COBOL database access was eventually nullified by either hardware improvements or underlying system optimizations that worked around inefficient programs (The vendor was deep into optimization, not us). We were always creating ever more complex, inefficient and slower running code, and the hardware was always upgraded to keep step. You could even say "throw more hardware at it" was the mantra. The same attitude was taken towards refactoring. Not a lick of old code was ever cleaned up (cost < benefit). New code was built on old code, which was built on older code, etc, etc.

    Today this company continues to fight an escalating war between ever slower applications and ever faster hardware. So it has been in the 20 years since I started there and who knows how many years before that. It's a festering, stinking pile of code dung that nobody wants to touch. It's too expensive to maintain and even more expensive to replace.

    This company is probably as profitable today as it was 20 years ago (much to my surprise), so it's not like the error of their ways has killed them. But I believe if they'd just taken the time 20 years ago to foster a more 'efficient' attitude they'd be a much more profitable company today. Everything looks cheaper when you're only looking at the short term.
Re: CPU cycles DO NOT MATTER!
by Porculus (Hermit) on Apr 17, 2008 at 22:26 UTC

    Unfortunately, the people who pay us do not always view things rationally; in particular, they tend to be quite happy to rubber-stamp existing ongoing expenses, and very reluctant to authorise new expenditure, even when the new expenditure could reduce the ongoing expenses.

    Or, to put it bluntly: in my experience, most PHBs would rather you wasted two weeks optimising your code, than spend $8,000 on a server upgrade.

    Why? I don't know for sure. The optimist in me says it's because my salary has been budgeted for, and there's no room left in the budget for hardware. The cynic in me says that it's because they don't feel they have any control over schedules (everyone knows IT projects always run late!), but they do have control over the purse-strings, so they control what they can. Or maybe it's because they think that we're being lazy, asking for faster hardware instead of doing our jobs properly and writing fast code, and they don't want to reward laziness?

    Really, heaven only knows. But it's a fact of life for some of us... and that means we do have to worry about CPU cycles, because even if they're irrelevant in dragonchild's Utopia of limitless upgrades, the real world is not so generous.

We Benchmark for Fun
by ruoso (Curate) on Apr 18, 2008 at 09:30 UTC

    The point you missed is that most of the time, these benchmarking exercises are just plain fun. People like doing them because you end up learning several things about how Perl and perl work.

    Yes, they are useless for their result in the sense of the program in itself, but they are far from useless for what you learn by playing with them.

    daniel
Re: CPU cycles DO NOT MATTER!
by bastard (Hermit) on Apr 18, 2008 at 09:49 UTC
    You've got me totally scratching my head. Nearly every major app I've ever worked on needed optimization to get the most out of the hardware because it was a problem. I'm not talking about super scientific migrate code to assembly optimizations, but a general culture of development that paid attention to efficiency (as well as security, growth, expansion, etc...). All the smaller optimizations add up in the long run.

    In your example speaking of 3 millionths of a second, what happens if there are 100,000 cases in the codebase where there is a 3 millionth of a second delay? That's three tenths of a second. Now multiply that by a few hundred thousand operations per second... You've got a real problem at this point. However I'm not talking about smallish apps like the code that drives perlmonks, slashdot, facebook and the like. I'm talking about those multi million line codebases, like online banking apps, stock markets, custom erp apps, etc... Not paying the right attention to efficiency can kill you as the app scales. It may blaze with only a few hundred thousand or million records but what happens when you've got several billion and you're trying to guarantee a 2 second response time? (The latest scaling/efficiency issue I read about was with some forum application where the customers had forums with hundreds of thousands of posts and searches were timing out. The answer from the developers? Archive old posts...)

    The one time I had the opportunity to witness what happens when efficiency wasn't given its due (outsourced development), a pair of Sun E10000s were required to support a customer base of maybe 50,000. Meanwhile the app we were working on supported millions of concurrent sessions on a single E10000. Once you scale beyond the small and medium business, optimization can cost millions. In both cases the hardware cost more than 20 man-years of development.

    No, for most people efficiency isn't all that important for smaller apps and it's smarter to write a flexible and robust application first and optimize later vs. writing an optimized application first and hacking new functionality later. But to say attention to efficiency isn't a good idea is questionable in my mind.

    dratsab
      If you're a bank, then you should have the money to scale your hardware to meet your demand. If you have 100,000 cases of something in a codebase, your code is too large. I have worked on dozens of codebases, large and small, in companies large and small. I've never seen an instance where an application couldn't be written in under 100kLOC, most under 25kLOC. Especially in Perl. I will grant that there are situations where that's needed, such as an OS. But, I will go out on a limb and say that 90% of all codebases are under 100kLOC and shouldn't care about the CPU.


        In the example I mentioned witnessing, it _was_ a bank, and it was the online banking application for business accounts. They spent somewhere around 2 million on the software and planned on spending about $250,000 on the hardware. Turns out they had to spend nearly 6 million on hardware just to get the thing to support the current customer base. Meanwhile we (the internal developers) had the system supporting the personal banking app working on half the hardware, supporting 20x the sessions... Yes, the bank had the money to purchase the additional hardware. Yes they sued the development company for delivering faulty software. Yes, probably far less than 1 million spent on efficiency tuning would have let them save over 5 million on hardware. Just because a company has the money to throw at a problem doesn't mean they should spend it. In this case they had to go that route because of a hard launch date tied to national advertising campaigns. (God I hated those, they'd sometimes tell us of a campaign to start in a few days and then request about 3 weeks worth of work to support the campaign.)

        The systems I've dealt with have hundreds or thousands of tables in the backend with tens of thousands of columns and millions to billions of rows. One of them automated the company's knowledge and processes to the point where we could hire $12/hour operations staff instead of $50/hour professionals, as the knowledge was managed by the system, not the people.

        Another company I've seen with efficiency problems is one of the larger ERP application providers, recently purchased by a Fortune 500 database company. It is massively complex software, and I recall hearing the company's tech snickering, leaving a requirements session saying things like "that screen they want to draw will take 20 minutes to generate" (a screen to be used regularly by people in bank branches to service customers).

        Does the under-25kLOC figure count the code from CPAN modules used? The one huge efficiency bugaboo I encountered involved a CPAN module, DBIx::Class (actually Red Hat's version of perl 5.8.8 was the culprit, but it was DBIx::Class that was exercising the slow section of perl). With the problem, which came down to a bugfix of only a handful of lines, perl/DBIx::Class got an order of magnitude slower for certain operations. Operations that would definitely affect site performance. If memory serves, that bug also affected things other than DBIx::Class, but I can't recall what. https://bugzilla.redhat.com/show_bug.cgi?id=196836

        I should mention again that I don't value efficiency over flexibility and clean code. But time and time again I've seen the value of having an efficiency mindset. Personally I prefer to start flexible and optimize where necessary.

        Consider something simple like this:

        open(FILE, "<billionlinelogfile.log");
        my @file = <FILE>;
        close(FILE);
        foreach my $line (@file) {
            # do stuff...
        }
        vs.
        open(FILE, "<billionlinelogfile.log");
        while (<FILE>) {
            # do stuff...
        }
        close(FILE);
        Now admittedly, for most software professionals choosing between these isn't even a conscious thought, and this is a very basic example.
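
        For anyone who wants to try the two approaches side by side, here is a minimal runnable sketch. The temp file standing in for billionlinelogfile.log, the names count_slurp and count_stream, and the /7/ match standing in for "do stuff" are all made up for illustration. It uses lexical filehandles and three-argument open, which is a modernization of the snippet above, not a change in the idea.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the huge log file: a small temp file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 1000;
close $tmp or die "close failed: $!";

# Slurp: every line of the file sits in @lines before any work starts,
# so memory use grows with the size of the file.
sub count_slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @lines = <$fh>;
    close $fh;
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ /7/;    # placeholder for "do stuff"
    }
    return $count;
}

# Stream: only the current line is held in memory,
# so memory use stays flat however large the file is.
sub count_stream {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /7/;    # placeholder for "do stuff"
    }
    close $fh;
    return $count;
}

printf "slurp: %d, stream: %d\n", count_slurp($path), count_stream($path);
```

        Both subs return the same answer; the difference only shows up in memory use once the file is large.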

        dratsab
        If you've "never seen an instance where an application couldn't be written in under 100kLOC" then I envy you. I wish my life could be that simple.
Re: CPU cycles DO NOT MATTER!
by tinita (Parson) on Apr 18, 2008 at 11:43 UTC
    programming is about algorithms.
    fortunately, perl does a lot of work for you by offering data structures like hashes for free. but you should still have a clue what takes time and what not. you should know about algorithms in general and also a bit about the performance of the language you program in.
    cpu cycles can matter faster than you think. if you program an application framework that uses modules, you can program for maximum maintainability. if you program a module which gets executed by a framework very often you might want to benchmark a bit. think about DBI. imagine it was written in pureperl and without any care for performance - oh my god, database interaction would be soo slow in perl. if no module author would care about speed, perl itself would be slow because cpan is part of the language somehow. i agree that a very small difference doesn't usually matter because it might be just a platform/version issue that changes in the next version, but to know how to benchmark and to get a feeling about efficiency does not hurt.
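
    a minimal sketch of what "benchmark a bit" can look like with the core Benchmark module. the two subs and the @words data are invented for illustration; the point is only the shape of a comparison, not a claim about which variant wins on your platform or perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical workload: build one string out of 1000 small ones.
my @words = map { "word$_" } 1 .. 1000;

# Variant A: append piece by piece.
sub join_concat {
    my $s = '';
    $s .= $_ for @words;
    return $s;
}

# Variant B: let the builtin do the whole job.
sub join_builtin {
    return join '', @words;
}

# A negative count means "run each variant for at least that many CPU seconds".
cmpthese(-1, {
    concat => \&join_concat,
    join   => \&join_builtin,
});
```

    cmpthese prints a rate table with relative percentages. run it on the hardware and perl version you actually deploy on before drawing conclusions.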
Re: CPU cycles DO NOT MATTER!
by sundialsvc4 (Monsignor) on Apr 18, 2008 at 13:51 UTC

    I believe that the core points of this post are very valid, even though the title is somewhat misleading. “CPU cycles,” of course, do matter ... but only in a handful of situations. I don't fault this posting for its title; nor should we. Look beyond that.

    “The 80/20 rule” is a familiar maxim that applies over-and-over to our industry in all sorts of useful analogies. Furthermore, most of the things that we actually want a computer to do, do not have CPU clock-speed as their primary, ruling constraint. The speed of any system is ruled by its slowest component. Which is inevitably I/O.

    “Input/Output” can appear in unexpected places, though, and it can wear many disguises. For instance, I've spent a lot of time recently working with a group that loved to use hashes: big hashes. Very Big Hashes. Sometimes those hashes were “tied” to a Berkeley DB; sometimes they were not. But they'd sit there wondering why the processes were “so damned slow,” and they'd throw cluster-bombs of computers at those processes and they were still slow. I explained that, while they were well on their way to making a hardware-salesman very happy while they were busting their budget, those algorithms were doomed to be slow forever.

    I patiently taught them about virtual-memory, and thrashing, and working-sets and page-faults. Then I taught them about the strategies that were used a half-century ago to do much bigger tasks than these with punched cards (and in much less time). Slowly the mind-shift began to sink in, but I do mean, slowly. But, one by one by one, the programs sure got faster!
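
    The post does not say which strategies were taught, so the following is only an assumption about the kind of thing meant: sort both inputs first (externally if need be), then walk them sequentially in step, instead of probing a huge hash at random. The name merge_common, the callback, and the sample keys are hypothetical, and the sketch assumes each input is already sorted with unique keys.

```perl
use strict;
use warnings;

# Walk two key-sorted streams in step and report keys present in both.
# Only the current line of each stream is in memory, and each stream is
# read strictly front to back, which keeps the working set tiny.
sub merge_common {
    my ($fh_a, $fh_b, $on_match) = @_;
    my $x = <$fh_a>;
    my $y = <$fh_b>;
    while (defined $x && defined $y) {
        chomp(my $kx = $x);
        chomp(my $ky = $y);
        my $cmp = $kx cmp $ky;
        if ($cmp < 0) {
            $x = <$fh_a>;          # key only in A so far: advance A
        }
        elsif ($cmp > 0) {
            $y = <$fh_b>;          # key only in B so far: advance B
        }
        else {
            $on_match->($kx);      # same key in both: report, advance both
            $x = <$fh_a>;
            $y = <$fh_b>;
        }
    }
    return;
}

# Demo: in-memory filehandles stand in for two large sorted files.
my $data_a = join '', map { "$_\n" } qw(apple cherry grape melon);
my $data_b = join '', map { "$_\n" } qw(banana cherry melon peach);
open my $fh_a, '<', \$data_a or die "open: $!";
open my $fh_b, '<', \$data_b or die "open: $!";

my @common;
merge_common($fh_a, $fh_b, sub { push @common, $_[0] });
print "common: @common\n";
```

    The sort costs something up front, but the pass itself never touches more than two records at a time, so it does not page-fault its way through memory the way random lookups in a Very Big Hash can.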

    One of the very best books on programming I have ever read is The Elements of Programming Style. (I still have the original FORTRAN edition.) Condensed into that tiny volume, in the best tradition of Messrs. Strunk & White, is a cornucopia of practical information elegantly presented. Such as:   Don't “diddle” code to make it faster... find a better algorithm.

    A fundamental book on algorithms, such as How To Solve It, should also be on your library-shelf and its well-thumbed pages should be chock-full of Post-It™ notes. (Your college “Data Structures” textbook, while arguably very good, is not the same thing.)

Re: CPU cycles DO NOT MATTER!
by Anonymous Monk on Apr 18, 2008 at 18:56 UTC
    So, to recap - CPU cycles don't matter

    Occasionally, while surfing the Perlmonks site I wonder whether the software it runs upon has been written with that mindset in place... (which often goes hand in hand with "memory usage doesn't matter because RAM is cheap").

      Occasionally, while surfing the Perlmonks site I wonder whether the software it runs upon has been written with that mindset in place...
      thanks, that made my day =)
Re: CPU cycles DO NOT MATTER!
by radiantmatrix (Parson) on Apr 21, 2008 at 16:40 UTC

    If we're confining ourselves to optimizations that make a very small difference per-operation, then I'd generally agree that the vast majority of the time [1], the CPU time doesn't matter.

    On the other hand, the core of your argument is somewhat specious:

    Anything you do needs to be measured in that light by the person paying you. Is it worth having my person spend 3 days here and 2 days there squeezing out performance? Or, is it worth just buying another machine and having my person actually spend time advancing my business? Because, there's an additional opportunity cost to squeezing performance cause you're not doing something else that actually makes money.

    When you talk about lost opportunity cost, etc., you miss a major piece of the puzzle -- it might take me 3 days to find the best algorithm to use (and get it working), but that effort has zero ongoing cost. In fact, I can use what I learned the next time there's a related problem, and I won't be spending that same 3 days again.

    However, adding a new server does have ongoing costs. You have to admin it, secure it, keep its software up to date. You also have to plan to replace it when it breaks or ages.

    Choosing when and what to optimize is extremely complicated. I tend to agree with the Two Rules of Optimization (I think these are Knuth's?):

    1. Don't
    2. (For experts only) Don't yet

    So, if I were to rephrase your advice, I might say this: don't worry about optimizing your code until it works, and even then only if there is really a performance problem. I may even add that sometimes adding resources [2] is a better solution than optimizing code -- but I disagree that it's always a better solution.

    Beware false absolutes.

    Footnotes:

    1. I don't like throwing around numbers like "99%" unless there's actually some data to suggest that number. At least as an estimate.
    2. CPU isn't the only limiting factor that people optimize for. There's also memory and disk usage, network bandwidth, and a myriad other things that fall under the "resources" umbrella.
    <radiant.matrix>
    Ramblings and references
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet

      “A better algorithm” is always going to be significantly faster; otherwise it's not a better algorithm.

      When I got started in this business, computer-cycles were a thing that had to be rigorously conserved. But it wasn't just the CPU; it was every part of the hardware. Things were smaller and slower. Algorithms had to be smarter just to get the work done. And it was done. (Imagine a timesharing computer with a 1.5 MHz clock and 512K of memory with 32 terminals attached to it, all being used to register a college of 5,000 students for classes ... with less than one-second response time to any request, even if every user hit the Enter key at precisely the same instant, as we actually confirmed.)

      Now, we've got an embarrassment of riches. We can throw hundreds of pounds of silicon at any problem. A great big free sand-pile. But there are still choke-points, and if those choke-points are not clearly taken into account by the algorithm, we'll have poor performance and 0.01% CPU-utilization. (The two often occur together.)
