
I had been hoping to eventually see a response to explain this impossibility. This thread just now came up in the chatterbox and I'm glad that BrowserUk took some time to help me understand his perspective on the issue.

The title of the node seems to clearly state that micro optimizations can "pay off". The node clearly indicates that about a 6-fold speed-up was obtained via micro optimization (and BrowserUk confirms in the chatterbox that this was the case). Some people who responded in this thread seem to think that it is possible for a 25% speed-up via a micro optimization to cause a 6-fold speed-up. Even more people have indicated that they think this node made exactly that claim (but BrowserUk explains that this was not his intent -- more on this in a bit).

I can just see this thread being a huge boost to the problematic micro optimization craze. So I think it is important to strongly counter the impression this thread has been giving some people.

First, let's address the part that BrowserUk did not intend to convey.

[...], modifying these to do [...] has a substantial effect on the performance of the code. Reducing the draw time of the cube, from around 2 minutes (on my box) to around 20 secs.

To me, this very clearly implies that this one simple micro optimization results in about a 6-fold speed-up. BrowserUk explains that just a few lines up, the sentence:

One of the major causes of the slow performance is his [...] use of [...]

was meant to indicate that several micro optimizations combined to give the 6-fold speed-up.

So we do not have an example of a single, less-than-100% speed-up producing a total speed-up of over 500%. That would be impossible.

If you have a micro-optimization that Benchmark.pm is capable of discerning a 50% speed-up in, you will be unlikely to see even a 25% total speed-up. Benchmark.pm goes to extreme lengths to try to isolate the item being timed. Even if your code does nearly nothing but that same bit over and over, there will be overhead that Benchmark.pm tries to eliminate from the calculations. But this overhead cannot be eliminated from the running of your script and will not be sped up by your optimization, and so it will dilute the total speed-up.

In practice, a 50% speed-up via a micro optimization is pretty rare and is likely to only have a marginal impact on the total run-time of your script since most scripts don't spend even 50% of their time doing just one tiny thing. The smaller the operation you optimize, the smaller the part it is likely to play in the total run time of your script. It is a very unusual script that has even close to 50% of its total run time tied up in a "micro" operation.

If your script spends 20% of its time doing the operation, then a 40% speed-up in that operation can't give you more than an 8% total speed-up. Measuring these percentages is certainly tricky, but the math is not tricky.

Let's make the math really simple and even give the optimizers a big advantage. We'll interpret a "40% speed-up" as meaning that the new run time is 40% less than the original run time (new run time = 60% of original run time). Benchmark.pm would actually report this as "-40%" (the original running at 60% of the new rate) and "+67%" (the new being 67% faster, since 1/0.60 = 1.67), so the person doing the optimization would surely claim a "67% speed-up".

So the simplified math would be: 1 - ( 0.80 + 0.20*(1-0.40) ) = 1 - 0.92 = 0.08 = 8%, but in reality that represents a 67% operation speed-up producing a total speed-up of about 8.7% measured the same way (1/0.92 = 1.087), when the operation is 20% of the total run time.
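
A minimal sketch of the arithmetic above, written in Python only for brevity. The function names are made up for illustration and the 20% and 40% inputs are the ones from the example. It shows the same change expressed both as run time saved and as a rate gain, which is the form a rate comparison reports.

```python
def total_time_fraction(share, op_time_reduction):
    """New total run time as a fraction of the original run time.

    share: fraction of the original run time spent in the operation
    op_time_reduction: fraction by which the operation's own time shrinks
    """
    return (1.0 - share) + share * (1.0 - op_time_reduction)


def rate_gain(time_fraction):
    """Speed-up expressed as a rate increase for a given remaining time fraction."""
    return 1.0 / time_fraction - 1.0


# The operation is 20% of the run time and its time drops by 40%.
new_total = total_time_fraction(0.20, 0.40)

print(f"total run time saved: {1.0 - new_total:.1%}")
print(f"operation rate gain:  {rate_gain(0.60):.1%}")
print(f"total rate gain:      {rate_gain(new_total):.1%}")
```

Whichever way the percentage is quoted, the total gain stays far below the gain measured on the isolated operation.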

And having seen lots of micro optimization attempts, I feel confident in saying that micro optimizations are most likely to produce less than a 20% speed-up and account for less than 10% of the total run-time, and so are very unlikely to give you even a 2% total speed-up.

"So, if you have 100 such optimizations, you could expect a 200% speed-up"

No.

If you have 100 optimizations, then the operations they touch can't total more than 100% of the run-time. If you have 100 micro optimizations, then I'd bet they don't total more than 50% of the run-time (because Perl has quite a bit of overhead). Even if all 100 of those micro optimizations made each of their operations 10,000 times faster, the total speed-up would be less than a 50% reduction in the total run time.
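
As a rough illustration of that ceiling (a sketch in Python; the 50% share is the guess from the paragraph above, and the function name is hypothetical), the untouched half of the run time caps the saving no matter how fast the optimized operations become:

```python
def total_time_fraction(share, op_speed_factor):
    """New total run time as a fraction of the original.

    share: fraction of the original run time covered by the optimized operations
    op_speed_factor: how many times faster those operations become
    """
    return (1.0 - share) + share / op_speed_factor


# Assume the optimized operations add up to 50% of the run time.
for factor in (2, 10, 10_000):
    remaining = total_time_fraction(0.50, factor)
    print(f"{factor:>6}x faster operations -> {1.0 - remaining:.3%} of run time saved")
```

The saving approaches 50% as the factor grows but never reaches it, so the script as a whole never quite becomes twice as fast.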

So I don't even believe BrowserUk's claim that several micro optimizations together resulted in a 6-fold speed-up.

To get a 5-fold speed-up, you need to eliminate 4/5ths (80%) of the run time. So, for example, if you amazingly found several micro operations that accounted for 90% of the run time, you'd have to speed each of those operations up 9-fold in order to get a 5-fold total speed-up (x being the fraction of its original time that each operation still takes):

0.20 = 0.10 + 0.90 * x
0.10 / 0.90 = x
1/9 = x
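
The same equation can be solved for other shares and targets. This is only a sketch in Python with a hypothetical helper name; it uses exact fractions so that 1/9 comes out exactly, and it returns None when the target cannot be reached at all.

```python
from fractions import Fraction


def required_op_time_fraction(share, target_speedup):
    """Solve (1 - share) + share * x == 1 / target_speedup for x.

    x is the fraction of its original time each optimized operation may
    still take. Returns None when even x = 0 would not reach the target.
    """
    share = Fraction(share)
    target_total = 1 / Fraction(target_speedup)
    x = (target_total - (1 - share)) / share
    return x if x > 0 else None


# The example above: operations covering 90% of the run time, 5-fold target.
x = required_op_time_fraction(Fraction(9, 10), 5)
print(f"each operation may take {x} of its original time ({1 / x}-fold faster)")

# Operations covering only half the run time can never yield a 6-fold speed-up.
print(required_op_time_fraction(Fraction(1, 2), 6))
```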

Well, BrowserUk chose a "major cause" of the slowness and managed to speed it up 25% (not 900%)! It just doesn't add up.

I'm sure BrowserUk believes that micro optimizations made the code 6 times as fast. But I can't believe that conclusion at all based on the evidence presented.

I can only guess that a non-micro optimization made a more-than-6-fold improvement to a huge percentage of the total run time, leading to the overall 6-fold speed-up, and that BrowserUk did not recognize this.

So I don't believe "micro optimizations can pay off" with a 6-fold speed-up.

- tye

In reply to Re^2: Micro optimisations can pay off, and needn't be a maintenance problem (I don't believe it) by tye
in thread Micro optimisations can pay off, and needn't be a maintenance problem by BrowserUk
