-3 is not the same as 2e6
On my machine 2e6 is way too slow
I used 2e6 for the opposite reason: even -1 was way too slow on my machine.
Yes, you need to pick an appropriate number for your machine if you skip the "try to guess how many iterations to run" step. That doesn't invalidate any of my points. I also used 2e6 because, due to quirks in the Benchmark.pm source code, it is easier to disable the "eliminate overhead" junk if you also skip the "negative argument" support.
With just a bit of work, I could split the code and get a run with -3 that didn't try to eliminate overhead, but the numbers wouldn't be any different. I changed just one line of Benchmark.pm so that it skipped all of that nonsense, and 2e6 iterations is plenty to measure these times reasonably accurately while taking only several seconds to run.
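If you'd rather not edit Benchmark.pm, a minimal sketch of getting plain fixed-count timings is to time the loop yourself with `Benchmark->new`, `timediff`, and `timestr` from the Benchmark module. This reports gross time (loop and sub-call cost included) with nothing subtracted. It is not the one-line change I made, just one way to get numbers with no overhead subtraction; the two subs in `%subs` are hypothetical stand-ins for whatever you are comparing.

```perl
use strict;
use warnings;
use Benchmark qw(timediff timestr);

# Hypothetical subs under test; substitute your own.
my %subs = (
    explicit => sub { return 1 },
    implicit => sub { 1 },
);

my $count = 2e6;   # fixed iteration count; pick one that suits your machine

# Time $count calls of $code and return a Benchmark object holding the
# gross elapsed times (no empty-loop baseline is subtracted).
sub gross_time {
    my ($count, $code) = @_;
    my $t0 = Benchmark->new;
    $code->() for 1 .. $count;
    my $t1 = Benchmark->new;
    return timediff($t1, $t0);
}

my %results;
for my $name (sort keys %subs) {
    $results{$name} = gross_time($count, $subs{$name});
    printf "%-10s %s\n", $name, timestr($results{$name});
}
```

Change `$count` to whatever runs in a few seconds on your machine.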
If you are trying to argue that you didn't get the warning, so your run was not a ridiculous benchmark, then let me point you to this code from Benchmark.pm:
# A conservative warning to spot very silly tests.
# Don't assume that your benchmark is ok simply because
# you don't get this warning!
print " (warning: too few iterations for a reliable count)\n"
The module's own author calls such tests "very silly".
There are 2 reasons that even -1 was way too slow for me (besides the fact that I'm impatient).
If you look at some of the numbers from the report, you can see that 2e6 is roughly the number of iterations that an argument of -1 should have run. But Benchmark.pm realized, as the warning shows, that obeying the -1 I gave it would produce unacceptably inaccurate numbers once it tried to "eliminate overhead". And judging from how long it ran before I gave up waiting, I bet Benchmark.pm realized that -3 was also too small.
If you compare some of the numbers from my post, you can see that Benchmark.pm judged "time with overhead" to be in the ballpark of 6 times "time minus overhead". So, for -3, it would have to (I think) spend at least 18 CPU seconds per subroutine, and I wasn't going to sit around for minutes waiting for fanciful numbers devoid of practical value.
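The arithmetic behind that estimate, as a tiny sketch: if gross time is roughly 6 times net time, then reaching a target of N CPU seconds of net time needs about 6 × N CPU seconds of gross time per subroutine. This assumes, as guessed above, that the CPU-seconds target of a negative argument applies to the time left after overhead is removed; `gross_needed` is a made-up helper and the ratio of 6 is only the rough figure from my run.

```perl
use strict;
use warnings;

# Hypothetical helper: gross CPU seconds needed for the time left after
# subtracting overhead to reach $net_target, when gross/net is about $ratio.
sub gross_needed {
    my ($net_target, $ratio) = @_;
    return $net_target * $ratio;
}

my $ratio = 6;   # rough gross/net ratio from my run, not a constant
for my $arg (-1, -3) {
    my $net_target = -$arg;   # a negative argument asks for at least this many CPU seconds
    printf "%d: about %g CPU seconds per subroutine\n",
        $arg, gross_needed($net_target, $ratio);
}
```

For -3 this prints about 18 CPU seconds per subroutine, before counting the time spent calibrating the iteration count.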
So it may have been thrashing, trying harder and harder to come up with a number of iterations ludicrously huge enough to avoid the warning. If you'd made the subroutines just a little more trivial, you might even have seen Benchmark.pm, given -3, still give up and declare that it just wasn't reasonable to try to "eliminate overhead" for such ridiculously trivial nano-operations. It tries very hard never to do that, but I've still seen it happen.
So those two reasons are really just two sides of the same coin: The code being benchmarked was too trivial.
I realize that it is "no fun" if something called Benchmark.pm seems only ever to tell you that your tricky optimization didn't give you more than a 5% speed-up (if that much). But the considerable work done to make the numbers often look much more dramatic (a large fraction of the module's total code complexity) is mostly just a great source of fiction.
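To illustrate how subtracting a fixed overhead can inflate a small difference while barely touching a real one, here is a sketch with made-up timings (illustrative numbers, not measurements from my run). The same overhead is subtracted from both gross times and the ratios are compared before and after.

```perl
use strict;
use warnings;

# Ratio of A's time to B's (how many times faster B is), optionally after
# subtracting the same fixed overhead (in seconds) from both gross times.
sub speedup {
    my ($gross_a, $gross_b, $overhead) = @_;
    $overhead //= 0;
    return ($gross_a - $overhead) / ($gross_b - $overhead);
}

# Made-up timings for illustration only, not measurements.
my @cases = (
    # [ label,           gross A, gross B, overhead ]
    [ 'trivial nano-op', 1.05,    1.00,    0.90 ],
    [ 'real speed-up',   10.0,    5.0,     0.10 ],
);

for my $case (@cases) {
    my ($label, $ga, $gb, $oh) = @$case;
    printf "%-16s gross ratio %.2f, net ratio %.2f\n",
        $label, speedup($ga, $gb), speedup($ga, $gb, $oh);
}
```

With these numbers the trivial case goes from a 5% difference (1.05) to 1.50 after subtraction, while the real speed-up only moves from 2.00 to about 2.02.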
If you somehow find a tiny operation that your code performs hundreds of thousands of times while doing almost nothing else (in particular, without also making about that many subroutine calls or running a loop body that many times), then you have done the nearly impossible: found a situation where Benchmark.pm's work to "eliminate overhead" may actually make the measurements more accurate rather than less (in some ways, though still not in others).
On the other hand, if you've found an actually useful speed-up (one that can make a script run perceptibly faster), then Benchmark.pm's attempts to "eliminate overhead" just won't have much impact on the numbers reported.
But it is impossible to call 'return' (from a sub) tens of thousands of times without also calling subroutines at least that many times. So, for this test, Benchmark.pm's extra work is completely inappropriate, even before you take the next step of considering possible real, practical implications.
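A rough sketch of what "eliminate overhead" means for a test like this one, assuming the baseline Benchmark.pm subtracts for a code ref is a loop that calls an empty sub (my reading of its `timeit()`; check the source of your version): time N calls of the sub under test, time N calls of `sub { }`, and subtract. Because the baseline already includes a sub call, the result excludes exactly the call cost that a real `return` can never avoid. This is a simplified imitation, not Benchmark.pm's actual code, and `time_calls` is a made-up helper.

```perl
use strict;
use warnings;
use Benchmark qw(timediff);

# Hypothetical helper: parent CPU seconds (user + system) for $n calls of $code.
sub time_calls {
    my ($n, $code) = @_;
    my $t0 = Benchmark->new;
    $code->() for 1 .. $n;
    my $t1 = Benchmark->new;
    return timediff($t1, $t0)->cpu_p;
}

my $n        = 2e6;
my $test_sub = sub { return 1 };   # stand-in for the code being measured

my $gross    = time_calls($n, $test_sub);
my $baseline = time_calls($n, sub { });   # the same loop, calling an empty sub
my $net      = $gross - $baseline;        # can be tiny, or even negative, from noise

printf "gross %.3f s, baseline %.3f s, net %.3f s\n", $gross, $baseline, $net;
```

On a trivial sub, expect the net figure to be a small and noisy slice of the gross one, which is where the dramatic-looking ratios come from.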