PerlMonks  

Benchmark.pm: Does subroutine testing order bias results?

by jkeenan1 (Deacon)
on Jul 12, 2004 at 02:41 UTC ( #373536=perlquestion )
jkeenan1 has asked for the wisdom of the Perl Monks concerning the following question:

Does the order in which Benchmark.pm tests various subroutines bias the results which Benchmark reports?

This is the inference I am drawing from repeated tests using Benchmark, and I would like to know if other users have experienced the same phenomenon.

The specific situation: I am preparing an update of my CPAN module List::Compare. I have been tweaking its internals in the hope of getting a speed boost, and would like to know for certain whether the *cumulative* result of these tweaks is a speed up of the operation of the module *as a whole*.

To test this with Benchmark, I did the following:
1. Renamed the new version of the module 'Mist::Compare'.
2. Wrote subroutines which created List::Compare and Mist::Compare objects, respectively, and called two typical (intersection) methods on each. In order to give these tests a good workout, I passed each constructor references to three lists of 30000, 27500 and 7500 items, respectively, with enough overlap to guarantee that there was a nonzero intersection.

    sub listc {
        my $lcm = List::Compare->new( $listrefs[0], $listrefs[1], $listrefs[2] );
        my @int = $lcm->get_intersection();
        my $intref = $lcm->get_intersection_ref();
    }

    sub mistc {
        my $lcm = Mist::Compare->new( $listrefs[0], $listrefs[1], $listrefs[2] );
        my @int = $lcm->get_intersection();
        my $intref = $lcm->get_intersection_ref();
    }

3. Benchmarked these two subroutines with varying numbers of iterations, with the following results. (For simplicity, I'm only going to show the most critical measurement: the 'usr' time.)

    Benchmark: timing 10 iterations of listc, mistc...
        listc:  91.58 usr
        mistc: 100.13 usr

    Benchmark: timing 50 iterations of listc, mistc...
        listc: 506.71 usr
        mistc: 524.29 usr

    Benchmark: timing 100 iterations of listc, mistc...
        listc: 727.00 usr
        mistc: 750.56 usr

    Benchmark: timing 100 iterations of listc, mistc...
        listc: 731.85 usr
        mistc: 751.10 usr

    Benchmark: timing 100 iterations of listc, mistc...
        listc: 731.89 usr
        mistc: 753.57 usr

Note that in each case the older -- and presumably slower -- module outperformed the newer, revised module. This ran contrary to my expectations, as each modification I tried out in the newer version had itself been benchmarked and only included in the newer version if it clearly proved to be faster.

I started to wonder: What would happen if I simply reversed the order in which Benchmark tested the two modules? To do this, I simply aliased mistc() to a new subroutine with a name lower than 'listc' in ASCII order:

    *amistc = \&mistc;

    Benchmark: timing 10 iterations of amistc, listc...
        amistc:  90.80 usr
         listc: 102.63 usr

    Benchmark: timing 50 iterations of amistc, listc...
        amistc: 508.31 usr
         listc: 405.34 usr

    Benchmark: timing 100 iterations of amistc, listc...
        amistc: 727.48 usr
         listc: 748.60 usr

    Benchmark: timing 100 iterations of amistc, listc...
        amistc: 737.53 usr
         listc: 765.64 usr

    Benchmark: timing 100 iterations of amistc, listc...
        amistc: 734.79 usr
         listc: 754.06 usr

Note that, with one exception (the second case above), the first subroutine to be tested ran faster than the second -- even though in this case the first subroutine was *exactly the same* as the second, slower running subroutine in the first case above.

It almost seems as if Benchmark -- or Perl -- is getting tired when the subroutine it is testing involves a fair amount of computation. But, in any event, on the basis of this admittedly small sample I would seriously doubt whether Benchmark is capable of telling me accurately whether the older or newer version of my module is faster.

I googled the archives at comp.lang.perl.modules on this, but couldn't come up with anything. I then supersearched the perlmonks archives; other peculiarities of Benchmark have been reported, but I couldn't find anything on this problem.

Which leads to these questions:

1. Have other users experienced similar problems?
2. Does anyone have an explanation for why the subroutine tested second was, in 9 out of 10 cases, the slower-running one?
3. Does anyone have a better way of benchmarking subroutines that entail a fair amount of calculation?

Thank you very much.
Jim Keenan

Re: Benchmark.pm: Does subroutine testing order bias results?
by jlongino (Parson) on Jul 12, 2004 at 04:50 UTC
    I posted a Meditation several years ago and there were many interesting points about benchmarking presented (questions posed by the respondents).

    Some recommendations:

    • Try separating the two methods and use only timethis on each program. In each program, vary what you are timing with timethis: once with your listc subroutine just as it is written; in the next program, generate the data outside of the timethis block and use timethis only on the computational snippet. If necessary, increase or decrease the number of data points. It is not always obvious how a snippet will behave when you tweak the characteristics of a data set. Some algorithms may work well on large data sets and poorly on smaller ones. Some may perform well with smaller data points than others.
    • Try not to include "my" statements in your timethis code. These portions of the code will add bloat to your results (bloat the timed elements and decrease the iterations). I'm not saying that you shouldn't have done it as you did above, just that it might be worthwhile doing it differently and analyzing the results. It is a "good thing" to benchmark it in a "real world" scenario.
    • Depending on how you initialize your data structures, you might need to be concerned about how your computer/OS caches data. Perform multiple iterations of each program (via a shell script or a batch file depending on your OS). It might even be worthwhile running a few programs in between that create large data sets. This might aid in preventing caching anomalies. I don't know how you generate your data set, so it may not apply to your situation.

      These are the things that come to mind immediately and I'm sure that there are many other things to consider as well. Good luck!
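A minimal sketch of the "time only the computational snippet" suggestion, for illustration. The `intersection` subroutine and the list sizes are assumptions standing in for the real List::Compare calls; only `timethis` and `timestr` come from Benchmark.pm. The data is built once, outside the timed code.

```perl
use strict;
use warnings;
use Benchmark qw(timethis timestr);

# Build the input once, outside the timed code (sizes are illustrative).
my @left  = (1 .. 3000);
my @right = (2000 .. 5000);

# Hypothetical stand-in for the computation under test:
# a hash-based intersection of two lists.
sub intersection {
    my ($x, $y) = @_;
    my %seen = map { $_ => 1 } @$x;
    return grep { $seen{$_} } @$y;
}

# Time only the computation; style 'none' suppresses Benchmark's own report line.
my $t = timethis( 20, sub { intersection(\@left, \@right) }, 'intersection', 'none' );
print "intersection: ", timestr($t), "\n";
```

Running the same script a second time with the data generation moved inside the timed subroutine gives the other data point described above.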

      "Some recommendations:
      "Try separating the two methods and use only timethis on each program. In each program vary what you are using the timethis on. E.G., once just like your listc subroutine is written."

      I am in the process of trying out your suggestion and, time permitting, I will post additional results in a few days. Thanks.

      Jim Keenan

Re: Benchmark.pm: Does subroutine testing order bias results?
by hossman (Prior) on Jul 12, 2004 at 05:45 UTC

    You haven't provided enough code for us to know what exactly it is you are benchmarking. In particular, how is your data initialized?

    I for one am suspicious that maybe each iteration is allocating a new set of arrays, which may be allocating more memory, which may be causing your benchmark to swap.

    I'm also wondering if the initialization of your arrays is deterministic, or if there are random values in those arrays that might be making the sort done by List::Compare take longer in some instances.

Re: Benchmark.pm: Does subroutine testing order bias results?
by BrowserUk (Pope) on Jul 12, 2004 at 06:18 UTC

    I think that you're right in that under some circumstances, Benchmark seems to consistently favor the first test run. If the code under test does a lot of allocation and deallocation, as I suspect (but don't know) that List::Compare does, then the first test seems to run substantially quicker. The subsequent tests seem to get a fairly even performance.

    Here, the code tested is identical in all cases, but the first test run (Atest) consistently comes out 23%-46% faster than the other (identical?) tests.

        #! perl -slw
        use strict;
        use Benchmark qw[ cmpthese ];

        sub test {
            my @strings = map{ ' ' x 1000 } 1 .. 50_000;
        }

        cmpthese( 5, {
            Atest => \&test,
            Btest => \&test,
            Ctest => \&test,
            Dtest => \&test,
        });

        __END__
              s/iter Ctest Btest Dtest Atest
        Ctest   9.29    --   -0%   -0%  -31%
        Btest   9.29    0%    --   -0%  -31%
        Dtest   9.28    0%    0%    --  -31%
        Atest   6.36   46%   46%   46%    --

        P:\test>373536-2
              s/iter Btest Dtest Ctest Atest
        Btest   9.31    --   -0%   -0%  -19%
        Dtest   9.29    0%    --    0%  -19%
        Ctest   9.29    0%    0%    --  -19%
        Atest   7.54   23%   23%   23%    --

    I have a tentative conclusion for why this might be, but the size of the difference shown by the benchmark seems too big for my thought to explain all of it. So, I'll keep my trap shut for a while and see what others think.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

      For what it's worth...

          Benchmark: timing 5 iterations of Atest, Btest, Ctest, Dtest...
               Atest:  4 wallclock secs ( 4.08 usr +  0.23 sys =  4.31 CPU) @  1.16/s (n=5)
               Btest:  5 wallclock secs ( 4.18 usr +  0.00 sys =  4.18 CPU) @  1.20/s (n=5)
               Ctest:  5 wallclock secs ( 4.17 usr +  0.00 sys =  4.17 CPU) @  1.20/s (n=5)
               Dtest:  5 wallclock secs ( 4.21 usr +  0.00 sys =  4.21 CPU) @  1.19/s (n=5)
                  Rate Atest Dtest Btest Ctest
          Atest 1.16/s    --   -2%   -3%   -3%
          Dtest 1.19/s    2%    --   -1%   -1%
          Btest 1.20/s    3%    1%    --   -0%
          Ctest 1.20/s    3%    1%    0%    --
      Update: possibly more important, the problem isn't exacerbated by increasing the number of iterations...
          laptop:~> monk.pl
          Benchmark: timing 20 iterations of Atest, Btest, Ctest, Dtest...
               Atest: 17 wallclock secs (16.61 usr +  0.25 sys = 16.86 CPU) @  1.19/s (n=20)
               Btest: 16 wallclock secs (16.73 usr +  0.00 sys = 16.73 CPU) @  1.20/s (n=20)
               Ctest: 17 wallclock secs (16.72 usr +  0.00 sys = 16.72 CPU) @  1.20/s (n=20)
               Dtest: 17 wallclock secs (16.71 usr +  0.02 sys = 16.73 CPU) @  1.20/s (n=20)
                  Rate Atest Btest Dtest Ctest
          Atest 1.19/s    --   -1%   -1%   -1%
          Btest 1.20/s    1%    --   -0%   -0%
          Dtest 1.20/s    1%    0%    --   -0%
          Ctest 1.20/s    1%    0%    0%    --
        For what it's worth...

        Actually, it's worth quite a lot--to me at least. Thank you greatly.

        Your post prompted me to try my benchmark with 5.6.1 and the difference is stark.

            C:\>perl5.6.1 p:\test\373536-2.pl
            [SNIP]
                    Rate Btest Atest Ctest Dtest
            Btest 9.14/s    --   -0%   -3%   -3%
            Atest 9.14/s    0%    --   -3%   -3%
            Ctest 9.40/s    3%    3%    --   -0%
            Dtest 9.42/s    3%    3%    0%    --

            C:\>perl5.8.4 p:\test\373536-2.pl
                  s/iter Btest Dtest Ctest Atest
            Btest   4.23    --   -0%   -0%  -18%
            Dtest   4.22    0%    --   -0%  -18%
            Ctest   4.22    0%    0%    --  -18%
            Atest   3.45   22%   22%   22%    --

        Not only does the first-run bias disappear, but look at the iteration times! With 5.6.1, they are in iterations per second. For 5.8.4 they are in seconds per iteration. If my math is up to it this morning, I make that 5.8.4 is 40x slower than 5.6.1!...
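The arithmetic behind that estimate can be checked with a few lines of Perl. This is only a sketch using the Btest rows of the two tables above; the variable names are made up for the example. It converts the 5.6.1 rate into seconds per iteration so the two figures share a unit, and the ratio comes out a little under 39, in line with the rough "40x".

```perl
use strict;
use warnings;

# Figures copied from the Btest row of each table above.
my $rate_561     = 9.14;   # iterations per second under 5.6.1
my $per_iter_584 = 4.23;   # seconds per iteration under 5.8.4

# Put both figures in the same unit, then compare.
my $per_iter_561 = 1 / $rate_561;                   # seconds per iteration
my $slowdown     = $per_iter_584 / $per_iter_561;   # how many times slower

printf "5.6.1: %.3f s/iter; 5.8.4: %.2f s/iter; ratio: %.1fx\n",
    $per_iter_561, $per_iter_584, $slowdown;
```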

        That's not the end of the story. The 5.6.1 involved is home built from the CPAN distribution. The 5.8.4 involved is AS 810.

        I've royally screwed my home-build copy of 5.8.4 trying to get USE_PERL_MALLOC to work, so I haven't been able to test that yet, but I am going to re-unzip the lot and start from scratch.

        All of which is probably of not much interest to you, but it has made me look again at various things. Again, thank you.


      I ran a slightly generalized version of BrowserUk's test() of Benchmark.pm's cmpthese function on 4 different combinations of Perl and OS. From the data below I note:

      1. Across Perls and OSes, increasing the number of iterations in cmpthese() smoothed out the differences in results from the 4 tests in each call. Increasing the number of elements mapped to @strings by test() -- on the 2 OSes where I was able to test that -- didn't have a consistent effect in this regard.

      2. On 3 of the 4 Perl/OS combinations on which the test was run, the first run of the test within a given iteration ran more slowly -- albeit slightly -- than subsequent runs. Only on the Perl 5.8.0/Win2K combination was the first run faster than subsequent ones. This is the opposite of what I reported in my OP to this thread, where I found that, while comparing two different versions of a module, the first subroutine tested via timethese() ran faster than the second.

      It may be that this outcome is due to the type of work which the subroutine being benchmarked is called upon to do. In BrowserUk's test(), only 1 array is created with each iteration. In my OP, I was creating the heaviest version of a List::Compare object, which internally allocates more than a dozen arrays and hashes. But since this object goes out of scope and is presumably destroyed with each call to timethese(), it's not clear why this necessarily means that the second subroutine should run more slowly than the first.

      The plot thickens.

          sub test {
              my @strings = map{ ' ' x 1000 } 1 .. $records;
          }

          cmpthese( $iterations, {
              Atest => \&test,
              Btest => \&test,
              Ctest => \&test,
              Dtest => \&test,
          } );

          __END__

          # 1. BrowserUk's &test on Perl 5.8.0 on Win2K
          #    Done 6 ways:

          # a. Testing 5 iterations of 25000 elements ...
                  Rate Dtest Btest Ctest Atest
          Dtest 1.04/s    --   -0%   -0%  -13%
          Btest 1.04/s    0%    --   -0%  -13%
          Ctest 1.05/s    0%    0%    --  -13%
          Atest 1.20/s   15%   15%   15%    --

          # b. Testing 5 iterations of 50000 elements ...
                s/iter Dtest Btest Ctest Atest
          Dtest   3.66    --   -0%   -0%  -20%
          Btest   3.66    0%    --   -0%  -20%
          Ctest   3.66    0%    0%    --  -20%
          Atest   2.93   25%   25%   25%    --

          # c. Testing 50 iterations of 25000 elements ...
                  Rate Ctest Dtest Btest Atest
          Ctest 1.03/s    --   -0%   -0%   -2%
          Dtest 1.03/s    0%    --   -0%   -2%
          Btest 1.04/s    0%    0%    --   -2%
          Atest 1.06/s    2%    2%    2%    --

          # d. Testing 50 iterations of 50000 elements ...
                s/iter Dtest Ctest Btest Atest
          Dtest   3.19    --   -0%   -0%   -2%
          Ctest   3.18    0%    --   -0%   -2%
          Btest   3.18    0%    0%    --   -2%
          Atest   3.12    2%    2%    2%    --

          # e. Testing 100 iterations of 25000 elements ...
                  Rate Btest Atest Ctest Dtest
          Btest 1.03/s    --   -0%   -1%   -1%
          Atest 1.04/s    0%    --   -0%   -1%
          Ctest 1.04/s    1%    0%    --   -0%
          Dtest 1.04/s    1%    1%    0%    --

          # f. Testing 100 iterations of 50000 elements ...
                s/iter Dtest Ctest Btest Atest
          Dtest   5.71    --   -0%   -0%   -1%
          Ctest   5.71    0%    --   -0%   -1%
          Btest   5.69    0%    0%    --   -1%
          Atest   5.65    1%    1%    1%    --

          # 2. BrowserUk's &test on Perl 5.8.4 on Darwin
          #    Done 6 ways:

          # a. Testing 5 iterations of 25000 elements ...
                  Rate Atest Dtest Btest Ctest
          Atest 2.49/s    --   -4%   -5%   -6%
          Dtest 2.59/s    4%    --   -1%   -2%
          Btest 2.62/s    5%    1%    --   -1%
          Ctest 2.65/s    6%    2%    1%    --

          # b. Testing 5 iterations of 50000 elements ...
                  Rate Atest Btest Ctest Dtest
          Atest 1.22/s    --   -6%   -6%   -7%
          Btest 1.29/s    6%    --   -0%   -1%
          Ctest 1.29/s    6%   -0%    --   -1%
          Dtest 1.30/s    7%    1%    1%    --

          # c. Testing 50 iterations of 25000 elements ...
                  Rate Atest Dtest Btest Ctest
          Atest 2.59/s    --   -1%   -1%   -1%
          Dtest 2.60/s    1%    --   -0%   -0%
          Btest 2.60/s    1%    0%    --   -0%
          Ctest 2.61/s    1%    0%    0%    --

          # d. Testing 50 iterations of 50000 elements ...
                  Rate Atest Btest Ctest Dtest
          Atest 1.28/s    --   -0%   -1%   -1%
          Btest 1.29/s    0%    --   -0%   -0%
          Ctest 1.29/s    1%    0%    --   -0%
          Dtest 1.29/s    1%    0%    0%    --

          # e. Testing 100 iterations of 25000 elements ...
                  Rate Ctest Dtest Btest Atest
          Ctest 2.60/s    --   -0%   -0%   -2%
          Dtest 2.60/s    0%    --   -0%   -2%
          Btest 2.61/s    0%    0%    --   -2%
          Atest 2.65/s    2%    2%    2%    --

          # f. Testing 100 iterations of 50000 elements ...
                  Rate Atest Btest Dtest Ctest
          Atest 1.29/s    --   -0%   -0%   -0%
          Btest 1.29/s    0%    --   -0%   -0%
          Dtest 1.29/s    0%    0%    --   -0%
          Ctest 1.29/s    0%    0%    0%    --

          # 3. BrowserUk's &test on Perl 5.6.0 on RedHat Linux 7.2
          #    Done 3 ways; was unable to do 50000 elements test due to
          #    excessive swapping to disk

          # a. Testing 5 iterations of 25000 elements ...
                  Rate Atest Btest Ctest Dtest
          Atest 1.80/s    --   -4%   -4%   -4%
          Btest 1.87/s    4%    --   -0%   -0%
          Ctest 1.87/s    4%    0%    --    0%
          Dtest 1.87/s    4%    0%    0%    --

          # b. Testing 50 iterations of 25000 elements ...
                  Rate Atest Btest Ctest Dtest
          Atest 1.86/s    --   -0%   -1%   -1%
          Btest 1.87/s    0%    --   -0%   -0%
          Ctest 1.87/s    1%    0%    --    0%
          Dtest 1.87/s    1%    0%    0%    --

          # c. Testing 100 iterations of 25000 elements ...
                  Rate Atest Dtest Ctest Btest
          Atest 1.86/s    --   -0%   -0%   -0%
          Dtest 1.87/s    0%    --   -0%   -0%
          Ctest 1.87/s    0%    0%    --   -0%
          Btest 1.87/s    0%    0%    0%    --

          # 4. BrowserUk's &test on Perl 5.6.1 on Windows98SE
          #    Done 3 ways; was unable to do 50000 elements test due to
          #    excessive swapping to disk

          # a. Testing 5 iterations of 25000 elements ...
                  Rate Atest Dtest Btest Ctest
          Atest 2.02/s    --   -9%   -9%  -11%
          Dtest 2.22/s   10%    --   -0%   -2%
          Btest 2.22/s   10%    0%    --   -2%
          Ctest 2.27/s   12%    2%    2%    --

          # c. Testing 50 iterations of 25000 elements ...
                  Rate Atest Dtest Ctest Btest
          Atest 2.20/s    --   -1%   -1%   -1%
          Dtest 2.22/s    1%    --   -0%   -0%
          Ctest 2.22/s    1%    0%    --    0%
          Btest 2.22/s    1%    0%    0%    --

          # e. Testing 100 iterations of 25000 elements ...
                  Rate Atest Dtest Ctest Btest
          Atest 2.19/s    --   -1%   -1%   -2%
          Dtest 2.21/s    1%    --   -0%   -0%
          Ctest 2.21/s    1%    0%    --   -0%
          Btest 2.22/s    2%    0%    0%    --

      The previous posting (beginning, "I ran a slightly more generalized ...") was mine. I forgot to log in.

        I'll reply here so you get the notification.

        The problem is not due to Benchmark.

        I speculated that the first-run bias for low numbers of benchmark iterations could be because on the very first iteration when the storage required by the benchmark is first allocated, it is 'virgin' memory.

        This is a bit like having an empty disk drive: when you write the first few files to it, each one gets contiguous space that directly follows the last. No free-space chains need to be traversed. There is always enough space at the head of the free-space chain to allocate the next file to be written, because the head of the chain is the rest of the disk.

        On the second and subsequent runs, the space freed by the first run is now a chain of blocks of sizes that may need to be coalesced to fulfill any given request. The memory becomes fragmented much like disk drives do.

        Tye pointed out {placeholder for the link} that the MS C runtime malloc() implementation is, um, sub-optimal for the way Perl uses memory. He suggested that I try building Perl to use PERL_MALLOC, which is tailored to Perl's requirements. (Which AS builds do not use; maybe for good reason.)

        I attempted this and discovered that the Makefile will only allow you to use PERL_MALLOC if you disable USE_IMP_SYS which (though not stated in the Makefile hints), also precludes using USE_ITHREADS & USE_MULTIPLICITY.

        It turns out that Steve Hays has pursued a similar strategy and has posted a patch at perl-win32-porters that bypasses a problem in the Win32 sbrk() implementation and allows Perl to build with the combination of PERL_MALLOC, USE_IMP_SYS, USE_ITHREADS, USE_MULTIPLICITY.

        I also came up with a workaround, but Steve's is better than mine...and he is set up to produce patches properly whereas I am not.

        I've tried thrashing the patch applied to 5.8.4, with a test program -- basically running my Benchmark above on 10, 20 and 30 threads simultaneously and (so far) it appears to be stable. THIS IS NOT OFFICIAL. Just my findings on a single, 1-cpu box. It may not be compatible with multi-cpu boxes. It may be that this is not a good test.

        Steve also posted a Smoke report (news://nntp.perl.org/perl.perl5.porters/93142) of the patch that shows it failing with 'inconsistent' and 'dubious' results from a threads test in the smoke test suite. Whether these are related to the patch or not is not yet clear to me.

        The results I am seeing from running the Benchmark using PERL_MALLOC show a marked improvement in the consistency of the first to last runs. The bias is now against the first run rather than for it, but only very slightly, and it easily gets lost in higher numbers of iterations. It also performs *much* more quickly than the CRT malloc; around 5-7x faster. Not quite back to 5.6.1 performance, but a very definite improvement. The benchmark was constructed to highlight and exacerbate the bias and may be atypical of perl usage, but maybe not for what you are doing.

        Whether the PERL_MALLOC/USE_IMP_SYS/USE_ITHREADS combination is truly safe yet is still not clear--but progress has been made.

        If you can't or don't want to risk the transition to building with the patch yet, but need to exclude the vagaries of this first-run bias, the simplest expedient is to run the cmpthese/timethese twice in each run. The first time, run it for a single iteration, which will get the memory allocated and somewhat fragmented, and discard those results. Then run the benchmark again (within the same program) for a largish number of iterations and use those figures.

        FYI: PERL_MALLOC in the Makefile shows up as -Dusemymalloc in the Smoke reports and perl -V banners. You may already know this, but it confused me for a while. But then, I'm easily confused.
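A minimal sketch of that two-pass approach, assuming the same kind of allocation-heavy `test` subroutine as earlier in the thread (scaled down here so it runs quickly; the sizes and iteration counts are illustrative only). The first `timethese` call is the throwaway warm-up; only the second set of figures is handed to `cmpthese`.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Illustrative workload; sizes are scaled down so the sketch runs quickly.
sub test { my @strings = map { ' ' x 1000 } 1 .. 5_000; return }

my %tests = ( Atest => \&test, Btest => \&test );

# Warm-up pass: one iteration of each test, results thrown away.
# Style 'none' keeps Benchmark from printing a report for this pass.
timethese( 1, \%tests, 'none' );

# Measured pass, within the same process.
my $results = timethese( 20, \%tests, 'none' );
cmpthese( $results );
```

Benchmark may still print its "too few iterations" warning for the single-iteration pass; that is expected, since those figures are discarded anyway.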


Re: Benchmark.pm: Does subroutine testing order bias results?
by simonm (Vicar) on Jul 12, 2004 at 16:17 UTC
    Does the order in which Benchmark.pm tests various subroutines bias the results which Benchmark reports?

    I think that you and others have done a good job of documenting that it can.

    As a suggestion, it might be feasible to patch Benchmark.pm to get around this issue. Any change you make will produce a different set of results, and it's hard to say which one is "correct", but in practical terms one or another of these modes might help in investigating a specific timing issue developers sometimes encounter:

    • You could interleave the subroutine calls in some randomized order, rather than doing each one sequentially.
    • Alternately, you could fork and have each child only time one of the subroutines and then pass the data back to the parent for integration.
    • In cases in which increasing the process's memory allocation is an issue, it might help to start by performing a few extra iterations of each of the provided subroutines, and throwing the results out, before starting the real timing runs.

    Update: I implemented the second of these ideas as Benchmark::Forking.
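For illustration, the forking idea can be sketched in a few lines. This is not the code of the module mentioned above; `run_in_child` is a hypothetical helper, and the sketch assumes a platform with a working fork() (on Win32, fork is emulated and may behave differently). It also relies on a Benchmark object being an array reference of timing values, which is how Benchmark.pm stores them.

```perl
use strict;
use warnings;
use Benchmark qw(timeit cmpthese);

# Hypothetical helper: time one code ref in a child process and
# return the result to the parent as a Benchmark object.
sub run_in_child {
    my ($n, $code) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                      # child: time, report, exit
        close $reader;
        my $t = timeit($n, $code);
        print {$writer} join(' ', @$t), "\n";
        close $writer;
        exit 0;
    }
    close $writer;                        # parent: read the child's figures
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);
    die "child returned no timing data" unless defined $line;
    chomp $line;
    return bless [ split ' ', $line ], 'Benchmark';
}

# Illustrative workload, scaled down so the sketch runs quickly.
sub test { my @strings = map { ' ' x 1000 } 1 .. 5_000; return }

# Each label is timed in its own fresh process, so no test inherits
# a heap left behind by an earlier one.
my %results = map { $_ => run_in_child(10, \&test) } qw(Atest Btest);
cmpthese( \%results );
```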

      I don't know that modifying Benchmark's current cmpthese/timethese behavior is necessary, but a new method that supports interleaving might be useful...

          use Benchmark ();   # load Benchmark.pm first so croak, timethis, etc. exist

          package Benchmark;
          use List::Util qw(shuffle);

          sub interleavethese {
              # based on timethese, but it merges the results from several small
              # iterations with the order shuffled each time.
              my($n, $iters, $alt, $style) = @_;
              die "usage: interleavethese(count, iters, { 'Name1'=>'code1', ... }\n"
                  unless ref $alt eq 'HASH';
              my @names = sort keys %$alt;
              $style = "" unless defined $style;
              print "Benchmark: " unless $style eq 'none';
              if ( $n > 0 ) {
                  croak "non-integer loopcount $n, stopped" if int($n)<$n;
                  print "timing $iters sets of $n iterations of" unless $style eq 'none';
              } else {
                  print "running" unless $style eq 'none';
              }
              print " ", join(', ',@names) unless $style eq 'none';
              unless ( $n > 0 ) {
                  my $for = n_to_for( $n );
                  print ", each for $iters iterations of at least $for CPU seconds"
                      unless $style eq 'none';
              }
              print "...\n" unless $style eq 'none';

              my %results;
              for (my $i = 0; $i < $iters; $i++) {
                  my @tasks = shuffle @names;
                  foreach my $name (@tasks) {
                      my $t = timethis ($n, $alt -> {$name}, $name, $style);
                      $results{$name} = exists $results{$name}
                          ? timesum($results{$name}, $t)
                          : $t;
                  }
              }
              return \%results;
          }

          package main;

          #use it like this...
          use Benchmark qw[ cmpthese ];

          # interleavethese is not in Benchmark's export list, so call it by its full name
          cmpthese(Benchmark::interleavethese(5, 3, {
              Atest => \&test,
              Btest => \&test,
              Ctest => \&test,
          }));

          cmpthese(Benchmark::interleavethese(-5, 3, {
              Atest => \&test,
              Btest => \&test,
              Ctest => \&test,
          }));
      simonm: As a suggestion, it might be feasible to patch Benchmark.pm to get around this issue. ... You could fork and have each child only time one of the subroutines and then pass the data back to the parent for integration.

      Here is a hack which implements simonm's suggestion, which was also made independently to me by Gary Benson of Perl Seminar New York over a fine Indian meal at Angon on East 6 Street in Manhattan.

      The hack involves three separate files and is probably modularizable, at least in part. To add/modify subroutines to be tested, add them to the third file below.

Re: Benchmark.pm: Does subroutine testing order bias results? (twice is nice)
by tye (Cardinal) on Jul 13, 2004 at 04:50 UTC

    I tend to write my uses of Benchmark.pm like:

        use Benchmark 'cmpthese';
        # ...
        cmpthese( -3, {
            aSimple => \&Simple,
            bSimple => \&Simple,
            aShort  => \&Short,
            bShort  => \&Short,
            aSweet  => \&Sweet,
            bSweet  => \&Sweet,
        } );

    which prevents a performance bias based on execution order from going unnoticed. It also prevents me from getting excited about a 10% difference between Simple and Sweet when there is an 8% difference between aSweet and bSweet (which are the same code).
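The doubled labels can also be generated rather than typed out. The following is only a sketch: `doubled` is a hypothetical helper, and the two candidate subroutines are trivial stand-ins. It registers each candidate under an "a" and a "b" label, so the spread between two runs of identical code shows up in the same table as the differences between candidates.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: register every candidate under two labels
# ("aName" and "bName") so that identical code is timed twice.
sub doubled {
    my (%subs) = @_;
    return { map { ( "a$_" => $subs{$_}, "b$_" => $subs{$_} ) } keys %subs };
}

# Two illustrative candidates that build the same string in different ways.
sub Simple { my $s = ''; $s .= 'x' for 1 .. 1_000; return $s }
sub Sweet  { return 'x' x 1_000 }

my $tests = doubled( Simple => \&Simple, Sweet => \&Sweet );
cmpthese( 2_000, $tests );
```

A difference between Simple and Sweet is only worth attention when it is clearly larger than the difference between aSweet and bSweet.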

    I recall seeing that some platforms give the performance boost to the first item run while others give the boost to the last item run, though that recollection is from so long ago that I wouldn't put much stock in it; just something to look for...

    - tye        

      I had the thought that the extreme bias for the first run I demonstrated above (with large numbers of smallish allocations going on) might be due to the first run having 'virgin' heap to allocate from?

      On second and subsequent runs, the heap has a long chain of free blocks that have to be traversed and/or coalesced when allocating the memory.

      That seems to be borne out by the fact that the bias reduces markedly with each extra iteration the first test completes.

      Am I venting hot air again?



        I don't see that on FreeBSD.

        Win32's malloc() is particularly stupid that way. You trigger this problem in spades by allocating a bunch of identically-sized buffers that are larger than the typical allocation size and repeatedly freeing them then waiting then reallocating them.

        Win32's malloc() gets this case so amazingly wrong that it isn't even funny anymore. When you free the buffers, it leaves a bunch of fixed-size holes in the heap. Then, as tiny things get allocated, it takes a tiny chunk out of each hole (so broken it doesn't even take a bunch of tiny chunks out of the first hole before moving on to the second).

        Build a Perl that uses Perl's malloc and you'll likely not see this problem either.

        - tye        
