The biggest problem with the 'normal' recursive form is that it chews memory like it was an all-you-can-eat buffet. Try changing the 2000 items to 20,000 in your benchmark and listen to your harddisk go into a death spiral as your machine tries to find enough virtual ram. (Maybe higher if you've more ram than I).
None of the other versions suffer this effect and that's the greatest benefit of the goto and &rec forms.
That said, a rather simpler construction of the & form will, despite needing to copy the input list to compensate for it's destructive nature, beat the unrolled form to over 2000 items, (maybe higher if your memory allocator is more efficent than mine), and fairly substantially for smaller lists, which probably covers the vast majority of uses of max in the wild.
I excluded the 'normal' form because of the memory problem, and the goto form because it is so slow that it makes for silly percentages:
sub _r_max {
my $h = shift;
return $h unless @_;
$_[0] = $h if $h > $_[0];
&_r_max;
}
sub r_max{ _r_max( @{[ @_ ]} ) }
__END__
C:\test>junk3 -N=100
Rate amp unrolled r_max
amp 4755/s -- -35% -50%
unrolled 7275/s 53% -- -24%
r_max 9522/s 100% 31% --
C:\test>junk3 -N=2000
Rate amp unrolled r_max
amp 183/s -- -50% -51%
unrolled 363/s 98% -- -2%
r_max 372/s 103% 2% --
But of course, a simpler implementation of the iteratative form is also possible. Again, it does an initial copy of the input list to avoid destructive modifications to it. Despite this, it beats all the others quite easily upto half a million items:
sub _i_max {{
my $h = shift;
return $h unless @_;
$_[0] = $h if $h > $_[0];
redo;
}}
sub i_max{ _i_max( @{[ @_ ]} ) }
__END__
C:\test>junk3 -N=5e3
Rate amp r_max unrolled i_max
amp 70.4/s -- -48% -52% -73%
r_max 136/s 94% -- -7% -47%
unrolled 147/s 110% 8% -- -43%
i_max 257/s 265% 88% 74% --
C:\test>junk3 -N=5e4
Rate amp r_max unrolled i_max
amp 6.79/s -- -47% -53% -72%
r_max 12.7/s 88% -- -12% -47%
unrolled 14.5/s 113% 14% -- -39%
i_max 23.9/s 252% 87% 65% --
C:\test>junk3 -N=5e5
(warning: too few iterations for a reliable count)
Rate amp r_max unrolled i_max
amp 0.678/s -- -47% -53% -72%
r_max 1.27/s 88% -- -12% -47%
unrolled 1.44/s 112% 13% -- -39%
i_max 2.38/s 251% 87% 65% --
It seems to be able to achieve 65% faster than it's nearest rival even though in the last case, the copy pushed my machine into swapping other code out to acquire the required space. I was going to add a test for the size of the input list in the first level sub and switch to a non-copying alternative when the list was greater than whatever size where the copying would negate the performance it allowed, but I run out of virtual memory before that ever occurs.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
|