note
bart
You should include <code>cmp</code> in your benchmark. I think you'll find that a sort with an explicit <code>cmp</code> callback is slower than the default sort with no callback at all. That speed difference is the overhead of the callback sub.
<P>I expect that <code>cmp</code> is in general a far slower operation than <code><=></code>. The latter takes only one CPU instruction (plus the overhead of the interpreter); the former is a slow library call for many strings, especially if the strings you compare are identical, because then every character in both strings has to be examined.
<P>And I think what you found is that the callback overhead for <code>sort</code> is still smaller than the speed difference between these two ops.
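<P>To look at the two ops on their own, without <code>sort</code> or any callback in the way, something like the following rough sketch could be used. The data, the sub names (<code>count_less_num</code>, <code>count_less_str</code>) and the benchmark labels are made up for illustration; none of them come from the original benchmark. The strings are zero-padded so that both ops order the values the same way. Loop overhead is included in both timings, so only the difference between them means anything.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data (an assumption, not the arrays from the thread).
my @nums = map { int rand 1_000_000 } 1 .. 1000;
# The same values as fixed-width strings, so cmp and <=> agree on the order.
my @strs = map { sprintf '%07d', $_ } @nums;

# Count ascending neighbour pairs using only the numeric op.
sub count_less_num {
    my $n = 0;
    for my $i (1 .. $#nums) {
        $n++ if ($nums[$i - 1] <=> $nums[$i]) < 0;
    }
    return $n;
}

# The same loop using only the string op.
sub count_less_str {
    my $n = 0;
    for my $i (1 .. $#strs) {
        $n++ if ($strs[$i - 1] cmp $strs[$i]) < 0;
    }
    return $n;
}

# Run each sub for at least one CPU second and print the comparison table.
cmpthese(-1, {
    op_sship => \&count_less_num,
    op_cmp   => \&count_less_str,
});
```

<P>Both subs return the same count, so any difference in rate comes from the comparison op and not from doing different work.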
<P>Enough blah blah. I've added them to your benchmark code, so it now looks like:
<code>
cmpthese(
    $count,
    {
        f_owtdi => sub { @rfl = sort { $a - $b } @nfl },
        i_owtdi => sub { @rin = sort { $a - $b } @nin },
        f_sship => sub { @rfl = sort { $a <=> $b } @nfl },
        i_sship => sub { @rin = sort { $a <=> $b } @nin },
        f_alpha => sub { @sfl = sort @afl },
        i_alpha => sub { @sin = sort @ain },
        f_cmp   => sub { @sfl = sort { $a cmp $b } @afl },
        i_cmp   => sub { @sin = sort { $a cmp $b } @ain },
    });
</code>
<P>I'll update with the results in a moment. This benchmark takes many minutes to run, and I don't want to skew the results by doing heavy stuff with my humble PC.
<P><b>Update:</b>
I'm back. Here are the results:
<code>
Benchmark: timing 300 iterations of f_alpha, f_cmp, f_owtdi, f_sship, i_alpha, i_cmp, i_owtdi, i_sship...
f_alpha: 26 wallclock secs (25.92 usr + 0.00 sys = 25.92 CPU) @ 11.57/s (n=300)
f_cmp: 26 wallclock secs (25.87 usr + 0.00 sys = 25.87 CPU) @ 11.60/s (n=300)
f_owtdi: 37 wallclock secs (36.58 usr + 0.00 sys = 36.58 CPU) @ 8.20/s (n=300)
f_sship: 17 wallclock secs (16.92 usr + 0.00 sys = 16.92 CPU) @ 17.73/s (n=300)
i_alpha: 21 wallclock secs (20.76 usr + 0.00 sys = 20.76 CPU) @ 14.45/s (n=300)
i_cmp: 20 wallclock secs (20.82 usr + 0.00 sys = 20.82 CPU) @ 14.41/s (n=300)
i_owtdi: 35 wallclock secs (35.53 usr + 0.00 sys = 35.53 CPU) @ 8.44/s (n=300)
i_sship: 16 wallclock secs (15.87 usr + 0.00 sys = 15.87 CPU) @ 18.90/s (n=300)
          Rate f_owtdi i_owtdi f_alpha   f_cmp   i_cmp i_alpha f_sship i_sship
f_owtdi 8.20/s      --     -3%    -29%    -29%    -43%    -43%    -54%    -57%
i_owtdi 8.44/s      3%      --    -27%    -27%    -41%    -42%    -52%    -55%
f_alpha 11.6/s     41%     37%      --     -0%    -20%    -20%    -35%    -39%
f_cmp   11.6/s     41%     37%      0%      --    -20%    -20%    -35%    -39%
i_cmp   14.4/s     76%     71%     24%     24%      --     -0%    -19%    -24%
i_alpha 14.5/s     76%     71%     25%     25%      0%      --    -18%    -24%
f_sship 17.7/s    116%    110%     53%     53%     23%     23%      --     -6%
i_sship 18.9/s    130%    124%     63%     63%     31%     31%      7%      --
</code>
That's odd. There is NO speed difference between [fi]_cmp and [fi]_alpha. That means that the cost of the callback is negligible... unless Perl is optimizing the callback away?
<P>Just to make sure, I've also swapped $a and $b in my callback sub. It doesn't make a difference.
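<P>One way to probe the "optimizing the callback away" question is to compare the inline block against a named comparison sub, on the assumption that a named sub is always called as a real callback. This is only a sketch: the word list, the helper <code>random_word</code>, the sub <code>by_cmp</code> and the benchmark labels are all invented here and are not part of the original benchmark.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: build a random 8-letter lowercase word.
sub random_word {
    my @letters = ('a' .. 'z');
    return join '', map { $letters[rand @letters] } 1 .. 8;
}

# Hypothetical data, not the arrays from the original benchmark.
my @words = map { random_word() } 1 .. 5000;

# Named comparison sub (assumed to be invoked as a genuine callback).
sub by_cmp { $a cmp $b }

# All three forms should produce the same order.
my @plain = sort @words;
my @block = sort { $a cmp $b } @words;
my @named = sort by_cmp @words;

# Run each variant for at least one CPU second.
cmpthese(-1, {
    no_block   => sub { my @r = sort @words },
    inline_cmp => sub { my @r = sort { $a cmp $b } @words },
    named_sub  => sub { my @r = sort by_cmp @words },
});
```

<P>If the inline block really is being replaced by the built-in comparison, I'd expect <code>inline_cmp</code> to track <code>no_block</code> while <code>named_sub</code> falls behind; if all three come out the same, the callback is cheap after all.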
<P>Conclusion: the speed gain you get by using a numerical sort is entirely due to the speed difference between the ops <code><=></code> and <code>cmp</code>.
195857
195857