http://www.perlmonks.org?node_id=598663


in reply to Re^7: RFC:A brief tutorial on Perl's native sorting facilities.
in thread RFC:A brief tutorial on Perl's native sorting facilities.

I was just about to respond in the same way as jdporter, but I thought I'd run a benchmark first. To my surprise, the key sort is faster than the indexed GRT! Assuming my benchmark isn't screwed:

#! perl -slw use strict; use Date::Calc qw(Decode_Date_US); use Data::Dumper; use Benchmark qw[ cmpthese ]; sub convertdate { return sprintf '%04d%02d%02d', Decode_Date_US( $_[0] ) } our @list = ( { fname => 'Jan', lname => 'Krynicky', birth_date => 'Sep 3 1975', } +, { fname => 'Pavel', lname => 'Krynicky', birth_date => 'Dec 25 1969' +, }, { fname => 'Martin', lname => 'Krynicky', birth_date => 'Aug 24 1973 +', }, map{ { fname => $_, lname => 'Krynicky', birth_date => 'Jan 1 ' . ( 1900 + int( rand 100 ) ) } } 1 .. $ARGV[ 0 ] || 100 ); cmpthese -1, { ST => q[ my @sorted = map{ $_->[1] } sort{ $a->[0] <=> $b->[0] } map{ [ convertdate($_->{birth_date}), $_ ] } @list; ], KEY => q[ { my @keys = map convertdate($_->{birth_date}), @list; @sorted = @list[ sort {$keys[$a] cmp $keys[$b]} (0..$#list +) ]; } ], GRTish => q[ my @sorted = @list[ map { substr( $_, 8 ) } sort map { convertdate( $list[$_]->{birth_date} ) . $_ } 0 .. $#list ]; ], };

the keyed sort works out between 20 and 35% faster for this application.

s/iter ST GRTish KEY ST 1.67 -- -4% -24% GRTish 1.61 4% -- -21% KEY 1.27 32% 27% -- C:\test>junk9 1 Rate GRTish ST KEY GRTish 21914/s -- -10% -19% ST 24349/s 11% -- -10% KEY 27113/s 24% 11% -- C:\test>junk9 1e2 Rate GRTish ST KEY GRTish 986/s -- -1% -16% ST 997/s 1% -- -15% KEY 1179/s 20% 18% -- C:\test>junk9 1e3 Rate ST GRTish KEY ST 71.2/s -- -11% -25% GRTish 80.3/s 13% -- -16% KEY 95.1/s 34% 19% -- C:\test>junk9 1e4 Rate ST GRTish KEY ST 6.31/s -- -9% -23% GRTish 6.90/s 9% -- -16% KEY 8.23/s 30% 19% --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^9: RFC:A brief tutorial on Perl's native sorting facilities.
by jdporter (Paladin) on Feb 07, 2007 at 01:19 UTC

    With only three items in the list, you're never going to see the performance advantage of the default sort; all the other effects dominate. Try it with 1_000 or 10_000 items.

    Update: Wups - I totally missed the fact that the number of items in the list is controlled by a run-time parameter. Sorry.

    So I did my own benchmark, and I found that what BrowserUk calls 'GRTish' was faster. Why? One difference between his benchmark and mine was the key calculation routine, convertdate. I didn't have Date::Calc installed, and didn't want to install it just for this test, so I made mine be simply

    sprintf "%20s", $_[0]
    Could that be the source of the difference in results?

    To check this possibility, I made my key calculation more expensive:

    sprintf '%20s', 2.2 / ( 2.2 / ( int( $_[0] * 8.00001 ) >> 3 ) )
    (That's a couple of chained math no-ops).

    Sure enough, KEY performed better than GRTish. And Decode_Date_US is, no doubt, a very costly function.

    Still — Why should the performance of this key calculation routine make such a difference? It's called exactly once per list item in both cases. And why is my expectation that the default sort (i.e. with no comparison block) should trounce other, non-default sorts completely unrealized here? Does sort have some nifty optimizations in 5.8?