Re^8: RFC:A brief tutorial on Perl's native sorting facilities.

in reply to Re^7: RFC:A brief tutorial on Perl's native sorting facilities.
in thread RFC:A brief tutorial on Perl's native sorting facilities.

I was just about to respond in the same way as jdporter, but I thought I'd run a benchmark first. To my surprise, the key sort is faster than the indexed GRT! Assuming my benchmark isn't screwed:

#! perl -slw
use strict;
use Date::Calc qw(Decode_Date_US);
use Data::Dumper;
use Benchmark qw[ cmpthese ];

sub convertdate {
    return sprintf '%04d%02d%02d', Decode_Date_US( $_[0] )
}

our @list = (
  { fname => 'Jan', lname => 'Krynicky', birth_date => 'Sep 3 1975', }
+,
  { fname => 'Pavel', lname => 'Krynicky', birth_date => 'Dec 25 1969'
+, },
  { fname => 'Martin', lname => 'Krynicky', birth_date => 'Aug 24 1973
+', },
  map{
      {
          fname => $_, lname => 'Krynicky',
          birth_date => 'Jan 1 ' . ( 1900 + int( rand 100 ) )
      }
  } 1 .. $ARGV[ 0 ] || 100
);

cmpthese -1, {
    ST => q[
        my @sorted = map{
            $_->[1]
        } sort{
           $a->[0] <=> $b->[0]
        } map{
            [ convertdate($_->{birth_date}), $_ ]
        } @list;
    ],
    KEY => q[
        {
            my @keys = map convertdate($_->{birth_date}), @list;
            @sorted = @list[ sort {$keys[$a] cmp $keys[$b]} (0..$#list
+) ];
        }
    ],
    GRTish => q[
        my @sorted = @list[
            map {
                substr( $_, 8 )
            } sort map {
                convertdate( $list[$_]->{birth_date} ) . $_
            } 0 .. $#list
        ];
    ],
};
[download]

the keyed sort works out between 20 and 35% faster for this application.

       s/iter     ST GRTish    KEY
ST       1.67     --    -4%   -24%
GRTish   1.61     4%     --   -21%
KEY      1.27    32%    27%     --

C:\test>junk9 1
          Rate GRTish     ST    KEY
GRTish 21914/s     --   -10%   -19%
ST     24349/s    11%     --   -10%
KEY    27113/s    24%    11%     --

C:\test>junk9 1e2
         Rate GRTish     ST    KEY
GRTish  986/s     --    -1%   -16%
ST      997/s     1%     --   -15%
KEY    1179/s    20%    18%     --

C:\test>junk9 1e3
         Rate     ST GRTish    KEY
ST     71.2/s     --   -11%   -25%
GRTish 80.3/s    13%     --   -16%
KEY    95.1/s    34%    19%     --

C:\test>junk9 1e4
         Rate     ST GRTish    KEY
ST     6.31/s     --    -9%   -23%
GRTish 6.90/s     9%     --   -16%
KEY    8.23/s    30%    19%     --
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Comment on Re^8: RFC:A brief tutorial on Perl's native sorting facilities. Select or Download Code

Replies are listed 'Best First'.
Re^9: RFC:A brief tutorial on Perl's native sorting facilities. by jdporter (Paladin) on Feb 07, 2007 at 01:19 UTC
With only three items in the list, you're never going to see the performance advantage of the default sort; all the other effects dominate. Try it with 1_000 or 10_000 items. Update: Wups - I totally missed the fact that the number of items in the list is controlled by a run-time parameter. Sorry. So I did my own benchmark, and I found that what BrowserUk calls 'GRTish' was faster. Why? One difference between his benchmark and mine was the key calculation routine, `convertdate`. I didn't have Date::Calc installed, and didn't want to install it just for this test, so I made mine be simply `sprintf "%20s", $_[0]` [download] Could that be the source of the difference in results? To check this possibility, I made my key calculation more expensive: `sprintf '%20s', 2.2 / ( 2.2 / ( int( $_[0] * 8.00001 ) >> 3 ) )` [download] (That's a couple of chained math no-ops). Sure enough, KEY performed better than GRTish. And `Decode_Date_US` is, no doubt, a very costly function. Still — Why should the performance of this key calculation routine make such a difference? It's called exactly once per list item in both cases. And why is my expectation that the default sort (i.e. with no comparison block) should trounce other, non-default sorts completely unrealized here? Does sort have some nifty optimizations in 5.8?	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re^9: RFC:A brief tutorial on Perl's native sorting facilities.
by jdporter (Paladin) on Feb 07, 2007 at 01:19 UTC

With only three items in the list, you're never going to see the performance advantage of the default sort; all the other effects dominate. Try it with 1_000 or 10_000 items.

Update: Wups - I totally missed the fact that the number of items in the list is controlled by a run-time parameter. Sorry.

So I did my own benchmark, and I found that what BrowserUk calls 'GRTish' was faster. Why? One difference between his benchmark and mine was the key calculation routine, convertdate. I didn't have Date::Calc installed, and didn't want to install it just for this test, so I made mine be simply

  sprintf "%20s", $_[0]
[download]

To check this possibility, I made my key calculation more expensive:

  sprintf '%20s', 2.2 / ( 2.2 / ( int( $_[0] * 8.00001 ) >> 3 ) )
[download]

Sure enough, KEY performed better than GRTish. And Decode_Date_US is, no doubt, a very costly function.

Still — Why should the performance of this key calculation routine make such a difference? It's called exactly once per list item in both cases. And why is my expectation that the default sort (i.e. with no comparison block) should trounce other, non-default sorts completely unrealized here? Does sort have some nifty optimizations in 5.8?

[reply]
[d/l]
[select]

In Section Meditations