Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^8: RFC:A brief tutorial on Perl's native sorting facilities.

by BrowserUk (Pope)
on Feb 07, 2007 at 00:36 UTC ( #598663=note: print w/ replies, xml ) Need Help??


in reply to Re^7: RFC:A brief tutorial on Perl's native sorting facilities.
in thread RFC:A brief tutorial on Perl's native sorting facilities.

I was just about to respond in the same way as jdporter, but I thought I'd run a benchmark first. To my surprise, the key sort is faster than the indexed GRT! Assuming my benchmark isn't screwed:

#! perl -slw use strict; use Date::Calc qw(Decode_Date_US); use Data::Dumper; use Benchmark qw[ cmpthese ]; sub convertdate { return sprintf '%04d%02d%02d', Decode_Date_US( $_[0] ) } our @list = ( { fname => 'Jan', lname => 'Krynicky', birth_date => 'Sep 3 1975', } +, { fname => 'Pavel', lname => 'Krynicky', birth_date => 'Dec 25 1969' +, }, { fname => 'Martin', lname => 'Krynicky', birth_date => 'Aug 24 1973 +', }, map{ { fname => $_, lname => 'Krynicky', birth_date => 'Jan 1 ' . ( 1900 + int( rand 100 ) ) } } 1 .. $ARGV[ 0 ] || 100 ); cmpthese -1, { ST => q[ my @sorted = map{ $_->[1] } sort{ $a->[0] <=> $b->[0] } map{ [ convertdate($_->{birth_date}), $_ ] } @list; ], KEY => q[ { my @keys = map convertdate($_->{birth_date}), @list; @sorted = @list[ sort {$keys[$a] cmp $keys[$b]} (0..$#list +) ]; } ], GRTish => q[ my @sorted = @list[ map { substr( $_, 8 ) } sort map { convertdate( $list[$_]->{birth_date} ) . $_ } 0 .. $#list ]; ], };

the keyed sort works out between 20 and 35% faster for this application.

s/iter ST GRTish KEY ST 1.67 -- -4% -24% GRTish 1.61 4% -- -21% KEY 1.27 32% 27% -- C:\test>junk9 1 Rate GRTish ST KEY GRTish 21914/s -- -10% -19% ST 24349/s 11% -- -10% KEY 27113/s 24% 11% -- C:\test>junk9 1e2 Rate GRTish ST KEY GRTish 986/s -- -1% -16% ST 997/s 1% -- -15% KEY 1179/s 20% 18% -- C:\test>junk9 1e3 Rate ST GRTish KEY ST 71.2/s -- -11% -25% GRTish 80.3/s 13% -- -16% KEY 95.1/s 34% 19% -- C:\test>junk9 1e4 Rate ST GRTish KEY ST 6.31/s -- -9% -23% GRTish 6.90/s 9% -- -16% KEY 8.23/s 30% 19% --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re^8: RFC:A brief tutorial on Perl's native sorting facilities.
Select or Download Code
Re^9: RFC:A brief tutorial on Perl's native sorting facilities.
by jdporter (Canon) on Feb 07, 2007 at 01:19 UTC

    With only three items in the list, you're never going to see the performance advantage of the default sort; all the other effects dominate. Try it with 1_000 or 10_000 items.

    Update: Wups - I totally missed the fact that the number of items in the list is controlled by a run-time parameter. Sorry.

    So I did my own benchmark, and I found that what BrowserUk calls 'GRTish' was faster. Why? One difference between his benchmark and mine was the key calculation routine, convertdate. I didn't have Date::Calc installed, and didn't want to install it just for this test, so I made mine be simply

    sprintf "%20s", $_[0]
    Could that be the source of the difference in results?

    To check this possibility, I made my key calculation more expensive:

    sprintf '%20s', 2.2 / ( 2.2 / ( int( $_[0] * 8.00001 ) >> 3 ) )
    (That's a couple of chained math no-ops).

    Sure enough, KEY performed better than GRTish. And Decode_Date_US is, no doubt, a very costly function.

    Still — Why should the performance of this key calculation routine make such a difference? It's called exactly once per list item in both cases. And why is my expectation that the default sort (i.e. with no comparison block) should trounce other, non-default sorts completely unrealized here? Does sort have some nifty optimizations in 5.8?

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://598663]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (6)
As of 2014-07-26 04:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (175 votes), past polls