Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

sort performance

by le (Friar)
on Sep 12, 2000 at 02:04 UTC ( [id://32000]=perlquestion: print w/replies, xml ) Need Help??

le has asked for the wisdom of the Perl Monks concerning the following question:

I have a big hash of hashes that looks like this:
%hash = ( foo => { this => '...', that => '...', here => '...', there => '...' }, bar => { this => '...', that => '...', here => '...', there => '...' } );
What would be the most performant way to sort keys %hash on the values of 'this', for example?

Currently, I do it with

for (sort { $hash{$a}->{this} <=> $hash{$b}->{this} } keys %hash)
Is there anything faster/better?

Replies are listed 'Best First'.
RE: sort performance
by Adam (Vicar) on Sep 12, 2000 at 02:16 UTC
    my @sortedkeys = sort { $hash{$a}->{this} cmp $hash{$b}->{this} } keys + %hash
    You don't need the foreach loop, otherwise your method is fine (that's how I always do it.)

    Update: Oh, my mistake... I see that @sortedkeys is the array for the foreach loop. So you are doing fine, although for fun we should try to come up with a bunch of methods and then benchmark them.

Re (tilly) 1: sort performance
by tilly (Archbishop) on Sep 12, 2000 at 04:01 UTC
    You could avoid repeated double-lookups with a Schwartzian sort:
    my @sorted_keys = map {$_->[0]} sort {$a->[1] cmp $b->[1]} map {[$_, $hash{$_}{this}]} keys %hash;
    (Read it bottom to top and it will make more sense.)

    This is more memory intensive, but doing the double lookup in the sort block means that it happens O(n * log(n)) times. Doing it in a map means it only happens n times. (Where n is the number of keys.)

      The 'Schwartzian Transform' is for expensive operations.
      A hash lookup is not an expensive operation, so just do a plain sort.
        You are right. I just benchmarked it. The overhead of the maps far outweighs the cost of the sort. In fact it is so slow I will have to ask p5p about it. Here is my test:
        use strict; use Benchmark; use vars qw(%hash %test); for (1..1000) { $hash{"key$_"}{this} = "num$_"; } %test = ( schwartz => sub { my @sorted = map $_->[0], sort {$a->[1] cmp $b->[1]} map [$_, $hash{$_}{this}], keys %hash; }, straight => sub { my @sorted = sort { $hash{$a}{this} <=> $hash{$b}{this} } keys %hash; }, stupid => sub { my @sorted = sort { my $foo = [$a, $hash{$a}{this}]; $hash{$a}{this} <=> $hash{$b}{this} } keys %hash; }, ); timethese (-1, \%test);
        Believe it or not, stupid is several times faster for me than schwartz. IMO there is simply no possible good reason for that!

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://32000]
Approved by root
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (2)
As of 2024-07-21 13:08 GMT
Find Nodes?
    Voting Booth?

    No recent polls found

    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.