PerlMonks  


Therefore, if I really want to sort on the hash values, it would make sense to create a new temporary hash in which the values and keys are swapped, and then sort that new array on its keys (which are the values of the original hash), rather than the values. Is that incorrect?

The answer, as usual, is "it depends". Of course extra hash lookups will slow things down, but by how much? Also (and this is where the "it depends" kicks in), in general you also have to factor in the time it'll take to construct a reverse hash.
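To make the two options concrete, here is a minimal sketch of both. The %score hash and its contents are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the thread.
my %score = (alice => 3, bob => 1, carol => 2);

# Option 1: sort the keys, looking up each value inside the comparator.
my @by_value = sort { $score{$a} <=> $score{$b} } keys %score;

# Option 2: build a swapped hash first, then sort its keys and map back.
# reverse in list context turns (key, value, ...) into (value, key, ...).
# Keys are silently lost whenever two entries share a value.
my %by_score    = reverse %score;
my @via_reverse = map { $by_score{$_} } sort { $a <=> $b } keys %by_score;

print "@by_value\n";     # bob carol alice
print "@via_reverse\n";  # bob carol alice
```

Both give the same order here only because the sample values are unique.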

Here is a very simple test:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw/say/;
    use Benchmark qw/cmpthese/;

    srand 0;
    # 1000 random numbers, consumed pairwise as key/value pairs
    our %hash = map { rand() } 1..1000;

    # sort the keys by value, with a hash lookup in the comparator
    my $regular = sub {
        sort { $hash{$a} <=> $hash{$b} } keys %hash;
    };

    # sort the keys themselves, no lookup
    my $keysonly = sub {
        sort { $a <=> $b } keys %hash;
    };

    # build a reverse hash (assumes unique values), then sort its keys
    my $reverse_noref = sub {
        my %reverse_hash = ();
        foreach my $key (keys %hash) {
            $reverse_hash{$hash{$key}} = $key;
        }
        sort { $a <=> $b } keys %reverse_hash;
    };

    # build a reverse hash of array refs (values may repeat), then sort its keys
    my $reverse = sub {
        my %reverse_hash = ();
        foreach my $key (keys %hash) {
            push @{ $reverse_hash{$hash{$key}} }, $key;
        }
        sort { $a <=> $b } keys %reverse_hash;
    };

    cmpthese(-2, {
        regular       => $regular,
        keysonly      => $keysonly,
        reverse       => $reverse,
        reverse_noref => $reverse_noref,
    });

On the machine I'm currently on, this produces:

    $ perl 1099601.pl
                     Rate reverse reverse_noref regular keysonly
    reverse        2222/s      --          -35%    -93%     -93%
    reverse_noref  3406/s     53%            --    -89%     -89%
    regular       31693/s   1326%          830%      --      -0%
    keysonly      31726/s   1328%          831%      0%       --
    $

So in isolation, the difference between "regular" (with the hash lookup) and "keysonly" (without) is negligible (though of course the latter is ever so slightly faster), while constructing a reverse hash first ("reverse") is about 14 times slower (2222/s versus 31693/s). Pushing to an array if/when you can't guarantee values are unique punishes you further, but even without that ("reverse_noref") you're still roughly 9 times slower, close to an order of magnitude.
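For completeness, this is roughly how you would get the original keys back out, in value order, when values can repeat. It is only a sketch; the sample data is made up, and sorting the keys within each group is my own choice to keep the output stable.

```perl
use strict;
use warnings;

# Hypothetical data with a duplicate value (a and c both map to 2).
my %hash = (a => 2, b => 1, c => 2, d => 3);

# Reverse hash: each value maps to an array ref of the keys that had it.
my %reverse_hash;
push @{ $reverse_hash{ $hash{$_} } }, $_ for keys %hash;

# Walk the values in numeric order and collect the keys behind each one.
my @ordered;
for my $value (sort { $a <=> $b } keys %reverse_hash) {
    push @ordered, sort @{ $reverse_hash{$value} };
}

print "@ordered\n";  # b a c d
```

With the plain (non-array) reverse hash, either a or c would have been overwritten and lost.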

What does that mean for you? If you construct a "reverse" hash as you go along, just like you'd construct a regular hash, there may not be much of a difference (or there may be; you'll have to check). If you already have a "regular" hash, just let sort do whatever it needs to do; the difference won't be as big if you only need to construct the reverse hash once and then access it many times, but it'll still be there.
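"Constructing it as you go along" could look something like the following. The names add_entry, %value_of and %keys_with are hypothetical, and the sketch assumes each key is added only once (updating an existing key would leave a stale entry in the reverse hash).

```perl
use strict;
use warnings;

my (%value_of, %keys_with);

# Record one key/value pair in both directions at once.
# Assumption: each key is added only once.
sub add_entry {
    my ($key, $value) = @_;
    $value_of{$key} = $value;
    push @{ $keys_with{$value} }, $key;
    return;
}

# Hypothetical input records.
add_entry(@$_) for [apple => 3], [pear => 1], [plum => 3];

# No separate pass over %value_of is needed before sorting by value.
for my $value (sort { $a <=> $b } keys %keys_with) {
    print "$value: @{ $keys_with{$value} }\n";
}
```

The cost of the reverse hash is then spread over the inserts rather than paid in one go before every sort.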

As always, it's better to measure than to assume when it comes to optimization. It may well be that your approach is actually faster for your script and data, and if speed is crucial, then using a less "natural" approach that's faster is entirely fair.
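One thing worth checking when you measure: perlfunc says the behaviour of sort is undefined outside list context, and as far as I know Benchmark does not call the code it times in list context, so a bare sort as the last statement of a timed sub may not do any sorting at all. A variant that forces list context by assigning to an array might look like this. It is a sketch, not a drop-in replacement for the test above; the sub names and the 500 integer keys are my own assumptions.

```perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

srand 0;
# Assumption: 500 integer keys with random values.
my %hash = map { $_ => rand() } 1 .. 500;

# Keys ordered by value, with a hash lookup inside the comparator.
sub by_lookup {
    my @sorted = sort { $hash{$a} <=> $hash{$b} } keys %hash;
    return @sorted;
}

# Keys ordered by value via a swapped hash (assumes the values are unique).
sub via_reverse {
    my %reverse_hash;
    $reverse_hash{ $hash{$_} } = $_ for keys %hash;
    my @sorted = map { $reverse_hash{$_} } sort { $a <=> $b } keys %reverse_hash;
    return @sorted;
}

# The list assignment inside each sub puts sort in list context,
# whatever context the sub itself is called in.
cmpthese(-1, {
    by_lookup   => \&by_lookup,
    via_reverse => \&via_reverse,
});
```

Both subs return the same thing (the original keys in value order), so the comparison covers the whole job rather than just the sort call. The numbers will of course differ per machine and per data set.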


In reply to Re^3: sorting hash of array of hashes by value by AppleFritter
in thread sorting hash of array of hashes by value by Special_K
