
Re: sorting a complex multidimensional hash

by The Mad Hatter (Priest)
on Jul 22, 2004 at 02:25 UTC (#376444)

in reply to sorting a complex multidimensional hash

Write a custom sort routine (i.e. sort { ... } keys %myhash;) that splits the keys held in $a and $b to extract just the number from each, and then compares those two numbers with <=>.
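A minimal sketch of that approach. The original key format isn't shown in this thread, so the "name_NUMBER" keys, the sample data, and the key_number helper are all assumptions made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical data: keys of the form "name_NUMBER" (an assumption,
# the real key format is not given in the thread).
my %myhash = (
    'item_10' => { colour => 'red'   },
    'item_2'  => { colour => 'green' },
    'item_33' => { colour => 'blue'  },
);

# Hypothetical helper: split the key on "_" and return the last field,
# which is the numeric part under the assumed format.
sub key_number {
    my ($key) = @_;
    return ( split /_/, $key )[-1];
}

# Compare the extracted numbers numerically with <=>.
my @sorted = sort { key_number($a) <=> key_number($b) } keys %myhash;

print "$_\n" for @sorted;
```

Note that key_number runs twice for every comparison, which is the inefficiency mentioned in the update.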

Silly me, this is of course inefficient (to some degree); I forgot all about the Schwartzian Transform. You'd probably be better off using it, as described below.

Replies are listed 'Best First'.
Re^2: sorting a complex multidimensional hash
by revdiablo (Prior) on Jul 22, 2004 at 06:12 UTC
    I forgot all about the Schwartzian Transform. You'd probably be better off using it

    I wouldn't jump to that conclusion without testing it first. The ST is a great optimization, but I have a feeling it is slightly overused. It's not very difficult to use, which is probably why it's used so often, but I rarely see anyone justify its use. If anything, a quick test or two would be in order before triumphantly declaring "You'd probably be better off using it."
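For what it's worth, a quick test along those lines could look like the sketch below, using the core Benchmark module. The "item_NUMBER" key format, the list size of 1000, and the naive_sort / st_sort names are assumptions for illustration; real results depend on the actual keys and on how expensive the extraction is.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: 1000 keys of the assumed form "item_NUMBER".
my @keys = map { "item_" . int( rand 100_000 ) } 1 .. 1_000;

# Naive version: extract the number inside every comparison.
sub naive_sort {
    return sort { ( split /_/, $a )[-1] <=> ( split /_/, $b )[-1] } @_;
}

# Schwartzian Transform: extract once per key, sort on the cached number.
sub st_sort {
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           map  { [ $_, ( split /_/ )[-1] ] } @_;
}

# Run each candidate for at least one CPU second and print a comparison.
cmpthese( -1, {
    naive => sub { my @s = naive_sort(@keys) },
    st    => sub { my @s = st_sort(@keys) },
} );
```

Try it with a list size close to what you actually sort before deciding either way.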

      The Schwartzian Transform is useful primarily because of its scalability: when sorting, the number of calls to the comparator is somewhere between O(n log n) and O(n^2) [0], so if some of the per-comparison effort can be moved into an O(n) precomputation pass, you would normally expect it to be a win at some point as the array to be sorted grows.

      (Note though that this is not guaranteed, since the cost of the extra memory used by the ST does not grow linearly.)
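To make the scaling argument concrete, here is a rough sketch that counts how often the key extraction runs with and without the ST. The "item_NUMBER" keys, the list size, and the key_number counter helper are assumptions for illustration only:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $extractions = 0;

# Hypothetical helper: extract the numeric part and count each call.
sub key_number {
    my ($key) = @_;
    $extractions++;
    return ( split /_/, $key )[-1];
}

# 1000 keys of the assumed form "item_NUMBER", in random order.
my @keys = shuffle map { "item_$_" } 1 .. 1000;

# Naive sort: two extractions per comparator call.
my @naive = sort { key_number($a) <=> key_number($b) } @keys;
my $naive_extractions = $extractions;

# Schwartzian Transform: one extraction per key, done up front.
$extractions = 0;
my @st =
    map  { $_->[0] }                     # 3. unwrap the original key
    sort { $a->[1] <=> $b->[1] }         # 2. compare the cached numbers
    map  { [ $_, key_number($_) ] }      # 1. cache the number with the key
    @keys;
my $st_extractions = $extractions;

print "naive: $naive_extractions extractions\n";
print "ST:    $st_extractions extractions\n";
```

The ST count always equals the number of keys, while the naive count follows the number of comparisons. The price is one small anonymous array per element, which is the extra memory the note above refers to.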

      The GRT (Guttman-Rosler Transform) [1] has the added benefit that we end up using one of perl's built-in sort comparators (the simple $a cmp $b or $a <=> $b), which means that the entire sort is run directly in C code without calling back into the perl interpreter for each comparison.
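A minimal GRT sketch, under the assumptions that keys look like "item_NUMBER" and that the numbers are non-negative integers that fit in 32 bits (both are made up for illustration). Each number is packed as a 4-byte big-endian prefix, so a plain string sort orders the records numerically:

```perl
use strict;
use warnings;

# Hypothetical keys of the assumed form "item_NUMBER".
my @keys = qw(item_10 item_2 item_300 item_33 item_7);

# 1. Prefix each key with its number packed as a 4-byte big-endian
#    unsigned integer ('N'), so byte-wise order matches numeric order.
my @packed = map { pack( 'N', ( split /_/ )[-1] ) . $_ } @keys;

# 2. Sort with no comparator block: perl uses its built-in string
#    comparison and never calls back into Perl code.
# 3. Strip the 4-byte prefix to recover the original keys.
my @sorted = map { substr( $_, 4 ) } sort @packed;

print "@sorted\n";
```

Negative or fractional numbers would need a different packing scheme, so treat this as a sketch for the simple case rather than a general recipe.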

      To my mind, there are only three reasons why you might not want to use one of these transforms on any sort in your program: if you are sorting a list of (small and) bounded size; if you are likely to be in a limited memory situation such that the transforms might cause you either to run out of memory or start swapping; or (and I think this is the most common case) if the added code complexity outweighs the likely speed advantages - never forget the high resource cost of code maintenance.


      [0] see An informal introduction to O(N) notation

      [1] see Resorting to Sorting for a tutorial on these and other sorting techniques

      That's why I included the "probably" in there; maybe I wasn't clear enough. I'm not usually one to jump on a bandwagon, and I'm aware that it's quite possible the ST wouldn't be better, but I decided to leave the benchmarks up to the OP (basically a lack of desire to do them myself).
