Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

sorting a complex multidimensional hash

by envirodug (Novice)
on Jul 22, 2004 at 02:13 UTC ( #376443=perlquestion: print w/ replies, xml ) Need Help??
envirodug has asked for the wisdom of the Perl Monks concerning the following question:

Howdy all-

Quick question for the monks ... I have a hash that is basically setup as follows:
%myhash = ( 'aNumber.aCharacters' => { 'bNumber.bCharacters' => { 'one' => $valA, 'two' => $valB, 'three' => $valC, 'four' => $valD, 'five' => $valE, 'six' => $valF } } );
I can print information I wish by making a call such as:
print $myhash{'1.AB'}{'2.AB'}{'one'}, "\n";
What I'd like to be able to do is sort the hash by the first key (aNumber.aCharacters in the first example ... 1.AB in the second print example). If possible, I would then like to sort by the second key ... then I would print/use the rest of the availble information ('one','two',etc...).

My problem is that I have many random possibilties for the first and second hash keys ... they could be anything similar to the following:
1.AB 121.ABC1 52.ABC2
I would like the sort to be 'by number' of the digits before the period. I'm not sure a simple 'cmp' or '<=>' will do the trick ... and I can't figure out what to do. I searched for 'multidimensional hash sorting' on this site, but none of the questions really covered this question.

So, I suppose the short version of this question: how can I sort the above hash using the digits before the period of the first two hash keys? :)

Comment on sorting a complex multidimensional hash
Select or Download Code
Re: sorting a complex multidimensional hash
by The Mad Hatter (Priest) on Jul 22, 2004 at 02:25 UTC
    Write a custom sort routine (i.e. sort { ... } keys %myhash;) that splits the key values in $a and $b to get just the number and then compare those two values with <=>.

    Silly me, this is of course inefficient (to some degree); I forgot all about the Schwartzian Transform. You'd probably be better off using it, as described below.

      I forgot all about the Schwartzian Transform. You'd probably be better off using it

      I wouldn't jump to that conclusion without testing it first. The ST is a great optimization, but I have a feeling it is slightly overused. It's not very difficult to use, which is probably why it's used so often, but I rarely see anyone justify its use. If anything, a quick test or two would be in order before triumphantly declaring "You'd probably be better off using it."

        The Schwartzian Transform is useful primarily because of its scalability - when sorting, the number of calls to the comparator is somewhere between O(nlogn) and O(n2) [0], so if some of the effort can be offloaded to an O(n) comparison you would normally except it to be a win at some point as the array to be sorted grows.

        (Note though that this is not guaranteed, since the cost of the extra memory used by the ST does not grow linearly.)

        The GRT [1] has the added benefit that we end up using one of perl's built-in sort comparators (the simple $a cmp $b or $a <=> $b), which means that the entire sort is run directly in C code without resorting to the perl interpreter for each comparison.

        To my mind, there are only three reasons why you might not want to use one of these transforms on any sort in your program: if you are sorting a list of (small and) bounded size; if you are likely to be in a limited memory situation such that the transforms might cause you either to run out of memory or start swapping; or (and I think this is the most common case) if the added code complexity outweighs the likely speed advantages - never forget the high resource cost of code maintenance.

        Hugo

        [0] see An informal introduction to O(N) notation

        [1] see Resorting to Sorting for a tutorial on these and other sorting techniques

        That's why I included the "probably" in there; maybe I wasn't clear enough. I'm not usually one to jump on a bandwagon, and I'm aware that it's quite possible the ST wouldn't be better, but I decided to leave the benchmarks up to the OP (basically a lack of desire to do them myself).
Re: sorting a complex multidimensional hash
by thor (Priest) on Jul 22, 2004 at 02:37 UTC
    Doing the naïve thing does the right thing, albeit with warnings:
    use warnings; use diagnostics; use strict; my @array = ("8.foo", "6.bar", "7.baz", "5.biz", "3.fizzle", "0.fro", "9.boz"); print join "\n", sort {$a <=> $b} @array; __END__ Argument "8.foo" isn't numeric in sort at 376443.pl line 6 (#1) (W numeric) The indicated string was fed as an argument to an oper +ator that expected a numeric value instead. If you're fortunate the me +ssage will identify which operator was so unfortunate. Argument "6.bar" isn't numeric in sort at 376443.pl line 6 (#1) Argument "7.baz" isn't numeric in sort at 376443.pl line 6 (#1) Argument "5.biz" isn't numeric in sort at 376443.pl line 6 (#1) Argument "3.fizzle" isn't numeric in sort at 376443.pl line 6 (#1) Argument "0.fro" isn't numeric in sort at 376443.pl line 6 (#1) Argument "9.boz" isn't numeric in sort at 376443.pl line 6 (#1) 0.fro 3.fizzle 5.biz 6.baz 7.bar 8.foo 9.boz
    However, if you want to dispense with the warnings, you could use the "Schwarzian Transform" like this:
    use warnings; use diagnostics; use strict; my @array = ("8.foo", "6.bar", "7.baz", "5.biz", "3.fizzle", "0.fro", "9.boz"); print join "\n", map {$_->[0]} sort {$a->[1] <=> $b->[1]} map { [$_, by_number($_)] } @array; sub by_number { my $value = shift; if ($value =~ m/^(\d+)/) { return $1 } }

    thor

      many thanks (for the code & quick replies)! i think i have the "Schwarzian Transform" example working, i just need to throw a few test cases at it to make sure i'm feeding the correct input through.

      There's another option that falls somewhere in the middle or your two examples:

      use warnings; use diagnostics; use strict; my @array = ("8.foo", "6.bar", "7.baz", "5.biz", "3.fizzle", "0.fro", "9.boz"); print join "\n", sort {by_number($a) <=> by_number($b)} @array; sub by_number { my $value = shift; if ($value =~ m/^(\d+)/) { return $1 } }

      Now, of course this is going to be slower than the ST, but whether the optimization is necessary or not depends on a lot of things. Testing on my machine with an array of 70,000 elements, the ST takes 1.6 seconds and this one takes 5. That makes the ST 3.3x faster, but the increase might or might not be worth the more complex code. 5 seconds to sort 70,000 elements is not too shabby, and if the actual data is going to have far fewer, then the extra complexity just might not be worth it.

        That's fair. I suppose I've seen the "map-sort-map" construct enough times that I don't even blink at it. Heck, as long as we're on the subject, I may as well mention the only downfall (that I know) of the ST. As with a lot of things perl, you end up trading memory for speed. For every element in your initial list, you create an anonymous array of two elements. For large* lists, this can actually be prohibitive.

        thor


        For certain values of "large" :)
Re: sorting a complex multidimensional hash
by tilly (Archbishop) on Jul 22, 2004 at 02:45 UTC
    fast, flexible, stable sort gives a good way of sorting strings in the way that you want. That reduces this to being a problem of producing the list to sort.

    Try this:

    my @list = map {my $first = $_; map {"$first|$_"} keys %{$myhash{$firs +t}}} keys %myhash; sub XFORM { # Extract the sort key from $_[0] and return it. # This will often be written in-line # rather than as a real subroutine. } my @sorted= @list[ map { unpack "N", substr($_,-4) } sort map { XFORM($list[$_]) . pack "N", $_ } 0..$#list ]; foreach my $key (@sorted) { my ($first, $second) = split /\|/, $key; print "$first: $second: $myhash{$first}{$second}{'one'}\n"; }
    (Untested code...)
Re: sorting a complex multidimensional hash
by BrowserUk (Pope) on Jul 22, 2004 at 03:15 UTC

    Also an ST, but it sorts both levels

    #! perl -slw use strict; use Data::Dumper; my $value = 1; ## Set up some test data my %hash = map{ $_ => { map{ $_ => $value++ } map{ my $digit = $_; map{ "$digit.$_" } 'A' .. 'C' } 1 .. 3 } } map{ my $digit = $_; map{ "$digit.$_" } 'A' .. 'C' } 1 .. 3; # print Dumper \%hash; ## Sort and print print for map{ "{$_->[ 0 ]}{$_->[ 1 ]} => $hash{ $_->[ 0 ] }{ $_->[ 1 ] }" } sort { $a->[ 2 ] <=> $b->[ 2 ] || $a->[ 3 ] cmp $b->[ 3 ] || $a->[ 4 ] <=> $b->[ 4 ] || $a->[ 5 ] cmp $b->[ 5 ] } map{ my $key1 = $_; my( $n1, $l1 ) = $key1 =~ m[(^\d+)\.(.*)]; map{ my( $n2, $l2 ) = $_ =~ m[(^\d+)\.(.*)]; [ $key1, $_, $n1, $l1, $n2, $l2 ] } keys %{ $hash{ $_ } } } keys %hash; __OUTPUT__ P:\test>junk {1.A}{1.A} => 1 {1.A}{1.B} => 2 {1.A}{1.C} => 3 {1.A}{2.A} => 4 {1.A}{2.B} => 5 {1.A}{2.C} => 6 {1.A}{3.A} => 7 {1.A}{3.B} => 8 {1.A}{3.C} => 9 {1.B}{1.A} => 10 {1.B}{1.B} => 11 {1.B}{1.C} => 12 {1.B}{2.A} => 13 {1.B}{2.B} => 14 {1.B}{2.C} => 15 {1.B}{3.A} => 16 {1.B}{3.B} => 17 {1.B}{3.C} => 18 {1.C}{1.A} => 19 {1.C}{1.B} => 20 {1.C}{1.C} => 21 {1.C}{2.A} => 22 {1.C}{2.B} => 23 {1.C}{2.C} => 24 {1.C}{3.A} => 25 {1.C}{3.B} => 26 {1.C}{3.C} => 27 {2.A}{1.A} => 28 {2.A}{1.B} => 29 {2.A}{1.C} => 30 {2.A}{2.A} => 31 {2.A}{2.B} => 32 {2.A}{2.C} => 33 {2.A}{3.A} => 34 {2.A}{3.B} => 35 {2.A}{3.C} => 36 {2.B}{1.A} => 37 {2.B}{1.B} => 38 {2.B}{1.C} => 39 {2.B}{2.A} => 40 {2.B}{2.B} => 41 {2.B}{2.C} => 42 {2.B}{3.A} => 43 {2.B}{3.B} => 44 {2.B}{3.C} => 45 {2.C}{1.A} => 46 {2.C}{1.B} => 47 {2.C}{1.C} => 48 {2.C}{2.A} => 49 {2.C}{2.B} => 50 {2.C}{2.C} => 51 {2.C}{3.A} => 52 {2.C}{3.B} => 53 {2.C}{3.C} => 54 {3.A}{1.A} => 55 {3.A}{1.B} => 56 {3.A}{1.C} => 57 {3.A}{2.A} => 58 {3.A}{2.B} => 59 {3.A}{2.C} => 60 {3.A}{3.A} => 61 {3.A}{3.B} => 62 {3.A}{3.C} => 63 {3.B}{1.A} => 64 {3.B}{1.B} => 65 {3.B}{1.C} => 66 {3.B}{2.A} => 67 {3.B}{2.B} => 68 {3.B}{2.C} => 69 {3.B}{3.A} => 70 {3.B}{3.B} => 71 {3.B}{3.C} => 72 {3.C}{1.A} => 73 {3.C}{1.B} => 74 {3.C}{1.C} => 75 {3.C}{2.A} => 76 {3.C}{2.B} => 77 {3.C}{2.C} => 78 {3.C}{3.A} => 79 {3.C}{3.B} => 80 {3.C}{3.C} => 81

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
      good stuff! many thanks!

      Since there has been all the dicussion regarding ST -v- GRT, here's an multi-level GRT version. Runs about twice as fast.

      print for map{ substr( $_, 1+index( $_, '|' ) ) } sort map { my $key1 = $_; map { pack 'NA* NA* A1 A*', split( '\.', $key1 ), split( '\.', $_ ), '|', $hash{ $key1 }{ $_ } } keys %{ $hash{ $_ } }; } keys %hash;

      Just plug it into the test script above to see it run.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
        You may be giving perl a bad reputation! Terse? No, not at all!
        Seriously, I'm impressed but glad I don't have to maintain it!
        Twice as fast you say. Hmmm. Tempting...
        i can definately handle faster ...

        question: the output from the first example was formatted as following:  "{$_->[ 0 ]}{$_->[ 1 ]} => $hash{ $_->[ 0 ] }{ $_->[ 1 ] }"

        the output from the new example prints the 'values' of the hash: substr( $_, 1+index( $_, '|' ) ).

        i tried substituting "{$_->[ 0 ]}{$_->[ 1 ]} => $hash{ $_->[ 0 ] }{ $_->[ 1 ] }" from the first example into the second example where substr( $_, 1+index( $_, '|' ) ) was found, but i'm seeing an error: Can't use string ("") as an ARRAY ref while "strict refs" in use at ./test4.pl line 42

        would anyone know how I would need to format this code ({$_->[ 0 ]}{$_->[ 1 ]} => $hash{ $_->[ 0 ] }{ $_->[ 1 ] }) to work in the second example and print information formatted similar to the first example? (i know enough perl to 'get myself in trouble')

        again, many thanks for everyone's help

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://376443]
Approved by thor
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (4)
As of 2014-09-17 04:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (57 votes), past polls