Multiple Sort on selected column

huchister has asked for the wisdom of the Perl Monks concerning the following question:

Greetings. I am new to PERL, and I don't know how to implement a sort on multiple keys. I am trying to sort a given array by a given selection of columns. See the code below:

#!/usr/bin/perl

use strict;


my @employees = ( 
    { FIRST => 'Bill',   LAST => 'Gates',     
      SALARY => 600000, AGE => 45 },
    { FIRST => 'George', LAST => 'Tester',
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Sally',  LAST => 'Developer', 
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Joe',    LAST => 'Tester',    
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Steve',  LAST => 'Ballmer',   
      SALARY => 600000, AGE => 41 } 
);

sub seniority {
    $a->{FIRST} cmp $b->{FIRST} or $a->{AGE} <=> $b->{AGE}
}

my $sort = "FIRST, AGE";
my @sorts = split( ',', $sort );
my @ranked;

    for( @sorts ) 
    { 
    @ranked = sort seniority @employees;
    }
foreach my $emp (@ranked) {
    print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST}\t$emp->{LAST}\n";
}

Let's say I want to sort only by LAST and SALARY: then I'd have to change the {FIRST} and {AGE} keys in seniority. Also, what if I want to add a key to $sort, e.g. "FIRST, AGE, LAST"? I'd have to add another comparison to seniority, but I don't know how. This would be a great example of a multiple sort driven by key variables. I would like to stick to the basic 'sort' function as far as possible, without installing Sort::Key or any other module.

UPDATE: This problem was solved using kennethk's method. Please see below for additional info.


Replies are listed 'Best First'.
Re: Multiple Sort on selected column by Tanktalus (Canon) on Oct 18, 2012 at 19:34 UTC
Why not Sort::Key? Is it just because it's on CPAN? (Yes, even you can use CPAN) If it's just for learning, that's one thing, but if that were the case, why use sort instead of rolling your own sorting routine? Ok, I have the answer for that: you don't want to learn the details of sorting, you just want to learn how to use sort. But to that I say that it's an arbitrary line, and, for productivity reasons, I'd suggest learning Sort::Key and Sort::Key::Multi/Sort::Key::Maker as your arbitrary line instead.

So, instead of the idiomatic:

@ranked = sort {
    $a->{FIRST} cmp $b->{FIRST} or
    $a->{LAST}  cmp $b->{LAST}  or
    $a->{AGE}   <=> $b->{AGE}
} @employees;

(note that we're comparing ages with the `<=>` operator, not `cmp`, so that 9 and 19 will sort properly; using cmp will not cause these to sort properly) learning to use Sort::Key might be more useful. For example:

#!/usr/bin/perl

use strict;
use warnings;

use Sort::Key qw(multikeysorter);

my @employees = (
    { FIRST => 'Bill',   LAST => 'Gates',
      SALARY => 600000, AGE => 45 },
    { FIRST => 'George', LAST => 'Tester',
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Sally',  LAST => 'Developer',
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Joe',    LAST => 'Tester',
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Steve',  LAST => 'Ballmer',
      SALARY => 600000, AGE => 41 }
);

my %types = (
    FIRST  => 'str',
    LAST   => 'str',
    SALARY => 'int',
    AGE    => 'int',
);

my @sort = @ARGV;        # not comma-separated, just for ease
@sort = map uc, @sort;   # so I don't have to hold the shift key down.

my $sorter = multikeysorter( sub { my $A = $_; map { $A->{$_} } @sort }, @types{@sort} );
my @ranked = $sorter->(@employees);

foreach my $emp (@ranked) {
    print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST}\t$emp->{LAST}\n";
}

And then, as an example:

$ perl x.pl first age
600000  45  Bill    Gates
55000   29  George  Tester
55000   29  Joe     Tester
55000   29  Sally   Developer
600000  41  Steve   Ballmer
$ perl x.pl age first
55000   29  George  Tester
55000   29  Joe     Tester
55000   29  Sally   Developer
600000  41  Steve   Ballmer
600000  45  Bill    Gates
$ perl x.pl age last first
55000   29  Sally   Developer
55000   29  George  Tester
55000   29  Joe     Tester
600000  41  Steve   Ballmer
600000  45  Bill    Gates

Simple, huh? :-)

Update: And for reversing, add the following line right after the my %types declaration:

@types{map -$_, keys %types} = map -$_, values %types;

This will create the key -FIRST with value -str. And then the my $sorter line becomes:

my $sorter = multikeysorter( sub { my $A = $_; map { $A->{s/^-//r} } @sort }, @types{@sort} );

This removes the - sign when looking up the value in the employee hash, but keeps the - sign when passing in the types to sort. And now?

$ perl5.14.2 x.pl -age last first
600000  45  Bill    Gates
600000  41  Steve   Ballmer
55000   29  Sally   Developer
55000   29  George  Tester
55000   29  Joe     Tester

Note that the /r flag is new in 5.14.0. If you are running something older, you'll have to use:

my $sorter = multikeysorter( sub { my $A = $_; map { (my $k = $_) =~ s/^-//; $A->{$k} } @sort }, @types{@sort} );
Re^2: Multiple Sort on selected column by huchister (Acolyte) on Oct 19, 2012 at 14:05 UTC
Wasn't exactly the answer I was looking for, but you have shown me an alternative way to solve this, which may come in handy in the near future. Thanks!
Re: Multiple Sort on selected column by kennethk (Abbot) on Oct 18, 2012 at 18:24 UTC
So, in order to implement your desired flexible sort, you need to pass information into your comparison routine. However, you cannot specify arguments to a sub passed to sort directly. There are two straightforward solutions to this, in my mind. The first would be to use a closure over an array, which I expect would be less maintainable and look more magical under maintenance. The second, and better option, would be to wrap your comparison routine in an anonymous block, like:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @employees = (
    { FIRST => 'Bill',   LAST => 'Gates',
      SALARY => 600000, AGE => 45 },
    { FIRST => 'George', LAST => 'Tester',
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Sally',  LAST => 'Developer',
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Joe',    LAST => 'Tester',
      SALARY =>  55000, AGE => 29 },
    { FIRST => 'Steve',  LAST => 'Ballmer',
      SALARY => 600000, AGE => 41 }
);

sub cmp_by {
    my $result = 0;
    for my $term (@_) {
        $result ||= $a->{$term} cmp $b->{$term};
    }
    return $result;
}

print Dumper sort {cmp_by(qw(LAST FIRST))} @employees;

The way I've written it, you pass a list of keys (see `qw` in Quote and Quote-like Operators in perlop) to your comparison function. The comparison function cycles over that list each time `sort` calls the block wrapped around it, keeping the first comparison that doesn't yield `0` and returning that result.

Side notes: The language is Perl and the interpreter is perl; there is no PERL. Also, note I've added warnings to my demo code, which can be a real life saver: see Use strict warnings and diagnostics or die.

Update: Note, as thundergnat was kind enough to point out below (and Tanktalus via msg), `SALARY` and `AGE` are numeric fields, and should be sorted using the numeric comparison operator, `<=>`. The code above will sort improperly on numbers that are not the same length (so it would appear to succeed on `AGE`). Of course, as I've noted below, his is not without difficulty...

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
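The closure option mentioned in the reply above is not shown there, so here is a minimal sketch of what it might look like. `make_cmp` is a hypothetical helper name, and the `split /\s*,\s*/` is an assumption about how the comma-separated `$sort` string from the question should be parsed (a plain `split ','` would leave a leading space on " AGE"). Like `cmp_by`, it compares every field as a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @employees = (
    { FIRST => 'Bill',   LAST => 'Gates',     SALARY => 600000, AGE => 45 },
    { FIRST => 'George', LAST => 'Tester',    SALARY =>  55000, AGE => 29 },
    { FIRST => 'Sally',  LAST => 'Developer', SALARY =>  55000, AGE => 29 },
    { FIRST => 'Joe',    LAST => 'Tester',    SALARY =>  55000, AGE => 29 },
    { FIRST => 'Steve',  LAST => 'Ballmer',   SALARY => 600000, AGE => 41 },
);

# Hypothetical helper: returns a comparator that closes over the key list.
# $a and $b are the package variables that sort sets before each call.
sub make_cmp {
    my @keys = @_;
    return sub {
        for my $key (@keys) {
            my $result = $a->{$key} cmp $b->{$key};   # string comparison only
            return $result if $result;                # first non-zero result wins
        }
        return 0;                                     # equal on every key
    };
}

my $sort    = "LAST, FIRST";
my @keys    = split /\s*,\s*/, $sort;   # trims the whitespace around each comma
my $by_keys = make_cmp(@keys);
my @ranked  = sort $by_keys @employees;

print join("\t", @{$_}{qw(LAST FIRST)}), "\n" for @ranked;
```

With the sample data this prints Ballmer, Developer, Gates, then the two Testers (George before Joe). The comparator is built once and can be reused for several sorts, at the cost of the indirection the reply warns about.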
Re^2: Multiple Sort on selected column by thundergnat (Deacon) on Oct 18, 2012 at 19:31 UTC
I agree with one small caveat. It might be better to change the cmp_by sub to do a numeric comparison in addition to the string comparison. With just the cmp, 600000 sorts after 55000 but 400000 would sort before. The following will sort numerically largest to smallest then alphabetically.

sub cmp_by {
    my $result;
    for my $term (@_) {
        $result ||= +$b->{$term} <=> +$a->{$term} or
                    $a->{$term} cmp $b->{$term};
    }
    return $result;
}
Re^3: Multiple Sort on selected column by kennethk (Abbot) on Oct 19, 2012 at 15:07 UTC
Your point is, of course, valid. However, you should test your code before posting it. Your `or` has lower precedence than the assignment (see Operator Precedence and Associativity). This is, for example, why the classic `or die` construct works so well. This also means your code will never actually sort by string: warnings would have told you that you are performing a comparison in void context. In addition, `+` is a no-op in unary operator context; you likely meant the numification ("Venus") operator, `0+`. This will not suppress the string in numeric context warnings either, since you are still using a string in a numeric context. If you want to do auto-detection, you'd be better off using the Conditional Operator with `looks_like_number` from Scalar::Util:

sub cmp_by {
    use Scalar::Util 'looks_like_number';
    my $result = 0;
    for my $term (@_) {
        # Test the field value itself; $a is a hash reference
        $result ||= looks_like_number($a->{$term})
                  ? $a->{$term} <=> $b->{$term}
                  : $a->{$term} cmp $b->{$term};
    }
    return $result;
}
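As an alternative to auto-detection, the comparison type can be declared per field, much like the `%types` table in Tanktalus's reply. The following is only a sketch under stated assumptions: `%is_numeric` and `cmp_typed` are hypothetical names, and the leading `-` for descending order borrows the convention from the Sort::Key example rather than anything built into `sort`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @employees = (
    { FIRST => 'Bill',   LAST => 'Gates',     SALARY => 600000, AGE => 45 },
    { FIRST => 'George', LAST => 'Tester',    SALARY =>  55000, AGE => 29 },
    { FIRST => 'Sally',  LAST => 'Developer', SALARY =>  55000, AGE => 29 },
    { FIRST => 'Joe',    LAST => 'Tester',    SALARY =>  55000, AGE => 29 },
    { FIRST => 'Steve',  LAST => 'Ballmer',   SALARY => 600000, AGE => 41 },
);

# Assumption: the caller knows which fields hold numbers.
my %is_numeric = ( SALARY => 1, AGE => 1 );

# Hypothetical comparator: each spec is a field name, optionally prefixed
# with '-' to sort that field in descending order.
sub cmp_typed {
    for my $spec (@_) {
        my ($desc, $key) = $spec =~ /^(-?)(.+)$/;
        my ($x, $y) = $desc ? ($b, $a) : ($a, $b);   # swap operands to reverse
        my $result = $is_numeric{$key}
                   ? $x->{$key} <=> $y->{$key}
                   : $x->{$key} cmp $y->{$key};
        return $result if $result;
    }
    return 0;
}

my @ranked = sort { cmp_typed(qw(-SALARY AGE LAST FIRST)) } @employees;

print join("\t", @{$_}{qw(SALARY AGE LAST FIRST)}), "\n" for @ranked;
```

Here the highest salaries come first, with ties broken by age, then last name, then first name, so the order is Ballmer, Gates, Developer, and the two Testers (George, then Joe). An explicit table avoids surprises when a string field happens to contain something that looks like a number.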
Re^2: Multiple Sort on selected column by huchister (Acolyte) on Oct 18, 2012 at 18:45 UTC
Thanks, this was exactly what I needed.
Re: Multiple Sort on selected column by DrHyde (Prior) on Oct 19, 2012 at 10:38 UTC
You want Sort::MultipleFields.
Re: Multiple Sort on selected column by sundialsvc4 (Abbot) on Oct 18, 2012 at 19:57 UTC
I happen to like to write separate subroutines for sorting, especially if I am going to do several different sorts that need to be identical ... in that case, it’s only one thing to maintain and nothing to keep in sync.

Multi-field comparison functions are made very easy by the `<=>` operator (edit, see below) and `cmp`, both of which (as per perldoc perlop) “returns -1, 0, or 1 depending on whether the left argument is numerically or string-wise less than, equal to, or greater than the right argument.” Combine either of these with `||`, which “performs a short-circuit logical-OR operation.” (“Short-circuit,” in the case of logical-OR, means that if the left-hand part is determined to be “True,” evaluation of the right-hand part is omitted, since “True OR anything == True.”) Since any non-zero value is True, a series of `<=>` or `cmp` comparisons can be chained using `||` to produce the intended result. It is clear at a glance what the code is doing.

{
    $$a{'last_name'} cmp $$b{'last_name'}
    ||
    $$a{'first_name'} cmp $$b{'first_name'}   # evaluated only if the 'last_name's are equal
}

*Very Important Edit!* It has very graciously been pointed out to me that the `<=>` operator is numeric, whereas `cmp` is the string operator that, in the foregoing example, I of course should have used. (The above example has been edited to use it.) One of the serious “gotchas” of Perl, which I confess that I have never quite understood, is that there are two entirely separate sets of comparison and relational operators: one for strings, and another for numbers. The original code, which used the `<=>` operator, would have compiled and run, but would not have produced the intended result. Thank you to whoever it was for pointing this error out to me. It is a critical error indeed. But, with the change as noted, the original premise once again holds. (We can defer the question of “why does Perl do this” ... indefinitely.)
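To make the difference between the two operator families concrete, here is a small sketch that sorts the same list both ways. The sample values are made up for illustration and are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @ages = (19, 9, 100);   # illustrative values only

# cmp compares character by character, so "100" lt "19" lt "9".
my @as_strings = sort { $a cmp $b } @ages;

# <=> compares numeric values.
my @as_numbers = sort { $a <=> $b } @ages;

print "cmp: @as_strings\n";   # cmp: 100 19 9
print "<=>: @as_numbers\n";   # <=>: 9 19 100
```

Going the other way, applying `<=>` to non-numeric strings such as names treats each of them as 0 (with a warning under `use warnings`), so every comparison reports a tie and the list is not really sorted at all.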
