PerlMonks  

Multiple Sort on selected column

by huchister (Acolyte)
on Oct 18, 2012 at 17:56 UTC (#999789=perlquestion)
huchister has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, I am new to PERL, and I don't know how to implement a sort on multiple keys. I am trying to sort a given array by a given selection of columns. See the code below:
#!/usr/bin/perl
use strict;

my @employees = (
    { FIRST => 'Bill',   LAST => 'Gates',     SALARY => 600000, AGE => 45 },
    { FIRST => 'George', LAST => 'Tester',    SALARY => 55000,  AGE => 29 },
    { FIRST => 'Sally',  LAST => 'Developer', SALARY => 55000,  AGE => 29 },
    { FIRST => 'Joe',    LAST => 'Tester',    SALARY => 55000,  AGE => 29 },
    { FIRST => 'Steve',  LAST => 'Ballmer',   SALARY => 600000, AGE => 41 }
);

sub seniority {
    $a->{FIRST} cmp $b->{FIRST} or $a->{AGE} <=> $b->{AGE}
}

my $sort  = "FIRST, AGE";
my @sorts = split( ',', $sort );
my @ranked;
for (@sorts) {
    @ranked = sort seniority @employees;
}

foreach my $emp (@ranked) {
    print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST}\t$emp->{LAST}\n";
}
Let's say I want to sort only by LAST and SALARY; then I'd have to change the {FIRST} and {AGE} keys in the sub. Also, what if I want to add a key to $sort, e.g. "FIRST, AGE, LAST"? I'd have to add a comparison to seniority, but I don't know how. This would be a great example of a multiple sort driven by key variables. I would like to use the basic 'sort' function as far as possible, without bringing in Sort::Key or any other module. UPDATE: This problem was solved using kennethk's method. Please see below for additional info.

Re: Multiple Sort on selected column
by kennethk (Monsignor) on Oct 18, 2012 at 18:24 UTC
    So, in order to implement your desired flexible sort, you need to pass information into your comparison routine. However, you cannot specify arguments to a sub passed to sort directly. There are two straight-forward solutions to this, in my mind. The first would be to use a closure about an array, which I expect would be less maintainable and look more magical under maintenance.
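For reference, the closure approach mentioned above could look roughly like the sketch below. The helper name make_comparator and the three-row data set are made up for illustration; nothing here comes from the thread. It only does string comparison, and the closure has to live in the same package as the sort call so that both see the same $a and $b.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): builds a comparator that
# remembers which keys to compare, in order.
sub make_comparator {
    my @keys = @_;
    return sub {
        for my $key (@keys) {
            my $result = $a->{$key} cmp $b->{$key};
            return $result if $result;    # first difference decides
        }
        return 0;                         # equal on every key
    };
}

my @employees = (
    { FIRST => 'Sally',  LAST => 'Developer' },
    { FIRST => 'Joe',    LAST => 'Tester'    },
    { FIRST => 'George', LAST => 'Tester'    },
);

# Turn a "KEY, KEY" string into a key list, dropping the spaces
# around the commas.
my @keys = split /\s*,\s*/, 'LAST, FIRST';

my $by_keys = make_comparator(@keys);
my @ranked  = sort $by_keys @employees;

print "$_->{LAST}\t$_->{FIRST}\n" for @ranked;
```

Changing the sort order then only means changing the string that is split into @keys.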

    The second, and better option, would be to wrap your comparison routine in an anonymous block, like:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @employees = (
        { FIRST => 'Bill',   LAST => 'Gates',     SALARY => 600000, AGE => 45 },
        { FIRST => 'George', LAST => 'Tester',    SALARY => 55000,  AGE => 29 },
        { FIRST => 'Sally',  LAST => 'Developer', SALARY => 55000,  AGE => 29 },
        { FIRST => 'Joe',    LAST => 'Tester',    SALARY => 55000,  AGE => 29 },
        { FIRST => 'Steve',  LAST => 'Ballmer',   SALARY => 600000, AGE => 41 }
    );

    sub cmp_by {
        my $result = 0;
        for my $term (@_) {
            $result ||= $a->{$term} cmp $b->{$term};
        }
        return $result;
    }

    print Dumper sort { cmp_by(qw(LAST FIRST)) } @employees;

    The way I've written it, you pass a list of keys (see qw in Quote and Quote like Operators in perlop) to your comparison function. The comparison function cycles over that list each time sort calls the block wrapped around it, keeps the first comparison result that isn't 0, and returns it.

    Side notes: The language is Perl and the interpreter is perl; there is no PERL. Also, note I've added warnings to my demo code, which can be a real life saver: see Use strict warnings and diagnostics or die.

    Update: Note, as thundergnat was kind enough to point out below (and Tanktalus via msg), SALARY and AGE are numeric fields, and should be sorted using the numeric comparison operator, <=>. The code above will sort improperly on numbers that are not the same length (so would appear to succeed on AGE). Of course, as I've noted below, his is not without difficulty...
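A small illustration of the difference between the two operators. The salary list is made up for the demo (400000 is borrowed from the reply below):

```perl
use strict;
use warnings;

my @salaries = (600000, 55000, 400000);

my @as_strings = sort { $a cmp $b } @salaries;    # character by character
my @as_numbers = sort { $a <=> $b } @salaries;    # by numeric value

print "cmp: @as_strings\n";    # cmp: 400000 55000 600000
print "<=>: @as_numbers\n";    # <=>: 55000 400000 600000
```

With cmp, "400000" lands before "55000" because '4' sorts before '5'. When every value has the same number of digits, as the ages in the sample data do, both operators give the same order.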


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Thanks, this was exactly what I've needed.

      I agree with one small caveat. It might be better to change the cmp_by sub to do a numeric comparison in addition to the string comparison. With just the cmp, 600000 sorts after 55000 but 400000 would sort before. The following will sort numerically largest to smallest then alphabetically.

      sub cmp_by {
          my $result;
          for my $term (@_) {
              $result ||= +$b->{$term} <=> +$a->{$term} or $a->{$term} cmp $b->{$term};
          }
          return $result;
      }
        Your point is, of course, valid. However, you should test your code before running it. Your or is lower precedence than the assignment operator ||= (see Operator Precedence and Associativity). This is, for example, why the classic or die construct works so well. This also means your code will never actually sort by string - warnings would have told you that you are performing a comparison in void context. In addition, + is a no-op in unary operator context; you likely meant the numification ("Venus") operator, 0+. This will not suppress the string in numeric context warnings either, since you are still using a string in a numeric context.
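To make the precedence point concrete, here is a minimal sketch with made-up values. The void warning is switched off inside the block only so that the demo runs quietly:

```perl
use strict;
use warnings;

my ($x, $y) = ('apple', 'banana');

# Low-precedence "or": parsed as ($loose ||= (1 <=> 1)) or ($x cmp $y),
# so the cmp result is computed and then thrown away.
my $loose = 0;
{
    no warnings 'void';    # silences "Useless use ... in void context"
    $loose ||= (1 <=> 1) or ($x cmp $y);
}

# High-precedence "||": the whole right-hand side is assigned.
my $tight = 0;
$tight ||= (1 <=> 1) || ($x cmp $y);

print "loose=$loose tight=$tight\n";    # loose=0 tight=-1
```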

        If you want to do auto-detection, you'd be better off using the Conditional Operator using looks_like_number from Scalar::Util:

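        sub cmp_by {
            use Scalar::Util 'looks_like_number';
            my $result = 0;
            for my $term (@_) {
                # Test the field values, not the hash references in $a and $b
                $result ||= looks_like_number($a->{$term}) && looks_like_number($b->{$term})
                    ? $a->{$term} <=> $b->{$term}
                    : $a->{$term} cmp $b->{$term};
            }
            return $result;
        }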
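Auto-detection can misfire when a string column happens to hold digits, so another option is to state the type of each column explicitly. The sketch below assumes a hand-written %is_numeric table and a made-up data set; neither appears in the thread.

```perl
use strict;
use warnings;

# Assumed for this sketch: a hand-written table of the numeric columns.
# Every column not listed here is compared as a string.
my %is_numeric = ( SALARY => 1, AGE => 1 );

sub cmp_by {
    my $result = 0;
    for my $term (@_) {
        $result ||= $is_numeric{$term}
            ? $a->{$term} <=> $b->{$term}
            : $a->{$term} cmp $b->{$term};
    }
    return $result;
}

my @employees = (
    { FIRST => 'Bill',  SALARY => 600000 },
    { FIRST => 'Sally', SALARY => 55000  },
    { FIRST => 'Joe',   SALARY => 55000  },
    { FIRST => 'Ann',   SALARY => 400000 },    # made-up row
);

my @ranked = sort { cmp_by(qw(SALARY FIRST)) } @employees;
print "$_->{SALARY}\t$_->{FIRST}\n" for @ranked;
```

The trade-off is one more thing to keep in sync with the data, in exchange for never guessing a column's type from its contents.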


Re: Multiple Sort on selected column
by Tanktalus (Canon) on Oct 18, 2012 at 19:34 UTC

    Why not Sort::Key? Is it just because it's on CPAN? (Yes, even you can use CPAN) If it's just for learning, that's one thing, but if that were the case, why use sort instead of rolling your own sorting routine? Ok, I have the answer for that: you don't want to learn the details of sorting, you just want to learn how to use sort. But to that I say that it's an arbitrary line, and, for productivity reasons, I'd suggest learning Sort::Key and Sort::Key::Multi/Sort::Key::Maker as your arbitrary line instead. So, instead of the idiomatic:

    @ranked = sort {
        $a->{FIRST} cmp $b->{FIRST}
            or $a->{LAST} cmp $b->{LAST}
            or $a->{AGE} <=> $b->{AGE}
    } @employees;
    (note that we're comparing ages with the <=> operator, not cmp, so that 9 and 19 will sort properly, using cmp will not cause these to sort properly) learning to use Sort::Key might be more useful. For example:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Sort::Key qw(multikeysorter);

    my @employees = (
        { FIRST => 'Bill',   LAST => 'Gates',     SALARY => 600000, AGE => 45 },
        { FIRST => 'George', LAST => 'Tester',    SALARY => 55000,  AGE => 29 },
        { FIRST => 'Sally',  LAST => 'Developer', SALARY => 55000,  AGE => 29 },
        { FIRST => 'Joe',    LAST => 'Tester',    SALARY => 55000,  AGE => 29 },
        { FIRST => 'Steve',  LAST => 'Ballmer',   SALARY => 600000, AGE => 41 }
    );

    my %types = (
        FIRST  => 'str',
        LAST   => 'str',
        SALARY => 'int',
        AGE    => 'int',
    );

    my @sort = @ARGV;         # not comma-separated, just for ease
    @sort = map uc, @sort;    # so I don't have to hold the shift key down.

    my $sorter = multikeysorter( sub { my $A = $_; map { $A->{$_} } @sort }, @types{@sort} );
    my @ranked = $sorter->(@employees);

    foreach my $emp (@ranked) {
        print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST}\t$emp->{LAST}\n";
    }
    And then, as an example:
    $ perl x.pl first age
    600000  45  Bill    Gates
    55000   29  George  Tester
    55000   29  Joe     Tester
    55000   29  Sally   Developer
    600000  41  Steve   Ballmer
    $ perl x.pl age first
    55000   29  George  Tester
    55000   29  Joe     Tester
    55000   29  Sally   Developer
    600000  41  Steve   Ballmer
    600000  45  Bill    Gates
    $ perl x.pl age last first
    55000   29  Sally   Developer
    55000   29  George  Tester
    55000   29  Joe     Tester
    600000  41  Steve   Ballmer
    600000  45  Bill    Gates
    Simple, huh? :-)

    Update: And for reversing, add the following line right after the my %types declaration:

    @types{map -$_, keys %types} = map -$_, values %types;
    This will create the key -FIRST with value -str. And then the my $sorter line becomes:
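The hash-slice line relies on unary minus prepending "-" to a string that starts with a letter. A cut-down sketch that uses only two of the four columns shows the resulting table:

```perl
use strict;
use warnings;

my %types = ( FIRST => 'str', AGE => 'int' );

# Unary minus on a string that starts with a letter prepends "-",
# so every key and every value gets a negated twin.
@types{ map -$_, keys %types } = map -$_, values %types;

print "$_ => $types{$_}\n" for sort keys %types;
```

After the assignment the hash holds the two original entries plus -FIRST => -str and -AGE => -int.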
    my $sorter = multikeysorter( sub { my $A = $_; map { $A->{s/^-//r} } @sort }, @types{@sort} );
    This removes the - sign when looking up the value in the employee hash, but keeps the - sign when passing in the types to sort. And now?
    $ perl5.14.2 x.pl -age last first
    600000  45  Bill    Gates
    600000  41  Steve   Ballmer
    55000   29  Sally   Developer
    55000   29  George  Tester
    55000   29  Joe     Tester
    Note that the /r flag is new in 5.14.0, if you are running something older, you'll have to use:
    my $sorter = multikeysorter( sub { my $A = $_; map { (my $k = $_) =~ s/^-//; $A->{$k} } @sort }, @types{@sort} );

      It wasn't exactly the answer I was looking for, but you've shown me an alternative way to solve this, which may come in handy in the near future. Thanks!
Re: Multiple Sort on selected column
by sundialsvc4 (Monsignor) on Oct 18, 2012 at 19:57 UTC

    I happen to like to write separate subroutines for sorting, especially if I am going to do several different sorts that need to be identical ... in that case, it’s only one thing to maintain and nothing to keep in-sync.

    Multi-field comparison functions are made very easy by the <=> operator, (edit, see below) and ‘cmp’, both of which (as per perldoc perlop), “returns -1, 0, or 1 depending on whether the left argument is numerically or string-wise less than, equal to, or greater than the right argument.”   Combine either of these with ||, which “performs a short-circuit logical-OR operation.”   (“Short-circuit,” in the case of logical-OR, means that if the left-hand part is determined to be “True,” evaluation of the right-hand part is omitted since “True OR anything == True.”)   Since any non-zero value is True, a series of <=> or cmp comparisons can be chained using || to produce the intended result.   It is clear at a glance what the code is doing.

    {
        $$a{'last_name'} cmp $$b{'last_name'}
            ||
        $$a{'first_name'} cmp $$b{'first_name'}   # evaluated only if the 'last_name's are equal
    }

    Very Important Edit!   It has very graciously been pointed out to me that the <=> operator is numeric whereas the cmp operator is the string operator that, in the foregoing example, I of course should have used.   (The above example has been edited to include it.)   One of the serious “gotchas” of Perl, which I confess that I have never quite understood, is that there are two entirely-separate sets of comparison and relational operators:   one for strings, and another for numbers.   The original code, which used the <=> operator, would have compiled and run, but would not have produced the intended result.   Thank-you to whoever-it-was for pointing this error out to me.   It is a critical error indeed.   But, with the change as noted, the original premise once again holds.   (We can defer the question of “why does Perl do this” ... indefinitely.)

Re: Multiple Sort on selected column
by DrHyde (Prior) on Oct 19, 2012 at 10:38 UTC

Node Type: perlquestion [id://999789]
Approved by philipbailey