Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

An esteemed monk last night mentioned a problem of sorting a list of people alphabetically, but so that people with a job title of "Manager" are floated to the top of the list. I'm sure we can all think of similar problems we've had. For instance, sorting task/todo items by date, but sorting high priority tasks first.

Anyhow, it was late and I mumbled something about part from List::MoreUtils and wondered off to the monastery sleeping quarters.

Here I'd like to compare three implementations. Firstly, the naive approach:

my @sorted = sort { if ($a->title =~ /Manager/ and not $b->title =~ /Manager/) { -1; } elsif ($b->title =~ /Manager/ and not $a->title =~ /Manager/) { 1; } else { $a->name cmp $b->name; } } @employees;

If A is a manager but not B, then A goes first. If B is a manager, but not A, then B goes first. Otherwise (both are managers, or neither are managers), then sort by name. Simple; easy to understand.

But what are the drawbacks? The regexp is used four times, resulting in a lot of duplication. OK, so we could say my $test = qr/Manager/ and stop repeating ourselves, but even so, the regexp is still being matched against quite a lot. In the case where A and B are both managers, the regexp is matched against four times within the block. And that happens for each pair of people compared.

OK, so actually we can use the return value of =~ (or !~) numerically and turn that to our advantage. A slightly subtler approach...

my @sorted = sort { ($a->title !~ /Manager/) - ($b->title !~ /Manager/) or $a->name cmp $b->name; } @employees;

Here the regexp match happens only twice within the block, which should improve the efficiency of the sorting.

Now let's take a look at what I was jabbering on about last night; using part to split the list into a list of managers and a list of non-managers. That way we can sort the lists separately just based on their names (quick and easy) and then join them together again.

use List::MoreUtils qw( part ); my @sorted = map { sort { $a->name cmp $b->name } @$_ } part { $_->title !~ /Manager/ } @employees;

It's perhaps a little less readable than the first, naive approach, but it has some nice features. The /Manager/ regexp isn't repeated, and is only matched once for each person.

For those who are interested, here's a Test::More script that proves all three approaches work...

use v5.14; use strict; use warnings; use Test::More; package Person { use Moo; has name => (is => 'ro'); has title => (is => 'ro'); } sub Person { my ($name, $title) = @_; Person::->new(name => $name, title => $title); } my @employees = ( Person("Eve", "Tech Support"), Person("Alice", "Sales Manager"), Person("Dave", "Janitor"), Person("Fred", "Receptionist"), Person("Bob", "Customer Services"), Person("Greg", "Finance Manager"), Person("Carol", "Marketing Manager"), ); my @correct = ( Person("Alice", "Sales Manager"), Person("Carol", "Marketing Manager"), Person("Greg", "Finance Manager"), Person("Bob", "Customer Services"), Person("Dave", "Janitor"), Person("Eve", "Tech Support"), Person("Fred", "Receptionist"), ); OBVIOUS_APPROACH: { my @sorted = sort { if ($a->title =~ /Manager/ and not $b->title =~ /Manager/) { -1; } elsif ($b->title =~ /Manager/ and not $a->title =~ /Manager/) +{ 1; } else { $a->name cmp $b->name; } } @employees; is_deeply(\@sorted, \@correct, 'Obvious approach'); } SUBTLER_APPROACH: { my @sorted = sort { ($a->title !~ /Manager/) - ($b->title !~ /Manager/) or $a->name cmp $b->name; } @employees; is_deeply(\@sorted, \@correct, 'Subtler approach'); } FUNCTIONAL_APPROACH: { use List::MoreUtils qw( part ); my @sorted = map { sort { $a->name cmp $b->name } @$_ } part { $_->title !~ /Manager/ } @employees; is_deeply(\@sorted, \@correct, 'Functional approach'); } done_testing();

Now as I've hinted; my gut feeling was that the final, functional approach should be faster. This is for two reasons:

  • The regexp match happens fewer times. Our regexp is pretty simple, but imagine we also needed to classify Directors as Managers; and the Chief Executive too. The regexp could quickly become more complicated.
  • The Perl sorting algorithm is O(N log N) which means that it's quicker to sort two lists of lengths a and b than it is to sort a single combined list of length a+b.

How much faster... well, I generated a random list of 200 employees with numeric names, and a random job title of either "Manager" or "Staff", and ran the results though Benchmark...

use v5.14; use strict; use warnings; use List::MoreUtils qw( part ); use Benchmark ':all'; package Person { use Moo; has name => (is => 'ro'); has title => (is => 'ro'); } my @employees = map { Person::->new( name => int(rand(999999)), title => int(rand(2))?'Staff':'Manager', ) } 0..200; sub obvious { my @sorted = sort { if ($a->title =~ /Manager/ and not $b->title =~ /Manager/) { -1; } elsif ($b->title =~ /Manager/ and not $a->title =~ /Manager/) +{ 1; } else { $a->name cmp $b->name; } } @employees; } sub subtle { my @sorted = sort { ($a->title !~ /Manager/) - ($b->title !~ /Manager/) or $a->name cmp $b->name; } @employees; } sub functional { my @sorted = map { sort { $a->name cmp $b->name } @$_ } part { $_->title !~ /Manager/ } @employees; } cmpthese(1000, { obvious => \&obvious, subtle => \&subtle, functional => \&functional, });

The results were impresssive...

            Rate    obvious     subtle functional
obvious    102/s         --       -26%       -70%
subtle     137/s        34%         --       -59%
functional 336/s       228%       144%         --

That's right; the functional approach shaves off 70% of the time taken to sort the list of employees!

So remember, for prioritised sorting, part is your friend!

UPDATE:, using Sort::Key's keysort in conjunction with the functional approach doubles its speed once again!

my @sorted = map { keysort { $_->name } @$_ } part { $_->title !~ /Manager/ } @employees;

keysort is an XS implementation of the Schwartzian transform.

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

In reply to Unusual sorting requirements; comparing three implementations. by tobyink

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others perusing the Monastery: (10)
    As of 2018-05-24 13:01 GMT
    Find Nodes?
      Voting Booth?