<?xml version="1.0" encoding="windows-1252"?>
<node id="1000586" title="Unusual sorting requirements; comparing three implementations." created="2012-10-24 05:41:04" updated="2012-10-24 05:41:04">
<type id="120">
perlmeditation</type>
<author id="757127">
tobyink</author>
<data>
<field name="doctext">
&lt;p&gt;&lt;a href="http://www.perlmonks.org/?node=thezip"&gt;An esteemed monk&lt;/a&gt; last night mentioned a problem of sorting a list of people alphabetically, but so that people with a job title of "Manager" are floated to the top of the list. I'm sure we can all think of similar problems we've had. For instance, sorting task/todo items by date, but sorting high priority tasks first.&lt;/p&gt;

&lt;p&gt;Anyhow, it was late and I mumbled something about &lt;c&gt;part&lt;/c&gt; from [mod://List::MoreUtils] and wondered off to the monastery sleeping quarters.&lt;/p&gt;

&lt;p&gt;Here I'd like to compare three implementations. Firstly, the naive approach:&lt;/p&gt;

&lt;code&gt;
my @sorted = sort {
	if ($a-&gt;title =~ /Manager/ and not $b-&gt;title =~ /Manager/) {
		-1;
	}
	elsif ($b-&gt;title =~ /Manager/ and not $a-&gt;title =~ /Manager/) {
		1;
	}
	else {
		$a-&gt;name cmp $b-&gt;name;
	}
} @employees;
&lt;/code&gt;

&lt;p&gt;If A is a manager but not B, then A goes first. If B is a manager, but not A, then B goes first. Otherwise (both are managers, or neither are managers), then sort by name. Simple; easy to understand.&lt;/p&gt;

&lt;p&gt;But what are the drawbacks? The regexp is used four times, resulting in a lot of duplication. OK, so we could say &lt;c&gt;my $test = qr/Manager/&lt;/c&gt; and stop repeating ourselves, but even so, the regexp is still being matched against quite a lot. In the case where A and B are both managers, the regexp is matched against four times within the block. And that happens for each pair of people compared.&lt;/p&gt;

&lt;p&gt;OK, so actually we can use the return value of &lt;c&gt;=~&lt;/c&gt; (or &lt;c&gt;!~&lt;/c&gt;) numerically and turn that to our advantage. A slightly subtler approach...&lt;/p&gt;

&lt;code&gt;
my @sorted = sort {
	($a-&gt;title !~ /Manager/) - ($b-&gt;title !~ /Manager/)
	or $a-&gt;name cmp $b-&gt;name;
} @employees;
&lt;/code&gt;

&lt;p&gt;Here the regexp match happens only twice within the block, which should improve the efficiency of the sorting.&lt;/p&gt;

&lt;p&gt;Now let's take a look at what I was jabbering on about last night; using &lt;c&gt;part&lt;/c&gt; to split the list into a list of managers and a list of non-managers. That way we can sort the lists separately just based on their names (quick and easy) and then join them together again.&lt;/p&gt;

&lt;code&gt;
use List::MoreUtils qw( part );

my @sorted = 
	map  { sort { $a-&gt;name cmp $b-&gt;name } @$_ }
	part { $_-&gt;title !~ /Manager/ }
	@employees;
&lt;/code&gt;

&lt;p&gt;It's perhaps a little less readable than the first, naive approach, but it has some nice features. The &lt;c&gt;/Manager/&lt;/c&gt; regexp isn't repeated, and is only matched once for each person.&lt;/p&gt;

&lt;p&gt;For those who are interested, here's a Test::More script that proves all three approaches work...&lt;/p&gt;

&lt;readmore&gt;
&lt;code&gt;
use v5.14;
use strict;
use warnings;
use Test::More;

package Person {
	use Moo;
	has name  =&gt; (is =&gt; 'ro');
	has title =&gt; (is =&gt; 'ro');
}

sub Person {
	my ($name, $title) = @_;
	Person::-&gt;new(name =&gt; $name, title =&gt; $title);
}

my @employees = (
	Person("Eve", "Tech Support"),
	Person("Alice", "Sales Manager"),
	Person("Dave", "Janitor"),
	Person("Fred", "Receptionist"),
	Person("Bob", "Customer Services"),
	Person("Greg", "Finance Manager"),
	Person("Carol", "Marketing Manager"),
);

my @correct = (
	Person("Alice", "Sales Manager"),
	Person("Carol", "Marketing Manager"),
	Person("Greg", "Finance Manager"),
	Person("Bob", "Customer Services"),
	Person("Dave", "Janitor"),
	Person("Eve", "Tech Support"),
	Person("Fred", "Receptionist"),
);

OBVIOUS_APPROACH: {
	
	my @sorted = sort {
		if ($a-&gt;title =~ /Manager/ and not $b-&gt;title =~ /Manager/) {
			-1;
		}
		elsif ($b-&gt;title =~ /Manager/ and not $a-&gt;title =~ /Manager/) {
			1;
		}
		else {
			$a-&gt;name cmp $b-&gt;name;
		}
	} @employees;

	is_deeply(\@sorted, \@correct, 'Obvious approach');
}

SUBTLER_APPROACH: {
	
	my @sorted = sort {
		($a-&gt;title !~ /Manager/) - ($b-&gt;title !~ /Manager/)
		or $a-&gt;name cmp $b-&gt;name;
	} @employees;

	is_deeply(\@sorted, \@correct, 'Subtler approach');
}

FUNCTIONAL_APPROACH: {
	use List::MoreUtils qw( part );
	
	my @sorted = 
		map  { sort { $a-&gt;name cmp $b-&gt;name } @$_ }
		part { $_-&gt;title !~ /Manager/ }
		@employees;

	is_deeply(\@sorted, \@correct, 'Functional approach');
}

done_testing();
&lt;/code&gt;
&lt;/readmore&gt;

&lt;p&gt;Now as I've hinted; my gut feeling was that the final, functional approach should be faster. This is for two reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The regexp match happens fewer times. Our regexp is pretty simple, but imagine we also needed to classify Directors as Managers; and the Chief Executive too. The regexp could quickly become more complicated.&lt;/li&gt;
&lt;li&gt;The Perl sorting algorithm is &lt;i&gt;O(N log N)&lt;/i&gt; which means that it's quicker to sort two lists of lengths &lt;i&gt;a&lt;/i&gt; and &lt;i&gt;b&lt;/i&gt; than it is to sort a single combined list of length &lt;i&gt;a+b&lt;/i&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How much faster... well, I generated a random list of 200 employees with numeric names, and a random job title of either "Manager" or "Staff", and ran the results though [mod://Benchmark]...&lt;/p&gt;

&lt;readmore&gt;
&lt;code&gt;
use v5.14;
use strict;
use warnings;
use List::MoreUtils qw( part );
use Benchmark ':all';

package Person {
	use Moo;
	has name  =&gt; (is =&gt; 'ro');
	has title =&gt; (is =&gt; 'ro');
}

my @employees = map {
	Person::-&gt;new(
		name  =&gt; int(rand(999999)),
		title =&gt; int(rand(2))?'Staff':'Manager',
	)
} 0..200;

sub obvious {
	my @sorted = sort {
		if ($a-&gt;title =~ /Manager/ and not $b-&gt;title =~ /Manager/) {
			-1;
		}
		elsif ($b-&gt;title =~ /Manager/ and not $a-&gt;title =~ /Manager/) {
			1;
		}
		else {
			$a-&gt;name cmp $b-&gt;name;
		}
	} @employees;
}

sub subtle {
	my @sorted = sort {
		($a-&gt;title !~ /Manager/) - ($b-&gt;title !~ /Manager/)
		or $a-&gt;name cmp $b-&gt;name;
	} @employees;
}

sub functional {	
	my @sorted = 
		map  { sort { $a-&gt;name cmp $b-&gt;name } @$_ }
		part { $_-&gt;title !~ /Manager/ }
		@employees;
}

cmpthese(1000, {
	obvious    =&gt; \&amp;obvious,
	subtle     =&gt; \&amp;subtle,
	functional =&gt; \&amp;functional,
});
&lt;/code&gt;
&lt;/readmore&gt;

&lt;p&gt;The results were impresssive...&lt;/p&gt;
&lt;pre&gt;
            Rate    obvious     subtle functional
obvious    102/s         --       -26%       -70%
subtle     137/s        34%         --       -59%
functional 336/s       228%       144%         --
&lt;/pre&gt;

&lt;p&gt;That's right; the functional approach shaves off 70% of the time taken to sort the list of employees!&lt;/p&gt;

&lt;p&gt;So remember, for prioritised sorting, &lt;c&gt;part&lt;/c&gt; is your friend!&lt;/p&gt;

&lt;p&gt;&lt;b&gt;UPDATE:&lt;/b&gt;, using [mod://Sort::Key]'s &lt;c&gt;keysort&lt;/c&gt; in conjunction with the functional approach doubles its speed once again!&lt;/p&gt;

&lt;code&gt;
my @sorted = 
	map  { keysort { $_-&gt;name } @$_ }
	part { $_-&gt;title !~ /Manager/ }
	@employees;
&lt;/code&gt;

&lt;p&gt;&lt;c&gt;keysort&lt;/c&gt; is an XS implementation of the Schwartzian transform.&lt;/p&gt;

&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-757127"&gt;
&lt;small&gt;&lt;small&gt;
&lt;tt&gt;perl -E'sub Monkey::do{say$_,for@_,do{($monkey=&amp;#x5B;caller(0)]-&gt;&amp;#x5B;3])=~s{::}{ }and$monkey}}"Monkey say"-&gt;Monkey::do'
&lt;/tt&gt;&lt;/small&gt;&lt;/small&gt;
&lt;/div&gt;&lt;/div&gt;</field>
</data>
</node>
