Array Sort

by gopalr (Priest)
 on May 10, 2011 at 02:10 UTC Need Help??
gopalr has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need your help to sort the array content.

The array has following value and the each value has delimitted by '-'.

```@array=(
'AC-BA91-CA',
'AB-BA92-CA',
'AA-BA93-CA',
'AA-BA89-CB'
)

I need to sort the array by middle of the value. For example, the array output should be the following

```@array=(
'AA-BA89-CB'
'AC-BA91-CA',
'AB-BA92-CA',
'AA-BA93-CA'
)

I tried to sort by using the cmp and <=>, but I need to sort middle of the value.

Thanks!!!

Replies are listed 'Best First'.
Re: Array Sort
by BrowserUk (Pope) on May 10, 2011 at 02:17 UTC

```print for sort{
substr( \$a, 3, 4 ) cmp substr( \$b, 3, 4 )
} 'AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89-CB'
+;;
AA-BA89-CB
AC-BA91-CA
AB-BA92-CA
AA-BA93-CA
[0] Perl>

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Array Sort
by GrandFather (Sage) on May 10, 2011 at 04:22 UTC

For small amounts of data and with a simple way of extracting the key string the technique BrowerUK suggests is a good solution. If finding the key is expensive or the amount of data being sorted is large such that the sort processing takes a long time a common technique is the schwartzian transform. Consider:

```#!/usr/local/bin/perl
use strict;
use warnings;

my @data =
('AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89-
+CB');

my @sorted =
map {\$_->[0]} sort {\$a->[2] cmp \$b->[2]} map {[\$_, split '-']} @da
+ta;

print "\$_\n" for @sorted;

Prints:

```AA-BA89-CB
AC-BA91-CA
AB-BA92-CA
AA-BA93-CA

The key to the process is to transform the data into a form that can be sorted, sort it, then transform it back again. The process really runs from right to left with the right most map generating a temporary transformed list (with the original data as the first element of each list item. The sort is in the middle. Then the left most map pulls the original string out of each item in the sorted list.

True laziness is hard work
Not disagreeing with the strategy, however if there is a large quantity of data, shouldn't you restrict the sort element of the array to contain only the element to be sorted, as far as I can see it adds no overhead (we already have a list from the split function) and reduces the memory usage. ie
```#!/usr/local/bin/perl
use strict;
use warnings;

my @data =
('AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89-
+CB');

my @sorted =
map {\$_->[0]} sort {\$a->[1] cmp \$b->[1]} map {[ \$_ , (split '-')[1
+] ]} @data;

print "\$_\n" for @sorted;
Just wondering, Utilitarian.

print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

Sure. Storing just the original string and sort key is a reasonable idea if the data is very large. The little bit of extra "clutter" added was better left out of the original example though, especially as the OP may actually be working with quite different data and have different requirements for extracting the sort key than was suggested in the OP.

True laziness is hard work

It is worth noting that for the complication of the ST to save the OP any time over the standard sort, he would have to be sorting at least a 1 million elements.

And if he is sorting that much data, then he'll need every gain he can get, in which case the GRT suggested by javafan is far more effective. And it is simpler than the ST.

All in all, I wonder if there is ever a situation where the ST makes sense?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

All in all, I wonder if there is ever a situation where the ST makes sense?

Over plain sort, sure. Over GRT? Not so much.

ST can return objects, whereas GRT returns strings unless an external array is used.

ST can produce simpler and more readable code than GRT in some circumstances.

However, ST isn't as fast as GRT.

```# ST
my @fhs =
map \$_->[0],
sort { \$a->[1] <=> \$b->[1] }
map [ \$_, fileno(\$_) ],
get_fhs();
```# GRT
my @fhs = get_fhs();
@fhs =
map \$fhs[ unpack('x4 N', \$_) ]
sort
map pack('NN', fileno(\$fhs[\$_]), \$_),
0..\$#fhs;
```# GRT
my @fhs = get_fhs();
@fhs = @fhs[
map unpack('x4 N', \$_),
sort
map pack('NN', fileno(\$fhs[\$_]), \$_),
0..\$#fhs
];
I think you might find this application of ST will actually slow down the sort. As you said, it helps if the key is expensive to computer.

You also said ST is more likely to help for long lists, but that's not true. It loses effectiveness as the list grows.

Re: Array Sort
by JavaFan (Canon) on May 10, 2011 at 06:30 UTC
A GRT:
```@array = map {substr \$_, 4} sort map {substr(\$_, 3, 4) . \$_} @array
This assumes the middle is 4 characters long, and always starts after the 3rd character.

Interesting to note that you need to be sorting thousands of items before the GRT starts to show benefit over a plain sort.

It's better than the ST though, which looks to require millions before it gains anything.

```#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

my @prefixes = 'AA' .. 'ZZ';
my @suffixes = '00' .. '99';

our \$N //= 10;
our @data = map{
'AA-'
. \$prefixes[ rand @prefixes ]
. \$suffixes[ rand @suffixes ]
. '-XX'
} 1 .. \$N;

cmpthese -3, {
buk => q[
my @sorted = sort{
substr( \$a, 3, 4, ) cmp substr( \$b, 3, 4  )
} @data
],
GF  => q[
my @sorted = map {
\$_->[0]
} sort {
\$a->[2] cmp \$b->[2]
} map {
[\$_, split '-']
} @data
],
jfn => q[
my @sorted = map {
substr \$_, 4
} sort map {
substr(\$_, 3, 4) . \$_
} @data
],
};

__END__
c:\test>903885
Rate   GF  jfn  buk
GF  18332/s   -- -61% -79%
jfn 46452/s 153%   -- -47%
buk 88418/s 382%  90%   --

c:\test>903885 -N=100
Rate   GF  jfn  buk
GF  1584/s   -- -64% -67%
jfn 4394/s 177%   --  -8%
buk 4757/s 200%   8%   --

c:\test>903885 -N=1000
Rate   GF  buk  jfn
GF  118/s   -- -64% -69%
buk 333/s 181%   -- -14%
jfn 387/s 227%  16%   --

c:\test>903885 -N=10000
Rate   GF  buk  jfn
GF  8.86/s   -- -61% -76%
buk 22.8/s 157%   -- -38%
jfn 36.7/s 314%  61%   --

c:\test>903885 -N=100000
(warning: too few ite

Rate   GF  buk  jfn
GF  0.707/s   -- -60% -75%
buk  1.75/s 147%   -- -38%
jfn  2.80/s 296%  60%   --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Array Sort
by kejohm (Hermit) on May 10, 2011 at 02:35 UTC
Re: Array Sort
by SimonClinch (Deacon) on May 10, 2011 at 10:12 UTC
An alternative optimising solution to the Schwartzian transform suggested by grandfather is the "Orcish Manoeuvre", which also ensures that the operation evaluating the sort key is only applied once per entry, while using the same hash overhead, but often slightly less processing:
```my @sorted;
{
local %_ = ();
@sorted =
sort { \$_{ \$a } ||= substr( \$a, 3, 4 )
cmp
\$_{ \$b } ||= substr( \$b, 3, 4 )
} @array;
}
BUT, in most cases, I tend to simply pass through the data once creating the inverse hash to that:
```my %hash = map { substr( \$_, 3, 4 ), \$_ } @array;
Such a hash, using the key info as a hash key, tends to be more useful than a sorted array of keys in the subsequent processing.

Note: if the middle field should vary in length, then the expression ( split /\-/ )[1] should replace the substr function.

One world, one people

Re: Array Sort
by bart (Canon) on May 10, 2011 at 12:19 UTC
You can pick any of the advanced sorting techiques as described in Uri Guttman + Larry Rosler's paper: A Fresh Look at Efficient Perl Sorting. Just for practice.

If you don't really care about efficiency, or just to get a feel as what's going on, you can call a function for each items every single time.

```# for example, this'll do what you want
sub extract_middle { return shift() =~ /-(.*)-/ ? \$1 : '' }

@sorted = sort { extract_middle(\$a) cmp extract_middle(\$b) } @unsorted
+;
Note that the speedups as described in the paper are achieved by essentially calling this extract function only once for each item.

update Fixed a bug in the sub, thanks moritz

Re: Array Sort
by Khen1950fx (Canon) on May 10, 2011 at 14:26 UTC
Sort::Array can do it.
```#!/usr/bin/perl

use strict;
use warnings;
use Sort::Array qw(Sort_Table);
use Data::Dumper::Concise;

my @data = qw(
AC-BA91-CA
AB-BA92-CA
AA-BA93-CA
AA-BA89-CB
);

warn Dumper
@data = Sort_Table(
cols      => '3',
field     => '2',
sorting   => 'ascending',
structure => 'csv',
separator => '\-',
data      => \@data,
);

Create A New User
Node Status?
node history
Node Type: perlquestion [id://903885]
Approved by lidden
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (6)
As of 2017-11-22 02:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
In order to be able to say "I know Perl", you must have:

Results (314 votes). Check out past polls.

Notices?