How to perform different sorts on multiple sections of items in the same array

Re: How to perform different sorts on multiple sections of items in the same array
by Anonymous Monk on Dec 29, 2014 at 03:24 UTC

sort and How do I sort an array by (anything)?

Here's one way to do it with fairly complex sort criteria and a Schwartzian transform (assuming I've understood the requirements correctly):

#!/usr/bin/env perl
use warnings;
use strict;

use List::Util 'shuffle';
use Test::More tests=>1;

my @expect = qw/
    frooc froof froz
    ebaz eyun ebxu
    quz qun que /;

my @input = shuffle @expect;

sub special {  # sort routine used below
    $$a[1] <=> $$b[1]  # first sort by categories
    or  # then sort by the desired condition per category
    $$a[1]==1 ? $$a[0] cmp $$b[0]      # category 1 (/r/)
    : ( $$a[1]==2 ? $$a[2] cmp $$b[2]  # category 2 (/^e/)
      : $$b[3] cmp $$a[3] )            # category 3 (rest)
}
my @output = # sort with Schwartzian transform
    # Step 3: get back the orig_str out of the array
    map { $$_[0] }
    # Step 2: sort with the "special" function defined above
    sort special
    # Step 1: each item -> [orig_str, category, third_chr, last_chr]
    # where category is: /r/ => 1, /^e/ => 2, rest => 3
    map { [$_, /r/?1:(/^e/?2:3), substr($_,2,1), substr($_,-1,1)] }
      @input;

is_deeply \@output, \@expect
    or diag explain \@output;
[download]

Re: How to perform different sorts on multiple sections of items in the same array
by BrowserUk (Patriarch) on Dec 29, 2014 at 13:36 UTC

How do I “pick and choose” which items in the array I will sort, and make sure I leave the other items alone?

The trick here is to extract a list of the indices of the required subsets; sort those indices according to the appropriate criteria; and then build the sorted array by slicing the sorted array by the subset indices in their original order and the unsorted array by the subset indices in their sorted order.

To illustrate:

data:     iOrig     iSort
0: e20        0            4
1: b17
2: e04        2            2
3: r16
4: e01        4            0
5: b12
6: e99        6            6

@sorted[ @iOrig ] = @data[ @iSort ]
[download]

Thus, the subset from the original data is assigned in their sorted order to the sorted dataset overlaying their original positions; but reordered.

A simple example coded for clarity with lots of scope for optimisation:

#! perl -slw
use strict;

my @data = qw[
    alpha bravo charlie delta east echo exit foxtrot golf hotel india 
+juliet kilo lima mike
    november oscar papa quebec romeo sierra tango uniform victor whisk
+ey xray yankee zulu
];
my @sortedData;

my @rindex = grep{ $data[ $_ ] =~ m[r] } 0 .. $#data;
my @rsorted = sort{ $data[ $a ] cmp $data[ $b ] } @rindex;
@sortedData[ @rindex ] = @data[ @rsorted ];

my @eindex = grep{ $data[ $_ ] !~ m[r] && $data[ $_ ] =~ m[^e] } 0 .. 
+$#data;
my @esorted = sort{ substr( $data[ $a ], 2, 1 ) cmp substr( $data[ $b 
+], 2, 1 ) } @eindex;
@sortedData[ @eindex ] = @data[ @esorted ];

my @oindex = grep{ $data[ $_ ] !~ m[r] && $data[ $_ ] !~ m[^e] } 0 .. 
+$#data;
my @osorted = sort{ substr( $data[ $b ], -1 ) cmp substr( $data[ $a ],
+ -1 ) } @oindex;
@sortedData[ @oindex ] = @data[ @osorted ];

print for @sortedData;


__END__
C:\test>1111596.pl
whiskey
bravo
charlie
zulu
echo
exit
east
foxtrot
juliet
kilo
tango
hotel
golf
mike
yankee
november
oscar
quebec
alpha
romeo
sierra
delta
uniform
victor
india
xray
lima
papa
[download]

Notes:

bravo and charlie haven't moved because they both contain 'r' and were in the required order to start with.
east, echo & exit have remain in the same range of positions 4..6, but are reordered according to their 3rd characters 'h', 'i' & 's' respectively.
whiskey is first because it: doesn't start with 'e', nor contain 'r', and ends in 'y'.
papa is last because it: doesn't start with 'e', nor contain 'r', and ends in 'a'.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: How to perform different sorts on multiple sections of items in the same array (optimised)

by BrowserUk (Patriarch) on Dec 29, 2014 at 15:15 UTC

Updated: added missing declaration per trippledubs's report below.

A somewhat more optimal version that avoids multiple greps and re-testing of some conditions:

#! perl -slw
use strict;

my @data = qw[
    alpha bravo charlie delta east echo exit foxtrot golf hotel india 
+juliet kilo lima mike
    november oscar papa quebec romeo sierra tango uniform victor whisk
+ey xray yankee zulu
];
my @sortedData;
my( @rindex, @eindex, @oindex );

push @{ $data[ $_ ] =~ m[r] ? \@rindex : $data[ $_ ] =~ m[^e] ? \@eind
+ex : \@oindex }, $_ for 0 .. $#data;

my @rsorted = sort{         $data[ $a ]         cmp         $data[ $b 
+]         } @rindex;
my @esorted = sort{ substr( $data[ $a ], 2, 1 ) cmp substr( $data[ $b 
+], 2, 1 ) } @eindex;
my @osorted = sort{ substr( $data[ $b ], -1 )   cmp substr( $data[ $a 
+], -1 )   } @oindex;

@sortedData[ @rindex, @eindex, @oindex ] = @data[ @rsorted, @esorted, 
+@osorted ];

print for @sortedData;
[download]

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: How to perform different sorts on multiple sections of items in the same array (optimised)

by trippledubs (Deacon) on Dec 30, 2014 at 05:25 UTC

Whaaaaaat. This can't work under strict. How is this magic of taking an array like \@rindex before it is defined. Is it being defined in the push statement? I don't understand how that works. Under perltidy it kind of makes more sense, it looks like the ternary operator is used as a branching statement. But what array is it pushing to?.. ohh okay it is pushing to either rindex, eindex, or oindex depending on what match is found.. if that is what is going on, that is really sweet. Is that what is going on here? I printed this out and I'm just staring at it.

is it referencing and dereferencing in the same line.. why is that necessary?

Re^4: How to perform different sorts on multiple sections of items in the same array (optimised)

by Athanasius (Archbishop) on Dec 30, 2014 at 10:11 UTC

Re^4: How to perform different sorts on multiple sections of items in the same array (optimised)

by BrowserUk (Patriarch) on Dec 30, 2014 at 10:20 UTC

Re: How to perform different sorts on multiple sections of items in the same array
by Laurent_R (Canon) on Dec 29, 2014 at 10:42 UTC

Athanasius

Re: How to perform different sorts on multiple sections of items in the same array
by james28909 (Deacon) on Dec 29, 2014 at 03:15 UTC

use strict;
use warnings;

my @array = qw(toad frog frig frag crow creep hop);
my @sorted;

for (@array) {
    if ( $_ =~ m/.*r/g ) {
        push( @sorted, $_ );
    }
}
print "$_\n" for sort(@sorted);
[download]

creep
crow
frag
frig
frog
[download]

Re^2: How to perform different sorts on multiple sections of items in the same array

by GrandFather (Saint) on Dec 30, 2014 at 04:03 UTC

The .* is not required in your match expression nor is the /g. The whole loop can be replaced by:

my @selected = grep {/r/} @array;
[download]

Note that neither version of the matching code actually performs a sort so the variable name is more appropriately 'selected' than 'sorted'.

Perl is the programming world's equivalent of English

Re: How to perform different sorts on multiple sections of items in the same array
by Anonymous Monk on Dec 29, 2014 at 07:14 UTC

xxxxxC
ABrrrr
e_3_ee

xxxxxA
AArrrr
e_2_ee

xxxxxB
ACrrrr
e_1_ee
[download]

xxxxxC
AArrrr
e_1_ee

xxxxxB
ABrrrr
e_2_ee

xxxxxA
ACrrrr
e_3_ee
[download]

sort

Read more... (2 kB)

Re: How to perform different sorts on multiple sections of items in the same array
by johngg (Canon) on Dec 30, 2014 at 01:03 UTC

Rather than complicated sorting routines, perhaps it would be preferable to set up a couple of "flags" depending on what type the string is and, consequently, what character to sort on then pack them along with the original string to do a GRT sort. This will not be stable for the last two categories if the strings share the same sorting character, at which point they will be sorted in ascending alpha.

use strict;
use warnings;

use 5.014;

my @strings = qw{
   cabin
   prawn
   pail
   error
   reptile
   engine
   copy
   echo
   soap
   eskimo
   carpet
   emblem
   skunk
   };

say for
   map  { substr $_, 8 }
   sort
   map  {
        my( $type, $ord );
        if ( m{r} )
        {
            $type = 0;
            $ord  = 0;
        }
        elsif ( m{^e} )
        {
            $type = 1;
            $ord  = ord substr $_, 2, 1;
        }
        else
        {
            $type = 2;
            $ord  = 255 - ord substr $_, -1, 1;
        }
        pack q{NNa*}, $type, $ord, $_;
        }
   @strings;
[download]

The output.

carpet
error
prawn
reptile
emblem
engine
echo
eskimo
copy
soap
cabin
pail
skunk
[download]

I hope this is of interest.

Update: Corrected statement re. stable sorting, s/first/last/

Cheers,

JohnGG

Re^2: How to perform different sorts on multiple sections of items in the same array

by BrowserUk (Patriarch) on Dec 30, 2014 at 02:00 UTC

For your sample data, the OPs specifications, should lead to an ordering of:

copy carpet soap error prawn emblem cabin echo pail engine reptile esk
+imo skunk
[download]

Not:

carpet error prawn reptile emblem engine echo eskimo copy soap cabin p
+ail skunk
[download]

Consider the OPs 3-part specification and the following:

orig    pre-r   sort-r          pre-e   sort-e          pre-o   sort-O
+          final

cabin                                                   cabin   copy  
+          copy
prawn   prawn   carpet                                                
+          carpet
pail                                                    pail    soap  
+          soap
error   error   error                                                 
+          error
reptile reptile prawn                                                 
+          prawn
engine                          engine  emblem                        
+          emblem
copy                                                    copy   cabin  
+          cabin
echo                            echo    echo                          
+          echo
soap                                                    soap   pail   
+          pail
eskimo                          eskimo  engine                        
+          engine
carpet  carpet  reptile                                               
+          reptile
emblem                          emblem  eskimo                        
+          eskimo
skunk                                                   skunk  skunk  
+          skunk
[download]

The problem with your algorithm is, that is doesn't exclude those items that match the first criteria, from being considered by the second criteria, if the also match it; but the OPs specifications (emphasis added):

all the strings inside the array that contain an “r” and sort those in ascending alphabetical order. For the elements that are left, find the ones that start with an “e” and sort those by ascii value of their third character. And for the ones that are left reverse sort by their last letter alphabetically.

Your algorithm does not exclude items that contain an 'r' from being considered when processing items that start with an 'e'. Eg. 'error'.

Nor does it exclude items that contain an 'r', or start with an 'e', from being processed when reverse sorting "the ones that are left".

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: How to perform different sorts on multiple sections of items in the same array

by johngg (Canon) on Dec 30, 2014 at 12:28 UTC

Perhaps I misinterpreted the OP's spec. I thought the requirement was to sort all those containing an 'r' followed by those starting with an 'e' and finally the remainder but I can see that your interpretation of leaving each set in their relative positions could be more accurate. Perhaps estreb could clarify?

Your algorithm does not exclude items that contain an 'r' from being considered when processing items that start with an 'e'. Eg. 'error'.

Nor does it exclude items that contain an 'r', or start with an 'e', from being processed when reverse sorting "the ones that are left".

I'm not sure you are correct in those statements. The if ( m{r} ) { ... } elsif ( m{^e} ) { ... } else { ... } section assigns ascending values to $type which is packed as the first term and becomes the primary key for the sort. Thus all those starting with an 'r' are sorted separately from those starting with 'e' which are again separate from the remainder. I think this is demonstrated by the output.

carpet    }
error     }    Strings containing an 'r' anywhere
prawn     }    are sorted ascending alpha
reptile   }

emblem    }
engine    }    Strings starting with 'e' but not
echo      }    containing an 'r' are sorted ascending
eskimo    }    alpha on the third character

copy      }
soap      }    Strings neither containing an 'r' nor
cabin     }    starting with 'e' are sorted descending
pail      }    alpha on the last character
skunk     }
[download]

Please let me know if you still think I am having a senior moment and have got this aspect of the algorithm completely wrong.

Update: Corrected wrong variable in explanation, s/$ord/$type/

Cheers,

JohnGG

Re^4: How to perform different sorts on multiple sections of items in the same array

by BrowserUk (Patriarch) on Dec 30, 2014 at 12:40 UTC

Re^2: How to perform different sorts on multiple sections of items in the same array

by johngg (Canon) on Dec 31, 2014 at 00:40 UTC

To add stable sorting for the last two categories, restrict the sort to $type and $ord (packed as NN so taking 8 bytes) unless the $type is 0 for both terms, in which case sort the entire strings.

use strict;
use warnings;

use 5.014;

my @strings = qw{
   cabin
   prawn
   eminent
   pail
   error
   reptile
   engine
   copy
   echo
   soap
   eskimo
   carpet
   elias
   emblem
   skunk
   cheap
   };

say for
   map  { substr $_, 8 }
   sort {
          unpack( q{N}, $a ) || unpack( q{N}, $b )
          ? substr( $a, 0, 8 ) cmp substr( $b, 0, 8 )
          : $a                 cmp $b;
        }
   map  {
          my( $type, $ord );
          if ( m{r} )
          {
              $type = 0;
              $ord  = 0;
          }
          elsif ( m{^e} )
          {
              $type = 1;
              $ord  = ord substr $_, 2, 1;
          }
          else
          {
              $type = 2;
              $ord  = 255 - ord substr $_, -1, 1;
          }
          pack q{NNa*}, $type, $ord, $_;
        }
   @strings;
[download]

The output, note that 'eminent' and 'elias' have the same 3rd character but now retain their relative order, as do 'soap' and 'cheap' with matching final characters in the last category.

carpet
error
prawn
reptile
emblem
engine
echo
eminent
elias
eskimo
copy
soap
cheap
cabin
pail
skunk
[download]

I hope this is of interest.

Cheers,

JohnGG

Re: How to perform different sorts on multiple sections of items in the same array
by sundialsvc4 (Abbot) on Dec 29, 2014 at 12:41 UTC

Very-elaborate sorting requirements such as these can be satisfied by creating an appropriate function for use with the sort verb. As you know, this function must return a value that is either less-than, equal-to, or greater-than zero, based on its comparison of two strings ($a and $b) that are presented to it. Typically, this is an “in-line” function such as { $a <=> $b } (using an operator that is expressly designed for this purpose), but it does not have to be. You can write an entirely separate sub and refer to it within the sort verb, and this is what I suggest that you do.

Let me describe my notion just in words. This is what the sub will do, each time it is called.

First, classify the two strings into “groups.”
1. Group #1 consists of those which contain an “r.”
2. Group #2 consists of those characters which start with “e.”
3. Group #3 consists of those strings which do not fall into either of the other two groups.
If the group-number of the two strings is unequal, return the difference between the two group-numbers. (Only the sign of the result matters.) This will cause the groups to be separated: the groups will occur in the final output in order of group-number, and, within their respective groups, they will be sorted as (see below) each group requires.
Otherwise (the group-number is the same ...), return a value that is appropriate for each group.
1. Group #1 can simply use <=>.
2. Group #2 can (probably) remove the first characters, then use that operator, or use it on a substring. (You probably need to consider more than just the one character. Otherwise, multiple strings with the same fourth-character would have an unpredictable result-order, since differing strings would all be seen as “equal.”)
3. The third, to achieve a reverse-sort, would apply the <=> operator with the two operands reversed.

Notice that you are not splitting the array and sorting each piece. You are not using memory-hungry exotica like Schwartz. This notion can be applied to sorts of any size, and it applies equally well to both in-memory and disk-based sorts.

Q.E.D.

Re^2: How to perform different sorts on multiple sections of items in the same array

by Anonymous Monk on Dec 29, 2014 at 12:49 UTC

Group #1 can simply use <=>

If you're going to describe someone else's solution without giving credit, at least get it right. <=> does numeric comparison.

Re^2: How to perform different sorts on multiple sections of items in the same array

by karlgoethebier (Abbot) on Dec 29, 2014 at 13:04 UTC

sundialsvc4Plag?

«The Crux of the Biscuit is the Apostrophe»

Re^2: How to perform different sorts on multiple sections of items in the same array

by Anonymous Monk on Dec 29, 2014 at 19:01 UTC

You probably need to consider more than just the one character. Otherwise, multiple strings with the same fourth-character would have an unpredictable result-order, since differing strings would all be seen as “equal.”

sort

In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm

Re^2: How to perform different sorts on multiple sections of items in the same array

by Laurent_R (Canon) on Dec 29, 2014 at 19:52 UTC

You are not using memory-hungry exotica like Schwartz.

Re^3: How to perform different sorts on multiple sections of items in the same array

by sundialsvc4 (Abbot) on Dec 30, 2014 at 02:43 UTC

The statement about <=> vs. the string version, cmp, is of course my blunder. Oops.

The idea itself is certainly not new: that’s how they did it (and, still do it ...) in the earliest COBOL days. But it most certainly isn’t “plagarism,” as if such a curious term could actually apply in this context ... which it obviously can’t. (In general, several of you routinely step over the line of decorum and civility . . . and I’ve simply given up asking you to stop. But, I digress.)

A lot depends on exactly how much data we might be dealing with. If the whole shebang fits easily within available memory, then it matters very little how you do it: split the in-memory list into three in-memory lists, sort each one, splice ’em all together into one, and you’re done. Or, pay your nod to Mr. Schwartz, if you prefer. No discussion required. But, if data volumes are large ... especially if they are very large ... custom sort-comparison routines work extremely well.

I decided to make this suggestion ... not to hear myself babble, nor with the slightest concern for “votes” ... but to offer an alternative strategy. “There’s more than one way to do it,™” and here’s another. Yes, it involves a non-trivial subroutine. But, so do many sorts. Even if it is deemed not-useful in this particular (small?) case, it is a useful alternative to keep in mind.