estreb has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks. Suppose I have an array of a random number of unsorted strings. I want to grab all the strings inside the array that contain an “r” and sort those in ascending alphabetical order. For the elements that are left, find the ones that start with an “e” and sort those by ascii value of their third character. And for the ones that are left reverse sort by their last letter alphabetically. So basically I’m applying three different sorting methods to different items in the array. What is a good way to do this in Perl? How do I “pick and choose” which items in the array I will sort, and make sure I leave the other items alone?
Re: How to perform different sorts on multiple sections of items in the same array
by Athanasius (Archbishop) on Dec 29, 2014 at 03:11 UTC
|
Hello estreb,
This sounds like homework, so I’ll give advice rather than code. First, you need to nail down the requirements. What sort of output is wanted? If it’s just three separate lists, sorted according to the criteria, I would proceed as follows:
- Partition the array into two: (i) strings containing the letter “r” and (ii) all others.
- Partition array (ii) into two further arrays: (iii) strings starting with the letter “e” and (iv) all others.
- Sort array (i) alphabetically, ascending.
- Sort array (iii) using a sort routine that compares third characters alphabetically, ascending.
- Sort array (iv) using a sort routine that compares final characters alphabetically, descending.
- Output arrays (i), (iii), and (iv).
For (1) and (2), look at using grep or the part function from List::MoreUtils. For (3), just use sort. For (4) and (5), use sort together with a custom sort routine employing substr.
Now, there are parts of the specification that make me think the desired output may be more complicated than three separate sorted lists:
How do I “pick and choose” which items in the array I will sort, and make sure I leave the other items alone?
If you really do need to output a single array, with the original elements sorted “in place” and all interleaved back together (whatever that would really mean), a more complex approach will be needed. But any thought along those lines will be wasted effort until the specification has been clarified. As always, the best way to clarify a specification is to provide sample input together with the corresponding desired output (and this is just as important when talking to lecturers as when presenting problems to PerlMonks).
Hope that helps,
Update: fixed typo.
| [reply] [d/l] [select] |
Re: How to perform different sorts on multiple sections of items in the same array
by Anonymous Monk on Dec 29, 2014 at 03:24 UTC
|
#!/usr/bin/env perl
use warnings;
use strict;
use List::Util 'shuffle';
use Test::More tests=>1;
my @expect = qw/
frooc froof froz
ebaz eyun ebxu
quz qun que /;
my @input = shuffle @expect;
sub special { # sort routine used below
$$a[1] <=> $$b[1] # first sort by categories
or # then sort by the desired condition per category
$$a[1]==1 ? $$a[0] cmp $$b[0] # category 1 (/r/)
: ( $$a[1]==2 ? $$a[2] cmp $$b[2] # category 2 (/^e/)
: $$b[3] cmp $$a[3] ) # category 3 (rest)
}
my @output = # sort with Schwartzian transform
# Step 3: get back the orig_str out of the array
map { $$_[0] }
# Step 2: sort with the "special" function defined above
sort special
# Step 1: each item -> [orig_str, category, third_chr, last_chr]
# where category is: /r/ => 1, /^e/ => 2, rest => 3
map { [$_, /r/?1:(/^e/?2:3), substr($_,2,1), substr($_,-1,1)] }
@input;
is_deeply \@output, \@expect
or diag explain \@output;
| [reply] [d/l] |
Re: How to perform different sorts on multiple sections of items in the same array
by BrowserUk (Patriarch) on Dec 29, 2014 at 13:36 UTC
|
How do I “pick and choose” which items in the array I will sort, and make sure I leave the other items alone?
The trick here is to extract a list of the indices of the required subsets; sort those indices according to the appropriate criteria; and then build the sorted array by slicing the sorted array by the subset indices in their original order and the unsorted array by the subset indices in their sorted order.
To illustrate:
data: iOrig iSort
0: e20 0 4
1: b17
2: e04 2 2
3: r16
4: e01 4 0
5: b12
6: e99 6 6
@sorted[ @iOrig ] = @data[ @iSort ]
Thus, the subset from the original data is assigned in their sorted order to the sorted dataset overlaying their original positions; but reordered.
A simple example coded for clarity with lots of scope for optimisation: #! perl -slw
use strict;
my @data = qw[
alpha bravo charlie delta east echo exit foxtrot golf hotel india
+juliet kilo lima mike
november oscar papa quebec romeo sierra tango uniform victor whisk
+ey xray yankee zulu
];
my @sortedData;
my @rindex = grep{ $data[ $_ ] =~ m[r] } 0 .. $#data;
my @rsorted = sort{ $data[ $a ] cmp $data[ $b ] } @rindex;
@sortedData[ @rindex ] = @data[ @rsorted ];
my @eindex = grep{ $data[ $_ ] !~ m[r] && $data[ $_ ] =~ m[^e] } 0 ..
+$#data;
my @esorted = sort{ substr( $data[ $a ], 2, 1 ) cmp substr( $data[ $b
+], 2, 1 ) } @eindex;
@sortedData[ @eindex ] = @data[ @esorted ];
my @oindex = grep{ $data[ $_ ] !~ m[r] && $data[ $_ ] !~ m[^e] } 0 ..
+$#data;
my @osorted = sort{ substr( $data[ $b ], -1 ) cmp substr( $data[ $a ],
+ -1 ) } @oindex;
@sortedData[ @oindex ] = @data[ @osorted ];
print for @sortedData;
__END__
C:\test>1111596.pl
whiskey
bravo
charlie
zulu
echo
exit
east
foxtrot
juliet
kilo
tango
hotel
golf
mike
yankee
november
oscar
quebec
alpha
romeo
sierra
delta
uniform
victor
india
xray
lima
papa
Notes:
- bravo and charlie haven't moved because they both contain 'r' and were in the required order to start with.
- east, echo & exit have remain in the same range of positions 4..6, but are reordered according to their 3rd characters 'h', 'i' & 's' respectively.
- whiskey is first because it: doesn't start with 'e', nor contain 'r', and ends in 'y'.
- papa is last because it: doesn't start with 'e', nor contain 'r', and ends in 'a'.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
#! perl -slw
use strict;
my @data = qw[
alpha bravo charlie delta east echo exit foxtrot golf hotel india
+juliet kilo lima mike
november oscar papa quebec romeo sierra tango uniform victor whisk
+ey xray yankee zulu
];
my @sortedData;
my( @rindex, @eindex, @oindex );
push @{ $data[ $_ ] =~ m[r] ? \@rindex : $data[ $_ ] =~ m[^e] ? \@eind
+ex : \@oindex }, $_ for 0 .. $#data;
my @rsorted = sort{ $data[ $a ] cmp $data[ $b
+] } @rindex;
my @esorted = sort{ substr( $data[ $a ], 2, 1 ) cmp substr( $data[ $b
+], 2, 1 ) } @eindex;
my @osorted = sort{ substr( $data[ $b ], -1 ) cmp substr( $data[ $a
+], -1 ) } @oindex;
@sortedData[ @rindex, @eindex, @oindex ] = @data[ @rsorted, @esorted,
+@osorted ];
print for @sortedData;
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
Whaaaaaat. This can't work under strict. How is this magic of taking an array like \@rindex before it is defined. Is it being defined in the push statement? I don't understand how that works. Under perltidy it kind of makes more sense, it looks like the ternary operator is used as a branching statement. But what array is it pushing to?.. ohh okay it is pushing to either rindex, eindex, or oindex depending on what match is found.. if that is what is going on, that is really sweet. Is that what is going on here? I printed this out and I'm just staring at it.
is it referencing and dereferencing in the same line.. why is that necessary?
| [reply] |
|
|
Re: How to perform different sorts on multiple sections of items in the same array
by Laurent_R (Canon) on Dec 29, 2014 at 10:42 UTC
|
I agree with Athanasius that it is probably premature to provide sample code when the specification is so vague.
| [reply] |
Re: How to perform different sorts on multiple sections of items in the same array
by james28909 (Deacon) on Dec 29, 2014 at 03:15 UTC
|
Maybe this is what you mean?
use strict;
use warnings;
my @array = qw(toad frog frig frag crow creep hop);
my @sorted;
for (@array) {
if ( $_ =~ m/.*r/g ) {
push( @sorted, $_ );
}
}
print "$_\n" for sort(@sorted);
Output:
creep
crow
frag
frig
frog
EDIT: Missed the point of this might being a homework assignment. apologies | [reply] [d/l] [select] |
|
my @selected = grep {/r/} @array;
Note that neither version of the matching code actually performs a sort so the variable name is more appropriately 'selected' than 'sorted'.
Perl is the programming world's equivalent of English
| [reply] [d/l] |
Re: How to perform different sorts on multiple sections of items in the same array
by Anonymous Monk on Dec 29, 2014 at 07:14 UTC
|
That's a fun homework. Here's a stable sort (groups maintain their positions relative to each other). That is, given the input
xxxxxC
ABrrrr
e_3_ee
xxxxxA
AArrrr
e_2_ee
xxxxxB
ACrrrr
e_1_ee
the output is
xxxxxC
AArrrr
e_1_ee
xxxxxB
ABrrrr
e_2_ee
xxxxxA
ACrrrr
e_3_ee
Another variation of a Schwartzian transform. sort doesn't sort arrays in place (doesn't sort arrays at all, only lists), so we sort each group separately and use slice assignement to restore order.
| [reply] [d/l] [select] |
Re: How to perform different sorts on multiple sections of items in the same array
by johngg (Canon) on Dec 30, 2014 at 01:03 UTC
|
Rather than complicated sorting routines, perhaps it would be preferable to set up a couple of "flags" depending on what type the string is and, consequently, what character to sort on then pack them along with the original string to do a GRT sort. This will not be stable for the last two categories if the strings share the same sorting character, at which point they will be sorted in ascending alpha.
use strict;
use warnings;
use 5.014;
my @strings = qw{
cabin
prawn
pail
error
reptile
engine
copy
echo
soap
eskimo
carpet
emblem
skunk
};
say for
map { substr $_, 8 }
sort
map {
my( $type, $ord );
if ( m{r} )
{
$type = 0;
$ord = 0;
}
elsif ( m{^e} )
{
$type = 1;
$ord = ord substr $_, 2, 1;
}
else
{
$type = 2;
$ord = 255 - ord substr $_, -1, 1;
}
pack q{NNa*}, $type, $ord, $_;
}
@strings;
The output.
carpet
error
prawn
reptile
emblem
engine
echo
eskimo
copy
soap
cabin
pail
skunk
I hope this is of interest.
Update: Corrected statement re. stable sorting, s/first/last/
| [reply] [d/l] [select] |
|
copy carpet soap error prawn emblem cabin echo pail engine reptile esk
+imo skunk
Not: carpet error prawn reptile emblem engine echo eskimo copy soap cabin p
+ail skunk
Consider the OPs 3-part specification and the following: orig pre-r sort-r pre-e sort-e pre-o sort-O
+ final
cabin cabin copy
+ copy
prawn prawn carpet
+ carpet
pail pail soap
+ soap
error error error
+ error
reptile reptile prawn
+ prawn
engine engine emblem
+ emblem
copy copy cabin
+ cabin
echo echo echo
+ echo
soap soap pail
+ pail
eskimo eskimo engine
+ engine
carpet carpet reptile
+ reptile
emblem emblem eskimo
+ eskimo
skunk skunk skunk
+ skunk
The problem with your algorithm is, that is doesn't exclude those items that match the first criteria, from being considered by the second criteria, if the also match it; but the OPs specifications (emphasis added):
all the strings inside the array that contain an “r” and sort those in ascending alphabetical order. For the elements that are left, find the ones that start with an “e” and sort those by ascii value of their third character. And for the ones that are left reverse sort by their last letter alphabetically.
Your algorithm does not exclude items that contain an 'r' from being considered when processing items that start with an 'e'. Eg. 'error'.
Nor does it exclude items that contain an 'r', or start with an 'e', from being processed when reverse sorting "the ones that are left".
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
Perhaps I misinterpreted the OP's spec. I thought the requirement was to sort all those containing an 'r' followed by those starting with an 'e' and finally the remainder but I can see that your interpretation of leaving each set in their relative positions could be more accurate. Perhaps estreb could clarify?
Your algorithm does not exclude items that contain an 'r' from being considered when processing items that start with an 'e'. Eg. 'error'.
Nor does it exclude items that contain an 'r', or start with an 'e', from being processed when reverse sorting "the ones that are left".
I'm not sure you are correct in those statements. The if ( m{r} ) { ... } elsif ( m{^e} ) { ... } else { ... } section assigns ascending values to $type which is packed as the first term and becomes the primary key for the sort. Thus all those starting with an 'r' are sorted separately from those starting with 'e' which are again separate from the remainder. I think this is demonstrated by the output.
carpet }
error } Strings containing an 'r' anywhere
prawn } are sorted ascending alpha
reptile }
emblem }
engine } Strings starting with 'e' but not
echo } containing an 'r' are sorted ascending
eskimo } alpha on the third character
copy }
soap } Strings neither containing an 'r' nor
cabin } starting with 'e' are sorted descending
pail } alpha on the last character
skunk }
Please let me know if you still think I am having a senior moment and have got this aspect of the algorithm completely wrong.
Update: Corrected wrong variable in explanation, s/$ord/$type/
| [reply] [d/l] [select] |
|
|
use strict;
use warnings;
use 5.014;
my @strings = qw{
cabin
prawn
eminent
pail
error
reptile
engine
copy
echo
soap
eskimo
carpet
elias
emblem
skunk
cheap
};
say for
map { substr $_, 8 }
sort {
unpack( q{N}, $a ) || unpack( q{N}, $b )
? substr( $a, 0, 8 ) cmp substr( $b, 0, 8 )
: $a cmp $b;
}
map {
my( $type, $ord );
if ( m{r} )
{
$type = 0;
$ord = 0;
}
elsif ( m{^e} )
{
$type = 1;
$ord = ord substr $_, 2, 1;
}
else
{
$type = 2;
$ord = 255 - ord substr $_, -1, 1;
}
pack q{NNa*}, $type, $ord, $_;
}
@strings;
The output, note that 'eminent' and 'elias' have the same 3rd character but now retain their relative order, as do 'soap' and 'cheap' with matching final characters in the last category.
carpet
error
prawn
reptile
emblem
engine
echo
eminent
elias
eskimo
copy
soap
cheap
cabin
pail
skunk
I hope this is of interest.
| [reply] [d/l] [select] |
Re: How to perform different sorts on multiple sections of items in the same array
by sundialsvc4 (Abbot) on Dec 29, 2014 at 12:41 UTC
|
Very-elaborate sorting requirements such as these can be satisfied by creating an appropriate function for use with the sort verb. As you know, this function must return a value that is either less-than, equal-to, or greater-than zero, based on its comparison of two strings ($a and $b) that are presented to it. Typically, this is an “in-line” function such as { $a <=> $b } (using an operator that is expressly designed for this purpose), but it does not have to be. You can write an entirely separate sub and refer to it within the sort verb, and this is what I suggest that you do.
Let me describe my notion just in words. This is what the sub will do, each time it is called.
- First, classify the two strings into “groups.”
- Group #1 consists of those which contain an “r.”
- Group #2 consists of those characters which start with “e.”
- Group #3 consists of those strings which do not fall into either of the other two groups.
-
If the group-number of the two strings is unequal, return the difference between the two group-numbers. (Only the sign of the result matters.) This will cause the groups to be separated: the groups will occur in the final output in order of group-number, and, within their respective groups, they will be sorted as (see below) each group requires.
-
Otherwise (the group-number is the same ...), return a value that is appropriate for each group.
- Group #1 can simply use <=>.
- Group #2 can (probably) remove the first characters, then use that operator, or use it on a substring. (You probably need to consider more than just the one character. Otherwise, multiple strings with the same fourth-character would have an unpredictable result-order, since differing strings would all be seen as “equal.”)
- The third, to achieve a reverse-sort, would apply the <=> operator with the two operands reversed.
Notice that you are not splitting the array and sorting each piece. You are not using memory-hungry exotica like Schwartz. This notion can be applied to sorts of any size, and it applies equally well to both in-memory and disk-based sorts.
Q.E.D.
| [reply] |
|
| [reply] [d/l] |
|
| [reply] |
|
You probably need to consider more than just the one character. Otherwise, multiple strings with the same fourth-character would have an unpredictable result-order, since differing strings would all be seen as “equal.”
You have to upgrade your Perl!
sort
In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm
| [reply] |
|
You are not using memory-hungry exotica like Schwartz.
It is precisely when the comparison routine is complicated and likely to be time-expensive that the Schwartzian transform gives the best performance improvement.
| [reply] |
|
The statement about <=> vs. the string version, cmp, is of course my blunder. Oops.
The idea itself is certainly not new: that’s how they did it (and, still do it ...) in the earliest COBOL days. But it most certainly isn’t “plagarism,” as if such a curious term could actually apply in this context ... which it obviously can’t. (In general, several of you routinely step over the line of decorum and civility . . . and I’ve simply given up asking you to stop. But, I digress.)
A lot depends on exactly how much data we might be dealing with. If the whole shebang fits easily within available memory, then it matters very little how you do it: split the in-memory list into three in-memory lists, sort each one, splice ’em all together into one, and you’re done. Or, pay your nod to Mr. Schwartz, if you prefer. No discussion required. But, if data volumes are large ... especially if they are very large ... custom sort-comparison routines work extremely well.
I decided to make this suggestion ... not to hear myself babble, nor with the slightest concern for “votes” ... but to offer an alternative strategy. “There’s more than one way to do it,™” and here’s another. Yes, it involves a non-trivial subroutine. But, so do many sorts. Even if it is deemed not-useful in this particular (small?) case, it is a useful alternative to keep in mind.
| [reply] |
|
|