Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

How to perform different sorts on multiple sections of items in the same array

by estreb (Novice)
on Dec 29, 2014 at 01:52 UTC ( [id://1111596]=perlquestion: print w/replies, xml ) Need Help??

estreb has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. Suppose I have an array of a random number of unsorted strings. I want to grab all the strings inside the array that contain an “r” and sort those in ascending alphabetical order. For the elements that are left, find the ones that start with an “e” and sort those by ascii value of their third character. And for the ones that are left reverse sort by their last letter alphabetically. So basically I’m applying three different sorting methods to different items in the array. What is a good way to do this in Perl? How do I “pick and choose” which items in the array I will sort, and make sure I leave the other items alone?
  • Comment on How to perform different sorts on multiple sections of items in the same array

Replies are listed 'Best First'.
Re: How to perform different sorts on multiple sections of items in the same array
by Athanasius (Archbishop) on Dec 29, 2014 at 03:11 UTC

    Hello estreb,

    This sounds like homework, so I’ll give advice rather than code. First, you need to nail down the requirements. What sort of output is wanted? If it’s just three separate lists, sorted according to the criteria, I would proceed as follows:

    1. Partition the array into two: (i) strings containing the letter “r” and (ii) all others.
    2. Partition array (ii) into two further arrays: (iii) strings starting with the letter “e” and (iv) all others.
    3. Sort array (i) alphabetically, ascending.
    4. Sort array (iii) using a sort routine that compares third characters alphabetically, ascending.
    5. Sort array (iv) using a sort routine that compares final characters alphabetically, descending.
    6. Output arrays (i), (iii), and (iv).

    For (1) and (2), look at using grep or the part function from List::MoreUtils. For (3), just use sort. For (4) and (5), use sort together with a custom sort routine employing substr.

    Now, there are parts of the specification that make me think the desired output may be more complicated than three separate sorted lists:

    How do I “pick and choose” which items in the array I will sort, and make sure I leave the other items alone?

    If you really do need to output a single array, with the original elements sorted “in place” and all interleaved back together (whatever that would really mean), a more complex approach will be needed. But any thought along those lines will be wasted effort until the specification has been clarified. As always, the best way to clarify a specification is to provide sample input together with the corresponding desired output (and this is just as important when talking to lecturers as when presenting problems to PerlMonks).

    Hope that helps,

    Update: fixed typo.

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: How to perform different sorts on multiple sections of items in the same array
by Anonymous Monk on Dec 29, 2014 at 03:24 UTC

    sort and How do I sort an array by (anything)?

    Here's one way to do it with fairly complex sort criteria and a Schwartzian transform (assuming I've understood the requirements correctly):

    #!/usr/bin/env perl use warnings; use strict; use List::Util 'shuffle'; use Test::More tests=>1; my @expect = qw/ frooc froof froz ebaz eyun ebxu quz qun que /; my @input = shuffle @expect; sub special { # sort routine used below $$a[1] <=> $$b[1] # first sort by categories or # then sort by the desired condition per category $$a[1]==1 ? $$a[0] cmp $$b[0] # category 1 (/r/) : ( $$a[1]==2 ? $$a[2] cmp $$b[2] # category 2 (/^e/) : $$b[3] cmp $$a[3] ) # category 3 (rest) } my @output = # sort with Schwartzian transform # Step 3: get back the orig_str out of the array map { $$_[0] } # Step 2: sort with the "special" function defined above sort special # Step 1: each item -> [orig_str, category, third_chr, last_chr] # where category is: /r/ => 1, /^e/ => 2, rest => 3 map { [$_, /r/?1:(/^e/?2:3), substr($_,2,1), substr($_,-1,1)] } @input; is_deeply \@output, \@expect or diag explain \@output;
Re: How to perform different sorts on multiple sections of items in the same array
by BrowserUk (Patriarch) on Dec 29, 2014 at 13:36 UTC
    How do I “pick and choose” which items in the array I will sort, and make sure I leave the other items alone?

    The trick here is to extract a list of the indices of the required subsets; sort those indices according to the appropriate criteria; and then build the sorted array by slicing the sorted array by the subset indices in their original order and the unsorted array by the subset indices in their sorted order.

    To illustrate:

    data: iOrig iSort 0: e20 0 4 1: b17 2: e04 2 2 3: r16 4: e01 4 0 5: b12 6: e99 6 6 @sorted[ @iOrig ] = @data[ @iSort ]

    Thus, the subset from the original data is assigned in their sorted order to the sorted dataset overlaying their original positions; but reordered.

    A simple example coded for clarity with lots of scope for optimisation:

    #! perl -slw use strict; my @data = qw[ alpha bravo charlie delta east echo exit foxtrot golf hotel india +juliet kilo lima mike november oscar papa quebec romeo sierra tango uniform victor whisk +ey xray yankee zulu ]; my @sortedData; my @rindex = grep{ $data[ $_ ] =~ m[r] } 0 .. $#data; my @rsorted = sort{ $data[ $a ] cmp $data[ $b ] } @rindex; @sortedData[ @rindex ] = @data[ @rsorted ]; my @eindex = grep{ $data[ $_ ] !~ m[r] && $data[ $_ ] =~ m[^e] } 0 .. +$#data; my @esorted = sort{ substr( $data[ $a ], 2, 1 ) cmp substr( $data[ $b +], 2, 1 ) } @eindex; @sortedData[ @eindex ] = @data[ @esorted ]; my @oindex = grep{ $data[ $_ ] !~ m[r] && $data[ $_ ] !~ m[^e] } 0 .. +$#data; my @osorted = sort{ substr( $data[ $b ], -1 ) cmp substr( $data[ $a ], + -1 ) } @oindex; @sortedData[ @oindex ] = @data[ @osorted ]; print for @sortedData; __END__ C:\test>1111596.pl whiskey bravo charlie zulu echo exit east foxtrot juliet kilo tango hotel golf mike yankee november oscar quebec alpha romeo sierra delta uniform victor india xray lima papa

    Notes:

    • bravo and charlie haven't moved because they both contain 'r' and were in the required order to start with.
    • east, echo & exit have remain in the same range of positions 4..6, but are reordered according to their 3rd characters 'h', 'i' & 's' respectively.
    • whiskey is first because it: doesn't start with 'e', nor contain 'r', and ends in 'y'.
    • papa is last because it: doesn't start with 'e', nor contain 'r', and ends in 'a'.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Updated: added missing declaration per trippledubs's report below.

      A somewhat more optimal version that avoids multiple greps and re-testing of some conditions:

      #! perl -slw use strict; my @data = qw[ alpha bravo charlie delta east echo exit foxtrot golf hotel india +juliet kilo lima mike november oscar papa quebec romeo sierra tango uniform victor whisk +ey xray yankee zulu ]; my @sortedData; my( @rindex, @eindex, @oindex ); push @{ $data[ $_ ] =~ m[r] ? \@rindex : $data[ $_ ] =~ m[^e] ? \@eind +ex : \@oindex }, $_ for 0 .. $#data; my @rsorted = sort{ $data[ $a ] cmp $data[ $b +] } @rindex; my @esorted = sort{ substr( $data[ $a ], 2, 1 ) cmp substr( $data[ $b +], 2, 1 ) } @eindex; my @osorted = sort{ substr( $data[ $b ], -1 ) cmp substr( $data[ $a +], -1 ) } @oindex; @sortedData[ @rindex, @eindex, @oindex ] = @data[ @rsorted, @esorted, +@osorted ]; print for @sortedData;

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Whaaaaaat. This can't work under strict. How is this magic of taking an array like \@rindex before it is defined. Is it being defined in the push statement? I don't understand how that works. Under perltidy it kind of makes more sense, it looks like the ternary operator is used as a branching statement. But what array is it pushing to?.. ohh okay it is pushing to either rindex, eindex, or oindex depending on what match is found.. if that is what is going on, that is really sweet. Is that what is going on here? I printed this out and I'm just staring at it.

        is it referencing and dereferencing in the same line.. why is that necessary?

Re: How to perform different sorts on multiple sections of items in the same array
by Laurent_R (Canon) on Dec 29, 2014 at 10:42 UTC
    I agree with Athanasius that it is probably premature to provide sample code when the specification is so vague.
Re: How to perform different sorts on multiple sections of items in the same array
by james28909 (Deacon) on Dec 29, 2014 at 03:15 UTC
    Maybe this is what you mean?
    use strict; use warnings; my @array = qw(toad frog frig frag crow creep hop); my @sorted; for (@array) { if ( $_ =~ m/.*r/g ) { push( @sorted, $_ ); } } print "$_\n" for sort(@sorted);
    Output:
    creep crow frag frig frog
    EDIT: Missed the point of this might being a homework assignment. apologies

      The .* is not required in your match expression nor is the /g. The whole loop can be replaced by:

      my @selected = grep {/r/} @array;

      Note that neither version of the matching code actually performs a sort so the variable name is more appropriately 'selected' than 'sorted'.

      Perl is the programming world's equivalent of English
Re: How to perform different sorts on multiple sections of items in the same array
by Anonymous Monk on Dec 29, 2014 at 07:14 UTC
    That's a fun homework. Here's a stable sort (groups maintain their positions relative to each other). That is, given the input
    xxxxxC ABrrrr e_3_ee xxxxxA AArrrr e_2_ee xxxxxB ACrrrr e_1_ee
    the output is
    xxxxxC AArrrr e_1_ee xxxxxB ABrrrr e_2_ee xxxxxA ACrrrr e_3_ee
    Another variation of a Schwartzian transform. sort doesn't sort arrays in place (doesn't sort arrays at all, only lists), so we sort each group separately and use slice assignement to restore order.
Re: How to perform different sorts on multiple sections of items in the same array
by johngg (Canon) on Dec 30, 2014 at 01:03 UTC

    Rather than complicated sorting routines, perhaps it would be preferable to set up a couple of "flags" depending on what type the string is and, consequently, what character to sort on then pack them along with the original string to do a GRT sort. This will not be stable for the last two categories if the strings share the same sorting character, at which point they will be sorted in ascending alpha.

    use strict; use warnings; use 5.014; my @strings = qw{ cabin prawn pail error reptile engine copy echo soap eskimo carpet emblem skunk }; say for map { substr $_, 8 } sort map { my( $type, $ord ); if ( m{r} ) { $type = 0; $ord = 0; } elsif ( m{^e} ) { $type = 1; $ord = ord substr $_, 2, 1; } else { $type = 2; $ord = 255 - ord substr $_, -1, 1; } pack q{NNa*}, $type, $ord, $_; } @strings;

    The output.

    carpet error prawn reptile emblem engine echo eskimo copy soap cabin pail skunk

    I hope this is of interest.

    Update: Corrected statement re. stable sorting, s/first/last/

    Cheers,

    JohnGG

      For your sample data, the OPs specifications, should lead to an ordering of:

      copy carpet soap error prawn emblem cabin echo pail engine reptile esk +imo skunk

      Not:

      carpet error prawn reptile emblem engine echo eskimo copy soap cabin p +ail skunk

      Consider the OPs 3-part specification and the following:

      orig pre-r sort-r pre-e sort-e pre-o sort-O + final cabin cabin copy + copy prawn prawn carpet + carpet pail pail soap + soap error error error + error reptile reptile prawn + prawn engine engine emblem + emblem copy copy cabin + cabin echo echo echo + echo soap soap pail + pail eskimo eskimo engine + engine carpet carpet reptile + reptile emblem emblem eskimo + eskimo skunk skunk skunk + skunk

      The problem with your algorithm is, that is doesn't exclude those items that match the first criteria, from being considered by the second criteria, if the also match it; but the OPs specifications (emphasis added):

      all the strings inside the array that contain an “r” and sort those in ascending alphabetical order. For the elements that are left, find the ones that start with an “e” and sort those by ascii value of their third character. And for the ones that are left reverse sort by their last letter alphabetically.

      Your algorithm does not exclude items that contain an 'r' from being considered when processing items that start with an 'e'. Eg. 'error'.

      Nor does it exclude items that contain an 'r', or start with an 'e', from being processed when reverse sorting "the ones that are left".


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Perhaps I misinterpreted the OP's spec. I thought the requirement was to sort all those containing an 'r' followed by those starting with an 'e' and finally the remainder but I can see that your interpretation of leaving each set in their relative positions could be more accurate. Perhaps estreb could clarify?

        Your algorithm does not exclude items that contain an 'r' from being considered when processing items that start with an 'e'. Eg. 'error'.

        Nor does it exclude items that contain an 'r', or start with an 'e', from being processed when reverse sorting "the ones that are left".

        I'm not sure you are correct in those statements. The if ( m{r} ) { ... } elsif ( m{^e} ) { ... } else { ... } section assigns ascending values to $type which is packed as the first term and becomes the primary key for the sort. Thus all those starting with an 'r' are sorted separately from those starting with 'e' which are again separate from the remainder. I think this is demonstrated by the output.

        carpet } error } Strings containing an 'r' anywhere prawn } are sorted ascending alpha reptile } emblem } engine } Strings starting with 'e' but not echo } containing an 'r' are sorted ascending eskimo } alpha on the third character copy } soap } Strings neither containing an 'r' nor cabin } starting with 'e' are sorted descending pail } alpha on the last character skunk }

        Please let me know if you still think I am having a senior moment and have got this aspect of the algorithm completely wrong.

        Update: Corrected wrong variable in explanation, s/$ord/$type/

        Cheers,

        JohnGG

      To add stable sorting for the last two categories, restrict the sort to $type and $ord (packed as NN so taking 8 bytes) unless the $type is 0 for both terms, in which case sort the entire strings.

      use strict; use warnings; use 5.014; my @strings = qw{ cabin prawn eminent pail error reptile engine copy echo soap eskimo carpet elias emblem skunk cheap }; say for map { substr $_, 8 } sort { unpack( q{N}, $a ) || unpack( q{N}, $b ) ? substr( $a, 0, 8 ) cmp substr( $b, 0, 8 ) : $a cmp $b; } map { my( $type, $ord ); if ( m{r} ) { $type = 0; $ord = 0; } elsif ( m{^e} ) { $type = 1; $ord = ord substr $_, 2, 1; } else { $type = 2; $ord = 255 - ord substr $_, -1, 1; } pack q{NNa*}, $type, $ord, $_; } @strings;

      The output, note that 'eminent' and 'elias' have the same 3rd character but now retain their relative order, as do 'soap' and 'cheap' with matching final characters in the last category.

      carpet error prawn reptile emblem engine echo eminent elias eskimo copy soap cheap cabin pail skunk

      I hope this is of interest.

      Cheers,

      JohnGG

Re: How to perform different sorts on multiple sections of items in the same array
by sundialsvc4 (Abbot) on Dec 29, 2014 at 12:41 UTC

    Very-elaborate sorting requirements such as these can be satisfied by creating an appropriate function for use with the sort verb.   As you know, this function must return a value that is either less-than, equal-to, or greater-than zero, based on its comparison of two strings ($a and $b) that are presented to it.   Typically, this is an “in-line” function such as { $a <=> $b } (using an operator that is expressly designed for this purpose), but it does not have to be.   You can write an entirely separate sub and refer to it within the sort verb, and this is what I suggest that you do.

    Let me describe my notion just in words.   This is what the sub will do, each time it is called.

    1. First, classify the two strings into “groups.”
      1. Group #1 consists of those which contain an “r.”
      2. Group #2 consists of those characters which start with “e.”
      3. Group #3 consists of those strings which do not fall into either of the other two groups.
    2. If the group-number of the two strings is unequal, return the difference between the two group-numbers.   (Only the sign of the result matters.)   This will cause the groups to be separated:   the groups will occur in the final output in order of group-number, and, within their respective groups, they will be sorted as (see below) each group requires.
    3. Otherwise (the group-number is the same ...), return a value that is appropriate for each group.
      1. Group #1 can simply use <=>.
      2. Group #2 can (probably) remove the first characters, then use that operator, or use it on a substring.   (You probably need to consider more than just the one character.   Otherwise, multiple strings with the same fourth-character would have an unpredictable result-order, since differing strings would all be seen as “equal.”)
      3. The third, to achieve a reverse-sort, would apply the <=> operator with the two operands reversed.

    Notice that you are not splitting the array and sorting each piece.   You are not using memory-hungry exotica like Schwartz.   This notion can be applied to sorts of any size, and it applies equally well to both in-memory and disk-based sorts.

    Q.E.D.

      Group #1 can simply use <=>

      If you're going to describe someone else's solution without giving credit, at least get it right. <=> does numeric comparison.

      You probably need to consider more than just the one character. Otherwise, multiple strings with the same fourth-character would have an unpredictable result-order, since differing strings would all be seen as “equal.”
      You have to upgrade your Perl!

      sort

      In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm
      You are not using memory-hungry exotica like Schwartz.
      It is precisely when the comparison routine is complicated and likely to be time-expensive that the Schwartzian transform gives the best performance improvement.

        The statement about <=> vs. the string version, cmp, is of course my blunder.   Oops.

        The idea itself is certainly not new:   that’s how they did it (and, still do it ...) in the earliest COBOL days.   But it most certainly isn’t “plagarism,” as if such a curious term could actually apply in this context ... which it obviously can’t.   (In general, several of you routinely step over the line of decorum and civility . . . and I’ve simply given up asking you to stop.   But, I digress.)

        A lot depends on exactly how much data we might be dealing with.   If the whole shebang fits easily within available memory, then it matters very little how you do it:   split the in-memory list into three in-memory lists, sort each one, splice ’em all together into one, and you’re done.   Or, pay your nod to Mr. Schwartz, if you prefer.   No discussion required.   But, if data volumes are large ... especially if they are very large ... custom sort-comparison routines work extremely well.

        I decided to make this suggestion ... not to hear myself babble, nor with the slightest concern for “votes” ... but to offer an alternative strategy.   “There’s more than one way to do it,™” and here’s another.   Yes, it involves a non-trivial subroutine.   But, so do many sorts.   Even if it is deemed not-useful in this particular (small?) case, it is a useful alternative to keep in mind.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1111596]
Approved by Athanasius
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (4)
As of 2024-04-19 06:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found