Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Array Sort

by gopalr (Priest)
on May 10, 2011 at 02:10 UTC ( [id://903885]=perlquestion: print w/replies, xml ) Need Help??

gopalr has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need your help to sort the array content.

The array has following value and the each value has delimitted by '-'.

@array=( 'AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89-CB' )

I need to sort the array by middle of the value. For example, the array output should be the following

@array=( 'AA-BA89-CB' 'AD-BA90-CC', 'AC-BA91-CA', 'AB-BA92-CA', 'AA-BA93-CA' )

I tried to sort by using the cmp and <=>, but I need to sort middle of the value.

Please guide me.

Thanks!!!

Replies are listed 'Best First'.
Re: Array Sort
by BrowserUk (Patriarch) on May 10, 2011 at 02:17 UTC

    print for sort{ substr( $a, 3, 4 ) cmp substr( $b, 3, 4 ) } 'AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89-CB' +;; AA-BA89-CB AD-BA90-CC AC-BA91-CA AB-BA92-CA AA-BA93-CA [0] Perl>

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Array Sort
by GrandFather (Saint) on May 10, 2011 at 04:22 UTC

    For small amounts of data and with a simple way of extracting the key string the technique BrowerUK suggests is a good solution. If finding the key is expensive or the amount of data being sorted is large such that the sort processing takes a long time a common technique is the schwartzian transform. Consider:

    #!/usr/local/bin/perl use strict; use warnings; my @data = ('AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89- +CB'); my @sorted = map {$_->[0]} sort {$a->[2] cmp $b->[2]} map {[$_, split '-']} @da +ta; print "$_\n" for @sorted;

    Prints:

    AA-BA89-CB AD-BA90-CC AC-BA91-CA AB-BA92-CA AA-BA93-CA

    The key to the process is to transform the data into a form that can be sorted, sort it, then transform it back again. The process really runs from right to left with the right most map generating a temporary transformed list (with the original data as the first element of each list item. The sort is in the middle. Then the left most map pulls the original string out of each item in the sorted list.

    True laziness is hard work
      Not disagreeing with the strategy, however if there is a large quantity of data, shouldn't you restrict the sort element of the array to contain only the element to be sorted, as far as I can see it adds no overhead (we already have a list from the split function) and reduces the memory usage. ie
      #!/usr/local/bin/perl use strict; use warnings; my @data = ('AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89- +CB'); my @sorted = map {$_->[0]} sort {$a->[1] cmp $b->[1]} map {[ $_ , (split '-')[1 +] ]} @data; print "$_\n" for @sorted;
      Just wondering, Utilitarian.

      print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

        Sure. Storing just the original string and sort key is a reasonable idea if the data is very large. The little bit of extra "clutter" added was better left out of the original example though, especially as the OP may actually be working with quite different data and have different requirements for extracting the sort key than was suggested in the OP.

        True laziness is hard work

      It is worth noting that for the complication of the ST to save the OP any time over the standard sort, he would have to be sorting at least a 1 million elements.

      And if he is sorting that much data, then he'll need every gain he can get, in which case the GRT suggested by javafan is far more effective. And it is simpler than the ST.

      All in all, I wonder if there is ever a situation where the ST makes sense?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        All in all, I wonder if there is ever a situation where the ST makes sense?

        Over plain sort, sure. Over GRT? Not so much.

        ST can return objects, whereas GRT returns strings unless an external array is used.

        ST can produce simpler and more readable code than GRT in some circumstances.

        However, ST isn't as fast as GRT.

        # ST my @fhs = map $_->[0], sort { $a->[1] <=> $b->[1] } map [ $_, fileno($_) ], get_fhs();
        # GRT my @fhs = get_fhs(); @fhs = map $fhs[ unpack('x4 N', $_) ] sort map pack('NN', fileno($fhs[$_]), $_), 0..$#fhs;
        # GRT my @fhs = get_fhs(); @fhs = @fhs[ map unpack('x4 N', $_), sort map pack('NN', fileno($fhs[$_]), $_), 0..$#fhs ];
      I think you might find this application of ST will actually slow down the sort. As you said, it helps if the key is expensive to computer.

      You also said ST is more likely to help for long lists, but that's not true. It loses effectiveness as the list grows.

Re: Array Sort
by JavaFan (Canon) on May 10, 2011 at 06:30 UTC
    A GRT:
    @array = map {substr $_, 4} sort map {substr($_, 3, 4) . $_} @array
    This assumes the middle is 4 characters long, and always starts after the 3rd character.

      Interesting to note that you need to be sorting thousands of items before the GRT starts to show benefit over a plain sort.

      It's better than the ST though, which looks to require millions before it gains anything.

      #! perl -slw use strict; use Benchmark qw[ cmpthese ]; my @prefixes = 'AA' .. 'ZZ'; my @suffixes = '00' .. '99'; our $N //= 10; our @data = map{ 'AA-' . $prefixes[ rand @prefixes ] . $suffixes[ rand @suffixes ] . '-XX' } 1 .. $N; cmpthese -3, { buk => q[ my @sorted = sort{ substr( $a, 3, 4, ) cmp substr( $b, 3, 4 ) } @data ], GF => q[ my @sorted = map { $_->[0] } sort { $a->[2] cmp $b->[2] } map { [$_, split '-'] } @data ], jfn => q[ my @sorted = map { substr $_, 4 } sort map { substr($_, 3, 4) . $_ } @data ], }; __END__ c:\test>903885 Rate GF jfn buk GF 18332/s -- -61% -79% jfn 46452/s 153% -- -47% buk 88418/s 382% 90% -- c:\test>903885 -N=100 Rate GF jfn buk GF 1584/s -- -64% -67% jfn 4394/s 177% -- -8% buk 4757/s 200% 8% -- c:\test>903885 -N=1000 Rate GF buk jfn GF 118/s -- -64% -69% buk 333/s 181% -- -14% jfn 387/s 227% 16% -- c:\test>903885 -N=10000 Rate GF buk jfn GF 8.86/s -- -61% -76% buk 22.8/s 157% -- -38% jfn 36.7/s 314% 61% -- c:\test>903885 -N=100000 (warning: too few ite Rate GF buk jfn GF 0.707/s -- -60% -75% buk 1.75/s 147% -- -38% jfn 2.80/s 296% 60% --

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Array Sort
by kejohm (Hermit) on May 10, 2011 at 02:35 UTC
Re: Array Sort
by anonymized user 468275 (Curate) on May 10, 2011 at 10:12 UTC
    An alternative optimising solution to the Schwartzian transform suggested by grandfather is the "Orcish Manoeuvre", which also ensures that the operation evaluating the sort key is only applied once per entry, while using the same hash overhead, but often slightly less processing:
    my @sorted; { local %_ = (); @sorted = sort { $_{ $a } ||= substr( $a, 3, 4 ) cmp $_{ $b } ||= substr( $b, 3, 4 ) } @array; }
    BUT, in most cases, I tend to simply pass through the data once creating the inverse hash to that:
    my %hash = map { substr( $_, 3, 4 ), $_ } @array;
    Such a hash, using the key info as a hash key, tends to be more useful than a sorted array of keys in the subsequent processing.

    Note: if the middle field should vary in length, then the expression ( split /\-/ )[1] should replace the substr function.

    One world, one people

Re: Array Sort
by bart (Canon) on May 10, 2011 at 12:19 UTC
    You can pick any of the advanced sorting techiques as described in Uri Guttman + Larry Rosler's paper: A Fresh Look at Efficient Perl Sorting. Just for practice.

    If you don't really care about efficiency, or just to get a feel as what's going on, you can call a function for each items every single time.

    # for example, this'll do what you want sub extract_middle { return shift() =~ /-(.*)-/ ? $1 : '' } @sorted = sort { extract_middle($a) cmp extract_middle($b) } @unsorted +;
    Note that the speedups as described in the paper are achieved by essentially calling this extract function only once for each item.

    update Fixed a bug in the sub, thanks moritz

Re: Array Sort
by Khen1950fx (Canon) on May 10, 2011 at 14:26 UTC
    Sort::Array can do it.
    #!/usr/bin/perl use strict; use warnings; use Sort::Array qw(Sort_Table); use Data::Dumper::Concise; my @data = qw( AC-BA91-CA AB-BA92-CA AD-BA90-CC AA-BA93-CA AA-BA89-CB ); warn Dumper @data = Sort_Table( cols => '3', field => '2', sorting => 'ascending', structure => 'csv', separator => '\-', data => \@data, );

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://903885]
Approved by lidden
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (5)
As of 2024-03-29 06:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found