http://www.perlmonks.org?node_id=1039045

rkk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I need some suggestions regarding sorting an array.

Ex:

@arr=(100,20,3,45,1);
# To sort the above array I did
@index=sort{ $arr[$a] <=> $arr[$b] } 0 .. $#arr;
print @index;
Output: 4 2 1 3 0

But, if I have an array like the following

@arr=((ch1,100), (ch2,20), (ch1,13), (ch2,45), (ch1,1));

How do I sort this kind of array (first by the first element, then by the second element)?

Sample output should be,

4 0 2 1 3

Any suggestions, please advise.

Thanks,

Re: Sorting array
by frozenwithjoy (Priest) on Jun 14, 2013 at 22:35 UTC
    # changed your array to an array of arrays:
    my @arr = ( [ 'ch1', 100 ], [ 'ch2', 20 ], [ 'ch1', 13 ], [ 'ch2', 45 ], [ 'ch1', 1 ] );

    # sort by chromosome then by position:
    my @sorted = sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @arr;

    use Data::Printer;
    p @sorted;

    __END__
    [
        [0] [
            [0] "ch1",
            [1] 1
        ],
        [1] [
            [0] "ch1",
            [1] 13
        ],
        [2] [
            [0] "ch1",
            [1] 100
        ],
        [3] [
            [0] "ch2",
            [1] 20
        ],
        [4] [
            [0] "ch2",
            [1] 45
        ]
    ]
      That doesn't quite give the output the OP requested, and will fail for multi-digit "ch.." values.

      If the entries always start with "ch", this code should work:

      perl -e 'my @arr = ( [ ch1=> 100 ], [ ch2=> 20 ], [ ch11=> 13 ], [ ch2=> 45 ], [ ch1=> 1 ] );
        @s= sort( { substr($arr[$a]->[0],2) <=> substr($arr[$b]->[0],2)
                    or $arr[$a]->[1] <=> $arr[$b]->[1]} 0..$#arr);
        print qq|@s\n|'
      #Output:
      #4 0 1 3 2
      Note - OP's data has been modified slightly - adding a 2-digit "ch" value ("ch11").

      Also - this is not the most efficient sort for this kind of data, because substr is re-evaluated on every comparison, but it will work well for small to medium-sized data. More complex transforms are required for efficiency with large amounts of data (for some definition of 'large').


        I thought about that; however, all current genome annotation (that I've seen) has zero-padded numbers, such that if there are more than 9 chromosomes, the first 9 would be ch01, ch02, etc. In the 'real world', it is much more likely to have zero-padding than to have chromosomes named 'ch'. But I agree, better safe than sorry.
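The zero-padding point can be checked with a small sketch. The chromosome names below are made up for illustration; the point is only that a plain string comparison (cmp) orders zero-padded names numerically, while unpadded names come out in dictionary order.

```perl
use strict;
use warnings;

# Hypothetical names: the same four chromosomes, with and without zero-padding.
my @padded   = sort { $a cmp $b } qw(ch10 ch02 ch01 ch11);
my @unpadded = sort { $a cmp $b } qw(ch10 ch2 ch1 ch11);

print "@padded\n";    # ch01 ch02 ch10 ch11
print "@unpadded\n";  # ch1 ch10 ch11 ch2
```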

        I am getting an error. BTW, I created the above-mentioned @arr using the following command:

        push(@arr,($l2[6],$l2[7]));

        But you specified

        my @arr = ( [ ch1=> 100 ], [ ch2=> 20 ], [ ch11=> 13 ], [ ch2 => 45 ], [ ch1=> 1 ] );

        Please advise what changes I should make to my command to get the format you specified. Thanks in advance.
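One way to get that format, as a minimal sketch: wrap the two fields in square brackets so that each push adds a single array reference rather than two flat scalars. The contents of @l2 below are made up for illustration; only the use of $l2[6] and $l2[7] comes from the post above.

```perl
use strict;
use warnings;

my @arr;

# Hypothetical stand-in for one parsed input line; the real @l2 comes from the OP's data.
my @l2 = ( 'f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'ch1', 100 );

# Square brackets build an anonymous array and return a reference to it,
# so @arr gains one [name, position] pair per push.
push @arr, [ $l2[6], $l2[7] ];

print scalar(@arr), "\n";          # 1 (one pair, not two scalars)
print "$arr[0][0] $arr[0][1]\n";   # ch1 100
```

With parentheses instead of brackets, push flattens the list, and @arr ends up as ('ch1', 100, ...) with no pairing between the name and the position.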

        Thanks very much. This is what I needed.

Re: Sorting array
by Laurent_R (Canon) on Jun 15, 2013 at 22:03 UTC

    I am surprised that nobody has yet suggested that you take a look at something relatively famous called the 'Schwartzian Transform', a quite powerful technique for sorting slightly complicated data such as yours. It is much faster when you have a lot of data, because it computes each sort key once instead of recomputing it on every comparison, but in my view it is also quite practical even if your data is small. It is sufficiently well known to have an article on Wikipedia (in many languages) and pages on many thousands of other sites. Just google it or, if you are not completely convinced that Google is your friend these days, start with these pointers on Perlmonks:

    What is "Schwarzian Transform" (aka Schwartzian)

    Schwartzian Transform

    Sorting dates with the Schwartzian Transform
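As a rough sketch of how the Schwartzian Transform could apply to the data in this thread (not code posted by anyone above): it uses the array-of-arrays layout from the earlier replies and assumes every name contains a run of digits, as in 'ch1' or 'ch11'. The names @index and $num are chosen here for illustration.

```perl
use strict;
use warnings;

# Sample data from the thread, as an array of [name, position] pairs.
my @arr = ( [ 'ch1', 100 ], [ 'ch2', 20 ], [ 'ch11', 13 ], [ 'ch2', 45 ], [ 'ch1', 1 ] );

# Schwartzian Transform: read the three stages from bottom to top.
my @index =
    map  { $_->[0] }                 # 3. keep only the original index
    sort { $a->[1] <=> $b->[1]       # 2. compare the precomputed chromosome numbers,
        or $a->[2] <=> $b->[2] }     #    then the positions
    map  {
        # 1. extract the chromosome number once per element
        #    (assumption: the name always contains digits)
        my ($num) = $arr[$_][0] =~ /(\d+)/;
        [ $_, $num, $arr[$_][1] ];
    } 0 .. $#arr;

print "@index\n";   # 4 0 1 3 2
```

The regex runs once per element here, whereas a sort block that calls substr or a regex directly runs it twice per comparison. The result matches the index order produced by the one-liner earlier in the thread.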