Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Custom Sort Array

by omegaweaponZ (Beadle)
on Aug 10, 2013 at 19:26 UTC ( #1048933=perlquestion: print w/replies, xml ) Need Help??

omegaweaponZ has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking to custom sort an array gathered from a Net::FTP listing of a directory of logs. Logs are entered by thisisalog-#, where # is 0,1,2,3. This is fine when sorting by the ->ls function of Net::FTP but when it goes to 10,11, the list will start at 1, then next will be 10, not 2. How can we custom sort the array so instead of:

thisisalog-1
thisisalog-10
thisisalog-11
thisisalog-2
thisisalog-3
etc...

it goes like this

thisisalog-1
thisisalog-2
etc...
thisisalog-10

This is my current snippit of code:
my $f = Net::FTP->new($host) or die "Can't open $host\n"; $f->login($user, $password) or die "Can't log $user in\n"; my @files = $f->ls; foreach (@files) { my $file = $_; print "My file is $file \n"; }

Update - Log format really looks more like this with numbers unfortunantely within as well, just after the dash is it different:

abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz

Update: Laurent_R has used a transform method that takes all the constraints into account and makes this work:

my @files = map { $_->[0] } sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] } map { [ $_, /\.(\d{12}\-\d{4})/, /.+?-(\d+)\..+$/ ] } <DAT +A>; print $_,$/ for @files; </p>

Replies are listed 'Best First'.
Re: Custom Sort Array
by arkturuz (Curate) on Aug 10, 2013 at 19:39 UTC
    Oh, and you can declare your variable in loop. No need for my $file = $_ later:
    sub by_id { # extract numbers my ($A) = $a =~ /(\d+)/; my ($B) = $b =~ /(\d+)/; # return comparison return $A <=> $B; } for my $f (sort by_id @files) { say $f; }

      Hi,
      arkturuz, will even work if the regex matches the values to be sorted like this:

      chomp(my @files = <DATA>); ## added sub by_id { # extract numbers my ($A) = $a =~ /.+?-(\d+)\..+$/; # changed my ($B) = $b =~ /.+?-(\d+)\..+$/; # changed # return comparison return $A <=> $B; } for my $f (sort by_id @files) { print $f,$/; } __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz

      If you tell me, I'll forget.
      If you show me, I'll remember.
      if you involve me, I'll understand.
      --- Author unknown to me
      Thanks, however this seems to return the same array I had previously. 10 is still preceded by 2.

      I should clarify that the logs have numbers prior as well, so the log looks something like this:

      abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
      abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
      abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
      abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
      abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
      abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz

        You changed the requirements but I think you see where is the solution going: extract relevant fields and compare that.
      I think you meant
      my ($A) = $a =~ /(\d+)\.gz$/; my ($B) = $b =~ /(\d+)\.gz$/;
Re: Custom Sort Array
by 2teez (Vicar) on Aug 10, 2013 at 20:00 UTC

    What about trying Schwartzian transform:

    use strict; use warnings; print join $/ => map{$_->[0]} sort{$a->[1] <=> $b->[1]} map{[$_,/.+?-(\d+)\..+$/]}<DATA>; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
    Produces...
    abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      Yes, I believe that is the concept I was looking for, but your example I am still only getting the same array I got before. Am I missing something simplistic? Sorry also, there is a 1, it doesn't just go 0,2,etc. There's a 1, so the 1 shows up before the 10, and 11, then the 2 shows up. This goes well beyond just 1, however. Logs can go as far as 99, so I need a solution that will organize them up to the point of this final -##.gz in the name from 0-99+ in numeric order
      foreach (@files) { my $file = $_; print join $/ => map{$_->[0]} sort{$a->[1] <=> $b->[1]} map{[$_,/.+?-(\d+)\..+$/]}$file; print "\n";

        ...There's a 1, so the 1 shows up before the 10, and 11, then the 2 shows up. This goes well beyond just 1, however. Logs can go as far as 99, so I need a solution that will organize them up to the point of this final -##.gz in the name from 0-99+ in numeric order.

        Yes, even with the 1 file been included. It would sort it for you.
        I think you are getting it wrong the way you apply the solution.
        You might want to do it like this:
        use strict; use warnings; my @files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, /.+?-(\d+)\..+$/ ] } <DATA>; print $_,$/ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
        you will have:
        abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz

        If you tell me, I'll forget.
        If you show me, I'll remember.
        if you involve me, I'll understand.
        --- Author unknown to me
Re: Custom Sort Array
by Laurent_R (Canon) on Aug 10, 2013 at 21:36 UTC

    You are not saying everything at once, this is the cause of misunderstandings.

    You only need to amend slightly the Schwartzian Transform solution by 2teez to accomodate the new rules.

    use strict; use warnings; my @files = map { $_->[0] } sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]} map { [ $_, /\.(\d{12})\-/, /.+?-(\d+)\..+$/ ] } <DATA>; print $_,$/ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz

    I guess that's what you need.

    Update 21:42 UTC: 2teez was faster than me by about 8 minutes producing a new solution in accordance with the new rules.

    Update 2 22:15 UTC: you had a third additional requirement (also use the 4-digit group after the date for the sort) that I had not seen. I've just posted a newly amended version in my answer below.

      I had not read carefully enough your post where you were also asking the files to be sorted according to the group of four digits after the date. It would be far better if you gave all your sorting rules from the outset.

      This a new amendment to take this rule into account:

      use strict; use warnings; my @files = map { $_->[0] } sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] } map { [ $_, /\.(\d{12}\-\d{4})/, /.+?-(\d+)\..+$/ ] } <DAT +A>; print $_,$/ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz
        This is exactly what I was looking for, thank you!
Re: Custom Sort Array
by Anonymous Monk on Aug 11, 2013 at 17:17 UTC
    Sort::Key::Natural perhaps?

    Anyway, here's my try. I did not check thoroughly, but it seems to work.

    sub natural_sort { my $idx = shift || 0; return -1 if !defined $a->[$idx]; return 1 if !defined $b->[$idx]; if ($a->[$idx] =~ /^[0-9]/ and $b->[$idx] =~ /^[0-9]/) { $a->[$idx] <=> $b->[$idx] or natural_sort($idx + 1); } else { $a->[$idx] cmp $b->[$idx] or natural_sort($idx + 1); } } my @files = <DATA>; print for map { join "", @$_ } sort natural_sort map { [ split /(\d+)/, $_ ] } @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz
Re: Custom Sort Array
by poj (Abbot) on Aug 11, 2013 at 07:04 UTC
    If you have a maximum sequence number of -99 then try this ;
    #!perl use strict; chomp( my @files = <DATA>) ; # add leading zero for 0 to 9 s/-(\d)(\.gz)$/-0$1$2/ for @files; # sort my @sorted = sort @files; # remove leading zero for 00 to 09 s/-0(\d)(\.gz)$/-$1$2/ for @sorted; print "$_\n" for @sorted; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201307280800-0900-0.gz abcd1_abc_123456.abc1a_A.201307280800-0900-10.gz abcd1_abc_123456.abc1a_A.201307280800-0900-11.gz abcd1_abc_123456.abc1a_A.201307280800-0900-2.gz abcd1_abc_123456.abc1a_A.201307280800-0900-3.gz
    poj

      poj's idea of changing the words to enable lexicographical sort can be expressed as a variant of the Schwartzian Transform (ST) called the Guttman Rosler Transform (GRT), which is deemed to be faster than the ST because the sort phase is entirely C code and it avoids one level of indirection. The idea of poj expressed with the GRT construct:

      use strict; use warnings; my @files = map {s/-0(\d)(\.gz)$/-$1$2/; $_} sort map {s/-(\d)(\.gz)$/-0$1$2/; $_} <DATA>; print $_ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz

      For more information on the GRT, see e.g. Advanced Sorting - GRT - Guttman Rosler Transform (this is just one I picked up at random, you have several links in this post to other nodes, and there are many other nodes on the subject)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1048933]
Approved by Old_Gray_Bear
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (6)
As of 2020-01-29 21:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?