http://www.perlmonks.org?node_id=966478

biohisham has asked for the wisdom of the Perl Monks concerning the following question:

Well, I guess I got away with this in some magical way I can't fathom in Perl 5.12.4. I confess I was being abusive when I tried to numerically compare a group of character strings that each contain a number, with the purpose of sorting them, and the guilty feeling of seeing the warnings is irksome. I have a list of files that I want to sort based on the number in their names, and I thought I would achieve that through a Schwartzian transform. My files have the format 'sequence<n>.gb.txt', where <n> is any number.

What my code does is go around the directory picking up these file names and feeding them into an array. Even though the files are ordered in the directory, they are not ordered in that array, so doing @sorted = map{$_->[0]} sort{$a->[2]<=>$b->[2]} map{[$_,split/sequence/]} @unsorted was my option; trying various combinations of split finally landed me in this direction (I tried splitting around /./ or /\d+/, etc.). It is clear that sort() is generous; I also tried cmp (just to test what the output looks like). The code sorts @unsorted and yet complains of 'arguments being not numeric in numeric comparison (<=>)', blah blah.

So Perl's sort() gracefully understood what I meant, yet I got forgivingly pinched. I wonder how I can best avoid introducing such warnings (going "no warnings" is of course not an option for me ;)). Any ideas?

use strict;
use warnings;

my @unsorted;
my @sorted;
while(my $file = <DATA>){
    chomp $file;
    push @unsorted, $file;
}

@sorted = map{$_->[0]} sort{$a->[2] <=> $b->[2]} map{[$_, split/sequence/]} @unsorted;
print join("\n",@sorted);

__DATA__
sequence3.gb.txt
sequence1.gb.txt
sequence7.gb.txt
sequence5.gb.txt
sequence2.gb.txt
sequence4.gb.txt
sequence10.gb.txt
sequence9.gb.txt
sequence8.gb.txt
##OUTPUT##
Argument "1.gb.txt" isn't numeric in numeric comparison (<=>) at SortQuestion.pl line 11, <DATA> line 9.
...
...
sequence1.gb.txt
sequence2.gb.txt
sequence3.gb.txt
sequence4.gb.txt
sequence5.gb.txt
sequence7.gb.txt
sequence8.gb.txt
sequence9.gb.txt
UPDATE: Apparently the powers of a Schwartzian ensemble are so crazy
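
A minimal sketch of why the sort comes out right despite the warnings. It looks at what the inner map builds for a single file name. The variable names `$name`, `@fields` and `$n` are illustrative only and are not part of the posted script.

```perl
use strict;
use warnings;

# What the anonymous array from the first map holds for one file name.
my $name   = 'sequence3.gb.txt';
my @fields = ($name, split /sequence/, $name);

# split keeps the empty leading field, so the useful part lands at index 2.
printf "[%d] '%s'\n", $_, $fields[$_] for 0 .. $#fields;
# [0] 'sequence3.gb.txt'
# [1] ''
# [2] '3.gb.txt'

# In numeric context Perl uses the leading digits of '3.gb.txt', which is
# why the order is correct; the trailing '.gb.txt' is what raises the
# "isn't numeric" warning (silenced here only for the demonstration).
{
    no warnings 'numeric';
    my $n = $fields[2] + 0;
    print "$n\n";    # 3
}
```

So the comparison works on the leading digits of each string, and the warning is Perl pointing out the leftover text it had to ignore.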


David R. Gergen said "We know that second terms have historically been marred by hubris and by scandal." and I am a two y.o. monk today :D, June,12th, 2011...

Replies are listed 'Best First'.
Re: Schwartzian transform deformed with impunity
by moritz (Cardinal) on Apr 22, 2012 at 16:39 UTC
Re: Schwartzian transform deformed with impunity
by jwkrahn (Abbot) on Apr 22, 2012 at 23:16 UTC
    use strict;
    use warnings;

    chomp( my @unsorted = <DATA> );

    my @sorted =
        map unpack( 'x4a*', $_ ),
        sort
        map pack( 'Na*', /(\d+)\D*\z/, $_ ),
        @unsorted;

    print map "$_\n", @sorted;

    __DATA__
    sequence3.gb.txt
    sequence1.gb.txt
    sequence7.gb.txt
    sequence5.gb.txt
    sequence2.gb.txt
    sequence4.gb.txt
    sequence10.gb.txt
    sequence9.gb.txt
    sequence8.gb.txt
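
    The snippet above comes with no explanation, so here is a hedged sketch of what the pack/unpack pair appears to be doing: a fixed-width binary key is glued to the front of each name, the strings are sorted with the default string comparison, and the key is stripped off again. The variables `$k9`, `$k10`, `$name` and `$packed` are illustrative and do not appear in the reply.

```perl
use strict;
use warnings;

# Plain string comparison puts '10' before '9' ...
print '10' lt '9' ? "string: 10 sorts before 9\n"
                  : "string: 9 sorts before 10\n";

# ... but 4-byte big-endian keys ('N') compare bytewise in numeric order.
my ($k9, $k10) = map { pack 'N', $_ } 9, 10;
print $k9 lt $k10 ? "packed: 9 sorts before 10\n"
                  : "packed: 10 sorts before 9\n";

# Round trip for one file name: add the key prefix, then strip it again.
my $name   = 'sequence10.gb.txt';
my $packed = pack 'Na*', $name =~ /(\d+)\D*\z/, $name;
print length($packed) - length($name), "\n";    # 4, the size of the prefix
print unpack( 'x4a*', $packed ), "\n";          # sequence10.gb.txt
```

    Because 'N' is an unsigned 32-bit value, this sketch assumes the numbers in the file names fit in that range.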
Re: Schwartzian transform deformed with impunity
by dave_the_m (Monsignor) on Apr 22, 2012 at 19:27 UTC
    If the file format really is as fixed as you describe, with only the number component varying, then you could strip off everything except the number, sort the numbers, then print out or assign to an array while reconstructing the file name from the number:
    use strict;
    use warnings;

    print map "sequence$_.gb.txt\n", sort { $a <=> $b } map { /(\d+)/; $1 } <DATA>;

    __DATA__
    sequence3.gb.txt
    ....

    Dave.

      map { /(\d+)/; $1 }

      No, that is wrong. If /(\d+)/ doesn't match, then $1 will not contain valid data:

      $ perl -le'
      use Data::Dumper;
      my @y = map { /(\d+)/; $1 } qw/ ab123cd ab456cd abcdefg ab789cd /;
      print Dumper \@y;
      '
      $VAR1 = [
                '123',
                '456',
                undef,
                '789'
              ];

      Just use:

      $ perl -le'
      use Data::Dumper;
      my @y = map /(\d+)/, qw/ ab123cd ab456cd abcdefg ab789cd /;
      print Dumper \@y;
      '
      $VAR1 = [
                '123',
                '456',
                '789'
              ];

      The regular expression by itself will just do the right thing.

Re: Schwartzian transform deformed with impunity
by salva (Canon) on Apr 23, 2012 at 08:58 UTC
    going "no warnings" of course is not an option for me

    Why not? There is nothing wrong with disabling them, at least if you know why they are happening:

    @sorted = map{ $_->[0]} sort{ no warnings 'numeric'; $a->[2] <=> $b->[2] } map{[$_, split/sequence/]} @unsorted;

    However, for this particular case, and as already stated in other monks' answers, it is easier to extract just the number.
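
    For illustration, a minimal sketch of the original Schwartzian transform with only the number extracted, so that <=> never sees a non-numeric string and no warning needs silencing. The shortened file list and the grep guard are assumptions added for this example and are not part of the original question.

```perl
use strict;
use warnings;

my @unsorted = qw(
    sequence3.gb.txt sequence1.gb.txt sequence10.gb.txt
    sequence2.gb.txt sequence9.gb.txt
);

my @sorted =
    map  { $_->[0] }                 # 3. recover the original name
    sort { $a->[1] <=> $b->[1] }     # 2. compare the extracted numbers
    map  { [ $_, /(\d+)/ ] }         # 1. pair each name with its digits
    grep { /\d/ }                    # assumption: skip names without digits
    @unsorted;

print "$_\n" for @sorted;
# sequence1.gb.txt
# sequence2.gb.txt
# sequence3.gb.txt
# sequence9.gb.txt
# sequence10.gb.txt
```

    The match in list context returns the captured digits directly, so there is no need to read $1 afterwards.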

    Besides that, if you are concerned about the sort performance, you should try Sort::Key and Sort::Key::Radix:

    use Sort::Key::Radix 'ukeysort';
    my @sorted = ukeysort { /(\d+)/; $1 } @unsorted;