Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Alpha number sort

by gam3 (Curate)
on Mar 25, 2005 at 00:45 UTC ( #442237=perlquestion: print w/replies, xml ) Need Help??

gam3 has asked for the wisdom of the Perl Monks concerning the following question:

I would like a sort that sorts items in a more reasonable way. That is a1a should come before a10a and the releases of Linux should be in release order. release 2.6.10 should not come before 2.6.9. Also 1.55 should come before 1.6. I have a very ugly solution for this problem, but would like some better ones. I am so sure that my opinion of what is reasonable is not as reasonable as it could be.
#!/usr/bin/perl use strict; our @out; our @data; while (<DATA>) { chomp; push @data, $_ if ($_); } our $srtre = qr/^(?:(\d+(?:\.\d+)*)?|\d*)(\D*)(.*)$/; @out = sort { my $x = $a; my $y = $b; my $ret = 0; while (!($x eq $y)) { my ($xx, $yy); my ($l, $m, $xx, $yy); ($l, $xx, $x) = ($x =~ $srtre); ($m, $yy, $y) = ($y =~ $srtre); if ($l eq $m) { # the common case if ($ret = $xx cmp $yy) { last; } } else { if (length($l) && length($m)) { if ($l =~ /\d+\.\d+\.d+/ || $m =~ /\d+\.\d+\.\d+/) { my @ll = split(/\./, $l); my @lm = split(/\./, $m); if ($#ll == $#lm) { for (my $x = 0; $x <= $#ll; $x++) { if ($ret = ($ll[$x] <=> $lm[$x])) { last; } } } elsif ($#ll < $#lm) { for (my $x = 0; $x <= $#ll; $x++) { if ($ret = ($ll[$x] <=> $lm[$x])) { last; } } $ret = -1 unless ($ret); } else { for (my $x = 0; $x <= $#lm; $x++) { if ($ret = ($ll[$x] <=> $lm[$x])) { last; } } $ret = 1 unless ($ret); } last if $ret; } elsif ($l == $m) { # handle leading zeros. # don't want (00 0 000) if ($xx . $x eq $yy . $y) { if ($ret = ($l cmp $m)) { last; } } if ($ret = ($xx cmp $yy)) { last; } } elsif ($ret = ($l <=> $m)) { last; } } else { if ($ret = ($l.$xx cmp $m.$yy)) { last; } } } } return $ret; } @data; print join(', ', @out), "\n"; __END__ a1.5 a1.5b a1.55 a1.55b a1.6 a1.6b linux-2.4.3.tar linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.10.6.tar linux-2.10.50.tar
The DATA section is in 'my' prefered sorted order.
-- gam3
A picture is worth a thousand words, but takes 200K.

Replies are listed 'Best First'.
Re: Alpha number sort
by cog (Parson) on Mar 25, 2005 at 00:48 UTC

      Indeed:

      use strict; use Sort::Naturally; print for nsort( <DATA> ); __DATA__ a1.5 a1.55 a1.55b a1.5b a1.6 a1.6b linux-2.10.50.tar linux-2.10.6.tar linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.4.3.tar

      % perl 442238.pl a1.5 a1.5b a1.6 a1.6b a1.55 a1.55b linux-2.4.3.tar linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.10.6.tar linux-2.10.50.tar

      the lowliest monk

Re: Alpha number sort (one, true, natural sort)
by tye (Sage) on Mar 25, 2005 at 05:09 UTC

    A simple search here for natural sort turns up lots of techniques. Here is a rewrite of my favorite using the "one true sort" (fast, flexible, stable sort).

    #!/usr/bin/perl -w use strict; my @list= <DATA>; my @sorted= @list[ map { unpack "N", substr($_,-4) } sort map { my $key= $list[$_]; $key =~ s[(\d+)][ pack "N", $1 ]ge; $key . pack "N", $_ } 0..$#list ]; print @sorted; __END__ a1.5 a1.5b a1.55 a1.55b a1.6 a1.6b linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.4.3.tar linux-2.10.6.tar linux-2.10.50.tar

    Producing

    a1.5 a1.5b a1.6 a1.6b a1.55 a1.55b linux-2.4.3.tar linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.10.6.tar linux-2.10.50.tar

    - tye        

Re: Alpha number sort
by BrowserUk (Pope) on Mar 25, 2005 at 01:15 UTC

    Is this the order you require?

    P:\test>442237 a1.5 a1.5b a1.6 a1.6b a1.55 a1.55b linux-2.4.3.tar linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.10.6.tar linux-2.10.50.tar

    If so:

    #! perl -slw use strict; print map{ $_->[0] } sort { $a->[1] cmp $b->[1] } map { ( my $vstring = $_ ) =~ s[(\d+)][chr $1]ge; [ $_, $vstring ] } <DATA>; __DATA__ a1.5 a1.55 a1.55b a1.5b a1.6 a1.6b linux-2.10.50.tar linux-2.10.6.tar linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.4.3.tar

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?
Re: Alpha number sort
by Crackers2 (Parson) on Mar 25, 2005 at 15:51 UTC

    I seems to me the preferred sort order in your example is different from what you describe in the text, and both text and example appear somewhat inconsistent.

    For example, you say you want linux releases in release order, yet you put linux-2.4.3 after linux-2.4.28. Same thing with 2.10.6 vs 2.10.50

    The core inconsistency seems to be this part of your request: "x.10 should come after x.9, but x.55 should come before x.6".

    Perhaps (since this seems to be about comparing release versions of software packages) you need to find a way to use a different compare function for different packages, and the n either use a lookup table or a heuristic to figure out which one to use.

      Thank you for pointing out the problem in the data section.

      As for the inconsistency, there is not one because I am saying that \d+\.\d+ is a floating point number but that \d+\.\d+\. is not.

      I agree that this will cause some version numbering schemes not to work, or for floating point numbers not to work if followed by a decimal number. But then if it was easy I would not need to post to perlmonks would I.

      -- gam3
      A picture is worth a thousand words, but takes 200K.

        Ah. It appears that everyone (including me) missed this.

        #!/usr/bin/perl -w use strict; my @list= <DATA>; my @sorted= @list[ map { unpack "N", substr($_,-4) } sort map { my $key= $list[$_]; $key =~ s[((?<!\.)(\d+)\.\d+(?!\.)|\d+)][ my $len= length( defined($2) ? $2 : $1 ); pack( "N", $len ) . $1 . ' '; ]ge; $key . pack "N", $_ } 0..$#list ]; print @sorted; __END__ a1.5 a1.5b a1.55 a1.55b a1.6 a1.6b linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.4.3.tar linux-2.10.6.tar linux-2.10.50.tar

        Produces

        a1.5 a1.5b a1.55 a1.55b a1.6 a1.6b linux-2.4.3.tar linux-2.4.28.tar linux-2.4.29.tar linux-2.4.29a.tar linux-2.10.6.tar linux-2.10.50.tar

        Update: Third-time lucky. The original regexes are below in HTML comments.

        - tye        

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://442237]
Approved by Tanktalus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (6)
As of 2021-06-16 12:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (74 votes). Check out past polls.

    Notices?