Zoogie has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to write a sort function that sorts section numbers of a large document into the proper order. I keep having the feeling that's there's really easy way to do this in Perl that I'm missing :)

The section numbers look like "1, 2, 2.2, 2.13, 2.1.7, 3.4a, ...", etc. Neither a regular numeric sort nor a textual sort work, because "2.13" must come after "2.2", "10" must come after "2", and so forth.

I'm having a serious mental block on this one. Any ideas?

Replies are listed 'Best First'.
Re: Sorting on Section Numbers
by tye (Sage) on Jul 28, 2000 at 02:42 UTC

    Pad all numeric parts before sorting:

    my @sects= qw( 1 2 2.2 2.13 2.1.7 3.4a 10.1 10.10 10.1a 1a.2 ); my $maxdigs= 4; my %sects; foreach my $sect ( @sects ) { ( my $sort= $sect ) =~ s/(\d+)/ sprintf "%0$maxdigs.$maxdigs"."d", $1 /ge; $sects{$sort}= $sect; } print join( " ", @sects{ sort keys %sects } ), "\n";
      Good approach! But you could simplify it a lot by noticing that your sprintf only prepends zeros:
      my @sects= qw( 1 2 2.2 2.13 2.1.7 3.4a 10.1 10.10 10.1a 1a.2 ); my %sects; for ( @sects ) { ( my $key = $_ ) =~ s/(\d+)/substr("0000$1",-4)/ge; $sects{$key} = $_; } print "@sects{sort keys %sects}\n";

      Parameterizing on $maxdigs is left an an excercise for the acolyte. :-) (And extra credit if you knew that you could interpolate a hash slice.)

      If we simplify the problem space by eliminating the alphanumerics, things get even neater--we don't even need a hash any more:

      my @sorted = map { join '.', unpack 'N*', $_ } sort map { pack 'N*', split /\./ } @unsorted;

          -- Chip Salzenberg, Free-Floating Agent of Chaos

        You simplified it a lot by dropping $maxdigs. Put that back in and I doubt you'll have saved more than a keystroke or two. :) But thanks for the substr method; I like that.

        substr("0000$1",-4) sprintf"%04.4d",$1 substr("0"x$maxdigs.$1,-$maxdigs) sprintf"%0$maxdigs.$maxdigs"."d",$1

        When I saw the name chip, I wondered if it was you. Welcome to Perl Monks! I've run into your work many times and have been impressed.

        If you know you don't need extra leading zeros, then you can also get away with:

        grep{s/(^|\D)0+(\d)/$1$2/g,1} sort grep{s/(\d+)/sprintf"%06.6d",$1/ge,1} @sects;

        Man, I shouldn't attempt this much thinking before breakfast.

        There is a slight bug in this solution, it drops letters.
        Using the above test data, the following list is generated:
        1, 1.2, 2, 2.1.7, 2.2, 2.13, 3.4, 10.1, 10.1, 10.10
        which dropped the letters from the section headings.
        Here's my version:
        @sorted = map {$_->[0], ", "} sort {$a->[0] <=> $b->[0]} map {[$_, pack ("N*", $_)]} @unsorted;

        it works the same way, but I substituted in a Schwartzian transform to cache the original value of the heading instead of unpacking it again later. According to some tests with Benchmark, it's about 40% faster than doing the unpack later.
        Plus it keeps letters intact.
Re: Sorting on Section Numbers
by athomason (Curate) on Jul 28, 2000 at 03:24 UTC
    Fundamentally, you want to split on both dots and number-letter borders. Fortunately, split can be told to return its delimiters by including parentheses around the matching class. By doing that, you'll also need to get rid of the periods it matches with grep. So, this should split up arbitrarily nested section titles.
    my @a = qw/1 2 2.2 2.13 2.1.7 3.4a 10.4a.3b.8/; my @sortables = map { [grep /[^\.]/, split/([\.a-z])/, $_] } @a; print join "\n", map join(' ', @{$_}), @sortables;

    Your sort then just needs to have a function which splits up the entries and compares them.

    my @list = qw/1 10.4a.3b.8 2.2 2 2.13 3.4a 2.1.7 /; print join "\n", sort compare @list; sub compare { my @a = grep /[^\.]/, split/([\.a-z])/, $a; my @b = grep /[^\.]/, split/([\.a-z])/, $b; { my ($c, $d) = (shift @a, shift @b); (!defined $c and defined $d) and return -1; ( defined $c and !defined $d) and return 1; (!defined $c and !defined $d) and return 0; ($c > $d) and return 1; ($c < $d) and return -1; redo; } }

    You should avoid repeatedly splitting the lists by caching the split results.

    my @list = qw/1 10.4a.3b.8 2.2 2 2.13 3.4a 2.1.7 /; my @cache = map { [grep /[^\.]/, split/([\.a-z])/, $_] } @list; print join("\n", map("@$_", sort compare @cache)); sub compare { my @a = @$a; my @b = @$b; { my ($c, $d) = (shift @a, shift @b); (!defined $c and defined $d) and return -1; ( defined $c and !defined $d) and return 1; (!defined $c and !defined $d) and return 0; ($c > $d) and return 1; ($c < $d) and return -1; redo; } }
    There may be an error stuck somewhere in here; my test set was just what is in the above code. Hope this works for you.
Re: Sorting on Section Numbers
by acme (Novice) on Jul 28, 2000 at 18:48 UTC
    Of course, there is a module on CPAN which does what you want! It's called Sort::Versions, available from http://search.cpan.org/search?dist=SortVersions and it basically does what you want:
    use strict;
    use Sort::Versions;
    my @sections = qw(2.1.7 2.2 3.4a 2.13 1 2);
    @sections = sort versions @sections;
    print "@sections\n"; # prints "1 2 2.1.7 2.2 2.13 3.4a"
    Hope this helps! Leon

      Unfortunately, this module fails for some simple cases:

      use Sort::Versions; print join(" ",sort versions qw(1.2a 1.10a)),"\n";
      1.10a 1.2a

      Note that this matches the behavior described in the module's documentation, but I think it isn't what most people would want.

Re: Sorting on Section Numbers
by ar0n (Priest) on Jul 28, 2000 at 02:21 UTC
    You could (as you probably already know) write your own sort-routine:

    the following won't work, sorry
    print "*poof*\n";
    Evil code removed.

    update: i like tye's solution better (i.e. it actually works). use his!

    -- ar0n

Re: (turnstep) Sorting on Section Numbers
by turnstep (Parson) on Jul 28, 2000 at 22:05 UTC

    Here is my version. I *think* it should work for all cases. I'll break it down if anyone is curious.

    @sorted = map { $_->[0] } sort { $x=1; while (defined $a->[1][$x]) { defined $b->[1][$x] or return -1; if ($x%2) { ## Strict numeric comparison return 1 if $a->[1][$x] > $b->[1][$x]; return -1 if $a->[1][$x] < $b->[1][$x]; } else { ## Non-numeric comparison return 1 if $a->[1][$x] gt $b->[1][$x]; return -1 if $a->[1][$x] lt $b->[1][$x]; } $x++; } return defined $b->[1][$x] ? 1 : 0; } map { [$_, [split(/(\d+)/, $_)]] } @unsorted;
A reply falls below the community's threshold of quality. You may see it by logging in.