Problems? Is your data what you think it is? PerlMonks

### Sorting on Section Numbers

by Zoogie (Curate)
 on Jul 28, 2000 at 02:06 UTC Need Help??
Zoogie has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to write a sort function that sorts section numbers of a large document into the proper order. I keep having the feeling that's there's really easy way to do this in Perl that I'm missing :)

The section numbers look like "1, 2, 2.2, 2.13, 2.1.7, 3.4a, ...", etc. Neither a regular numeric sort nor a textual sort work, because "2.13" must come after "2.2", "10" must come after "2", and so forth.

I'm having a serious mental block on this one. Any ideas?

Replies are listed 'Best First'.
Re: Sorting on Section Numbers
by tye (Sage) on Jul 28, 2000 at 02:42 UTC

Pad all numeric parts before sorting:

```my @sects= qw( 1 2 2.2 2.13 2.1.7 3.4a 10.1
10.10 10.1a 1a.2 );
my \$maxdigs= 4;
my %sects;
foreach my \$sect (  @sects  ) {
( my \$sort= \$sect ) =~ s/(\d+)/
sprintf "%0\$maxdigs.\$maxdigs"."d", \$1 /ge;
\$sects{\$sort}= \$sect;
}
print join( " ", @sects{ sort keys %sects } ), "\n";
Good approach! But you could simplify it a lot by noticing that your sprintf only prepends zeros:
```my @sects= qw( 1 2 2.2 2.13 2.1.7 3.4a 10.1 10.10 10.1a 1a.2 );
my %sects;
for ( @sects ) {
( my \$key = \$_ ) =~ s/(\d+)/substr("0000\$1",-4)/ge;
\$sects{\$key} = \$_;
}
print "@sects{sort keys %sects}\n";

Parameterizing on \$maxdigs is left an an excercise for the acolyte. :-) (And extra credit if you knew that you could interpolate a hash slice.)

If we simplify the problem space by eliminating the alphanumerics, things get even neater--we don't even need a hash any more:

```my @sorted = map { join '.', unpack 'N*', \$_ }
sort
map { pack 'N*', split /\./ }
@unsorted;

-- Chip Salzenberg, Free-Floating Agent of Chaos

You simplified it a lot by dropping \$maxdigs. Put that back in and I doubt you'll have saved more than a keystroke or two. :) But thanks for the substr method; I like that.

```substr("0000\$1",-4)
sprintf"%04.4d",\$1
substr("0"x\$maxdigs.\$1,-\$maxdigs)
sprintf"%0\$maxdigs.\$maxdigs"."d",\$1

When I saw the name chip, I wondered if it was you. Welcome to Perl Monks! I've run into your work many times and have been impressed.

If you know you don't need extra leading zeros, then you can also get away with:

```grep{s/(^|\D)0+(\d)/\$1\$2/g,1} sort
grep{s/(\d+)/sprintf"%06.6d",\$1/ge,1} @sects;

Man, I shouldn't attempt this much thinking before breakfast.

There is a slight bug in this solution, it drops letters.
Using the above test data, the following list is generated:
1, 1.2, 2, 2.1.7, 2.2, 2.13, 3.4, 10.1, 10.1, 10.10
which dropped the letters from the section headings.
Here's my version:
```@sorted =
map {\$_->[0], ", "}
sort {\$a->[0] <=> \$b->[0]}
map {[\$_, pack ("N*", \$_)]}

@unsorted;

it works the same way, but I substituted in a Schwartzian transform to cache the original value of the heading instead of unpacking it again later. According to some tests with Benchmark, it's about 40% faster than doing the unpack later.
Plus it keeps letters intact.
Re: Sorting on Section Numbers
by athomason (Curate) on Jul 28, 2000 at 03:24 UTC
Fundamentally, you want to split on both dots and number-letter borders. Fortunately, split can be told to return its delimiters by including parentheses around the matching class. By doing that, you'll also need to get rid of the periods it matches with grep. So, this should split up arbitrarily nested section titles.
```my @a = qw/1 2 2.2 2.13 2.1.7 3.4a 10.4a.3b.8/;
my @sortables = map { [grep /[^\.]/, split/([\.a-z])/, \$_] } @a;
print join "\n", map join(' ', @{\$_}), @sortables;

Your sort then just needs to have a function which splits up the entries and compares them.

```my @list = qw/1 10.4a.3b.8 2.2 2 2.13 3.4a 2.1.7 /;

print join "\n", sort compare @list;

sub compare {
my @a = grep /[^\.]/, split/([\.a-z])/, \$a;
my @b = grep /[^\.]/, split/([\.a-z])/, \$b;
{
my (\$c, \$d) = (shift @a, shift @b);
(!defined \$c and  defined \$d) and return -1;
( defined \$c and !defined \$d) and return 1;
(!defined \$c and !defined \$d) and return 0;
(\$c > \$d) and return 1;
(\$c < \$d) and return -1;
redo;
}
}

You should avoid repeatedly splitting the lists by caching the split results.

```my @list = qw/1 10.4a.3b.8 2.2 2 2.13 3.4a 2.1.7 /;

my @cache = map { [grep /[^\.]/, split/([\.a-z])/, \$_] } @list;
print join("\n", map("@\$_", sort compare @cache));

sub compare {
my @a = @\$a;
my @b = @\$b;
{
my (\$c, \$d) = (shift @a, shift @b);
(!defined \$c and  defined \$d) and return -1;
( defined \$c and !defined \$d) and return 1;
(!defined \$c and !defined \$d) and return 0;
(\$c > \$d) and return 1;
(\$c < \$d) and return -1;
redo;
}
}
There may be an error stuck somewhere in here; my test set was just what is in the above code. Hope this works for you.
Re: Sorting on Section Numbers
by acme (Novice) on Jul 28, 2000 at 18:48 UTC
Of course, there is a module on CPAN which does what you want! It's called Sort::Versions, available from http://search.cpan.org/search?dist=SortVersions and it basically does what you want:
```#!/usr/local/bin/perl

use strict;
use Sort::Versions;

my @sections = qw(2.1.7 2.2 3.4a 2.13 1 2);
@sections = sort versions @sections;
print "@sections\n"; # prints "1 2 2.1.7 2.2 2.13 3.4a"
```
Hope this helps! Leon

Unfortunately, this module fails for some simple cases:

```use Sort::Versions;
print join(" ",sort versions qw(1.2a 1.10a)),"\n";
produces
```1.10a 1.2a

Note that this matches the behavior described in the module's documentation, but I think it isn't what most people would want.

Re: Sorting on Section Numbers
by ar0n (Priest) on Jul 28, 2000 at 02:21 UTC

the following won't work, sorry
```print "*poof*\n";
Evil code removed.

update: i like tye's solution better (i.e. it actually works). use his!

-- ar0n

Re: (turnstep) Sorting on Section Numbers
by turnstep (Parson) on Jul 28, 2000 at 22:05 UTC

Here is my version. I *think* it should work for all cases. I'll break it down if anyone is curious.

```@sorted =
map { \$_->[0] }
sort {
\$x=1;
while (defined \$a->[1][\$x]) {
defined \$b->[1][\$x] or return -1;
if (\$x%2) { ## Strict numeric comparison
return 1  if \$a->[1][\$x] > \$b->[1][\$x];
return -1 if \$a->[1][\$x] < \$b->[1][\$x];
}
else { ## Non-numeric comparison
return 1  if \$a->[1][\$x] gt \$b->[1][\$x];
return -1 if \$a->[1][\$x] lt \$b->[1][\$x];
}
\$x++;
}
return defined \$b->[1][\$x] ? 1 : 0;
}
map { [\$_, [split(/(\d+)/, \$_)]] }
@unsorted;
Count backwards from Z...
by Anonymous Monk on Jul 28, 2000 at 11:26 UTC
Damn, so I don't know how to use this thing... (managed to post anon and mess up the present) Here it is (hopefully the corect way...)
```--------------------------------------------------------------
#!/usr/bin/perl

@list = ("1", "3.4b.4", "2", "15", "123", "2.2", "2.13", "2.1.7", "311
+2", "3.4a", "1.5.32.1", "3.4.1", "3.1.5", "3.4b.1");

foreach \$section (@list) {
my @subSections = split (/\./, \$section);
for (my \$x=0; \$x <= \$#subSections; \$x++) {
if (\$maxdigit[\$x] <  length(\$subSections[\$x])) { \$maxd
+igit[\$x] = length(\$subSections[\$x]); }
}
}

foreach \$section (@list) {
my @subSections = split (/\./, \$section);
for (my \$x=0; \$x <= \$#subSections; \$x++) {

for(my \$y = 0; \$y < \$maxdigit[\$x]-length(\$subSections[
+\$x]); \$y++) { \$padding .= "0"; }
}
}

foreach \$section (@sortedList) {
my @subSections = split (/\./, \$section);

for (my \$x=0; \$x <= \$#subSections; \$x++) {
\$subSections[\$x] =~ s/^0*//;
}
}

print "\$li\n";
}
--------------------------------------------------------------

Create A New User
Node Status?
node history
Node Type: perlquestion [id://24765]
Approved by root
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (2)
As of 2018-01-20 05:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
How did you see in the new year?

Results (226 votes). Check out past polls.

Notices?