Zoogie has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I'm trying to write a sort function that sorts section numbers of a large document into the proper order. I keep having the feeling that's there's really easy way to do this in Perl that I'm missing :)
The section numbers look like "1, 2, 2.2, 2.13, 2.1.7, 3.4a, ...", etc. Neither a regular numeric sort nor a textual sort work, because "2.13" must come after "2.2", "10" must come after "2", and so forth.
I'm having a serious mental block on this one. Any ideas?
Re: Sorting on Section Numbers
by tye (Sage) on Jul 28, 2000 at 02:42 UTC
|
my @sects= qw( 1 2 2.2 2.13 2.1.7 3.4a 10.1
10.10 10.1a 1a.2 );
my $maxdigs= 4;
my %sects;
foreach my $sect ( @sects ) {
( my $sort= $sect ) =~ s/(\d+)/
sprintf "%0$maxdigs.$maxdigs"."d", $1 /ge;
$sects{$sort}= $sect;
}
print join( " ", @sects{ sort keys %sects } ), "\n";
| [reply] [Watch: Dir/Any] [d/l] |
|
Good approach! But you could simplify it a lot by noticing
that your sprintf only prepends zeros:
my @sects= qw( 1 2 2.2 2.13 2.1.7 3.4a 10.1 10.10 10.1a 1a.2 );
my %sects;
for ( @sects ) {
( my $key = $_ ) =~ s/(\d+)/substr("0000$1",-4)/ge;
$sects{$key} = $_;
}
print "@sects{sort keys %sects}\n";
Parameterizing on $maxdigs is left an an excercise for the acolyte. :-)
(And extra credit if you knew that you could interpolate a hash slice.)
If we simplify the problem space by eliminating the alphanumerics,
things get even neater--we don't even need a hash any more:
my @sorted = map { join '.', unpack 'N*', $_ }
sort
map { pack 'N*', split /\./ }
@unsorted;
-- Chip Salzenberg, Free-Floating Agent of Chaos
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
substr("0000$1",-4)
sprintf"%04.4d",$1
substr("0"x$maxdigs.$1,-$maxdigs)
sprintf"%0$maxdigs.$maxdigs"."d",$1
When I saw the name chip, I wondered if it was you. Welcome to Perl Monks! I've run into your work many times and have been impressed.
If you know you don't need extra leading zeros, then you can also get away with:
grep{s/(^|\D)0+(\d)/$1$2/g,1} sort
grep{s/(\d+)/sprintf"%06.6d",$1/ge,1} @sects;
Man, I shouldn't attempt this much thinking before breakfast. | [reply] [Watch: Dir/Any] [d/l] [select] |
|
|
|
There is a slight bug in this solution, it drops letters.
Using the above test data, the following list is generated:
1, 1.2, 2, 2.1.7, 2.2, 2.13, 3.4, 10.1, 10.1, 10.10
which dropped the letters from the section headings.
Here's my version:
@sorted =
map {$_->[0], ", "}
sort {$a->[0] <=> $b->[0]}
map {[$_, pack ("N*", $_)]}
@unsorted;
it works the same way, but I substituted in a Schwartzian transform to cache the original value of the heading instead of unpacking it again later. According to some tests with Benchmark, it's about 40% faster than doing the unpack later.
Plus it keeps letters intact. | [reply] [Watch: Dir/Any] [d/l] |
|
|
|
Re: Sorting on Section Numbers
by athomason (Curate) on Jul 28, 2000 at 03:24 UTC
|
Fundamentally, you want to split on both dots and number-letter borders. Fortunately, split can be told to return its delimiters by including parentheses around the matching class. By doing that, you'll also need to get rid of the periods it matches with grep. So, this should split up arbitrarily nested section titles. my @a = qw/1 2 2.2 2.13 2.1.7 3.4a 10.4a.3b.8/;
my @sortables = map { [grep /[^\.]/, split/([\.a-z])/, $_] } @a;
print join "\n", map join(' ', @{$_}), @sortables;
Your sort then just needs to have a function which splits up the entries and compares them. my @list = qw/1 10.4a.3b.8 2.2 2 2.13 3.4a 2.1.7 /;
print join "\n", sort compare @list;
sub compare {
my @a = grep /[^\.]/, split/([\.a-z])/, $a;
my @b = grep /[^\.]/, split/([\.a-z])/, $b;
{
my ($c, $d) = (shift @a, shift @b);
(!defined $c and defined $d) and return -1;
( defined $c and !defined $d) and return 1;
(!defined $c and !defined $d) and return 0;
($c > $d) and return 1;
($c < $d) and return -1;
redo;
}
}
You should avoid repeatedly splitting the lists by caching the split results. my @list = qw/1 10.4a.3b.8 2.2 2 2.13 3.4a 2.1.7 /;
my @cache = map { [grep /[^\.]/, split/([\.a-z])/, $_] } @list;
print join("\n", map("@$_", sort compare @cache));
sub compare {
my @a = @$a;
my @b = @$b;
{
my ($c, $d) = (shift @a, shift @b);
(!defined $c and defined $d) and return -1;
( defined $c and !defined $d) and return 1;
(!defined $c and !defined $d) and return 0;
($c > $d) and return 1;
($c < $d) and return -1;
redo;
}
}
There may be an error stuck somewhere in here; my test set was just what is in the above code. Hope this works for you. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Sorting on Section Numbers
by acme (Novice) on Jul 28, 2000 at 18:48 UTC
|
Of course, there is a module on CPAN which does what you want! It's called Sort::Versions, available from http://search.cpan.org/search?dist=SortVersions and it basically does what you want:
#!/usr/local/bin/perl
use strict;
use Sort::Versions;
my @sections = qw(2.1.7 2.2 3.4a 2.13 1 2);
@sections = sort versions @sections;
print "@sections\n"; # prints "1 2 2.1.7 2.2 2.13 3.4a"
Hope this helps! Leon | [reply] [Watch: Dir/Any] |
|
use Sort::Versions;
print join(" ",sort versions qw(1.2a 1.10a)),"\n";
produces
1.10a 1.2a
Note that this matches the behavior described in the
module's documentation, but I think it isn't what most
people would want. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Sorting on Section Numbers
by ar0n (Priest) on Jul 28, 2000 at 02:21 UTC
|
You could (as you probably already know) write your own sort-routine:
the following won't work, sorry
print "*poof*\n";
Evil code removed.
update: i like tye's solution better (i.e. it actually works). use his!
-- ar0n
| [reply] [Watch: Dir/Any] [d/l] |
Re: (turnstep) Sorting on Section Numbers
by turnstep (Parson) on Jul 28, 2000 at 22:05 UTC
|
Here is my version. I *think* it should work for
all cases. I'll break it down if anyone is curious.
@sorted =
map { $_->[0] }
sort {
$x=1;
while (defined $a->[1][$x]) {
defined $b->[1][$x] or return -1;
if ($x%2) { ## Strict numeric comparison
return 1 if $a->[1][$x] > $b->[1][$x];
return -1 if $a->[1][$x] < $b->[1][$x];
}
else { ## Non-numeric comparison
return 1 if $a->[1][$x] gt $b->[1][$x];
return -1 if $a->[1][$x] lt $b->[1][$x];
}
$x++;
}
return defined $b->[1][$x] ? 1 : 0;
}
map { [$_, [split(/(\d+)/, $_)]] }
@unsorted;
| [reply] [Watch: Dir/Any] [d/l] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
|
|