Anyway, it turned out to be a matter of using ST
I don't think you need a Schwartzian Transform here. An ST pays off when
deriving the sort key from each element is computationally expensive,
because it computes each key once per element instead of once per operand
in every comparison. That is not the case when interpreting a string as a
number, particularly since the conversion is done only once per string and
the result is then "cached" in the NV/IV fields of the scalar(*). In other
words, the simple approach (without an ST) is even faster in this case:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # names like "30.31.force..." are only partly numeric
use Benchmark 'cmpthese';

for my $e (2..5) {
    my $n = 10**$e;
    print "\nNumber of file names: $n\n";

    # build $n random names of the form "<int>.<int>.force.0.5.1LGY.pdb"
    my @data;
    push @data, join(".", int(rand($n)), int(rand($n)), 'force.0.5.1LGY.pdb')
        for 1..$n;

    cmpthese( 10**(6-$e),
        {
            'simple' => sub {
                my @unsorted = @data;
                my @sorted = sort { $a <=> $b } @unsorted;
            },
            'ST' => sub {
                my @unsorted = @data;
                my @sorted = map $_->[0],
                             sort { $a->[1] <=> $b->[1] }
                             map { [ $_, int $_ ] } @unsorted;
            },
        }
    );
}
__END__
Number of file names: 100
          Rate     ST simple
ST      3247/s     --   -75%
simple 12987/s   300%     --

Number of file names: 1000
         Rate     ST simple
ST      248/s     --   -79%
simple 1176/s   375%     --

Number of file names: 10000
         Rate     ST simple
ST     10.3/s     --   -74%
simple 39.2/s   280%     --

Number of file names: 100000
       s/iter     ST simple
ST       1.87     --   -50%
simple  0.943    99%     --
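The cost argument at the top can be made concrete by counting key computations. This is a minimal sketch, not part of the benchmark: key_of is a hypothetical stand-in for an expensive key function (here it only extracts the leading number and counts its calls). A plain sort calls it for both operands of every comparison, whereas the ST calls it exactly once per element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive key function: it only extracts
# the leading number, but counts how often it is called.
my $calls = 0;
sub key_of {
    my ($name) = @_;
    $calls++;
    my ($num) = $name =~ /^(\d+)/;
    return $num;
}

# 1000 made-up names in the same format as the benchmark data
my @data;
push @data, join('.', int(rand(1000)), int(rand(1000)), 'force.0.5.1LGY.pdb')
    for 1 .. 1000;

# Plain sort: the key is recomputed for both operands of every comparison.
$calls = 0;
my @naive = sort { key_of($a) <=> key_of($b) } @data;
my $naive_calls = $calls;

# Schwartzian Transform: each key is computed exactly once per element.
$calls = 0;
my @st = map  { $_->[0] }
         sort { $a->[1] <=> $b->[1] }
         map  { [ $_, key_of($_) ] } @data;
my $st_calls = $calls;

print "plain sort: $naive_calls key computations\n";
print "ST:         $st_calls key computations\n";
```

With 1000 names the ST makes exactly 1000 calls, while the plain sort makes many times more (the exact count depends on the random data). When each call is as cheap as Perl's cached string-to-number conversion, that saving does not outweigh the ST's extra array-reference allocations, which fits the benchmark numbers.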
Another beneficial side effect of the simple approach is that if you
happen to have two names like these

30.31.force.0.5.1LGY.pdb
30.32.force.0.5.1LGY.pdb

they are still put in a useful order (30.31 before 30.32), because the
second number automatically becomes the fractional part when the whole
name is treated as a number. The ST version above keys on int $_ only,
so it considers these two names equal.
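A small sketch of this, using made-up names in the same format. Because the second number becomes a decimal fraction, the comparison is purely decimal: 30.10 (read as 30.1) sorts before 30.31, and 30.9 sorts after 30.32. Whether that counts as "useful" depends on how the second field is meant to be read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # the names are only partly numeric, as in the benchmark

my @names = qw(
    30.32.force.0.5.1LGY.pdb
    30.31.force.0.5.1LGY.pdb
    7.2.force.0.5.1LGY.pdb
    30.9.force.0.5.1LGY.pdb
    30.10.force.0.5.1LGY.pdb
);

# Numeric comparison uses the leading "NN.MM" prefix as a decimal number.
my @sorted = sort { $a <=> $b } @names;
print "$_\n" for @sorted;
```

This prints the names in the order 7.2, 30.10, 30.31, 30.32, 30.9.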
(*)
use Devel::Peek;
my $s = "30.31.force.0.5.1LGY.pdb";
Dump $s;
print 0+$s, "\n"; # treat as number
Dump $s;
__END__
SV = PV(0x605150) at 0x604fa0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x6370d0 "30.31.force.0.5.1LGY.pdb"\0
  CUR = 24
  LEN = 32
30.31
SV = PVNV(0x607880) at 0x604fa0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,NOK,POK,pIOK,pNOK,pPOK)
  IV = 30          <---
  NV = 30.31       <---
  PV = 0x6370d0 "30.31.force.0.5.1LGY.pdb"\0
  CUR = 24
  LEN = 32