Now that I think of it again, I see the flaw in my argument.
I thought that creating the hash and creating the arrays
shouldn't matter in the overall time, as it is done much
fewer times than the lookups. The flaw (which might
be already apparent for you) is that the lookups are not
done much more times than building the arrays, only log(n)
times moer often if there are n elements.
The calculation yields these times:
you do the difficult computation (s here) n times, plus:
 Schwartzian:

Allocating n small arrays, plus n*log(n) array lookups.
This means about O(n*log(n)) time, supposing you have a fast malloc.
 Strange:

Creating a hash of n elements, plus n*log(n) hash lookups.
This means about O(n*log(n)*log(n)) time.
What I'd like to know now is whether my variant of the Schwartzian
transform is generally faster than the original one,
or does it happen only in this case? It's only
a 15% gain here (Update: even fewer with the larger data set),
so it might not mean anything.
Yet one more thing. I've just done some more benchmarking.
I've added some other variants in creating the hash:
sub sort_new {
my %h;
@h{@all} = map {s $_} @all;
@results = sort { $h{$a}<=>$h{$b} } @all;
}
sub sort_new2 {
my %h = map {$_, s $_} @all;
@results = sort { $h{$a}<=>$h{$b} } @all;
}
sub sort_new3 {
my %h;
keys %h = @all;
@h{@all} = map {s $_} @all;
@results = sort { $h{$a}<=>$h{$b} } @all;
}
sub sort_new4 {
my %h;
keys %h = @all;
%h = map {$_, s $_} @all;
@results = sort { $h{$a}<=>$h{$b} } @all;
}
I am somewhat surprised on the results, I'd have thought that
method 2 would be faster than method 1 but no:
 [reply] [d/l] [select] 
 [reply] 
The node you've mentioned
actually uses an even faster form of the Schwartzian transform
I've completely forgot about.
Thus, I've added it to my benchmark too:
sub sort_arr {
my @h = map s, @all;
@results = @all[sort {$h[$a]<=>$h[$b]} 0..@h1];
}
The results are amazing: this form is about 1.5 times as efficent
than any of the other ones.
 [reply] [d/l] 
I'm curious which comparison routines get optimized.
No block, $a <=> $b, $b <=> $a,
$a cmp $b and $b cmp $a.
I don't see why you'd be surprised that creating a single hash would be faster than creating a ton of tiny anonymous arrays.
No lookup would even be faster. With a GRT you don't use block with sort, and hence, no array or hash lookup.
use Benchmark "cmpthese";
our @files = glob "/bin/*";
our (@o, @s, @g);
cmpthese 1 => {
ordinary => '@o = sort {s $a <=> s $b} @files',
st => '@s = map $_ > [0],
sort {$a > [1] <=> $b > [1]}
map [$_ => s], @files',
grt => '@g = map substr ($_, 8),
sort
map sprintf ("%08d%s", s, $_), @files',
};
die unless "@o" eq "@s" && "@o" eq "@g";
__END__
Rate ordinary st grt
ordinary 684/s  54% 63%
st 1492/s 118%  20%
grt 1864/s 172% 25% 
 [reply] [d/l] [select] 
B::Deparse has the option x$LEVEL, which at level 7 reveals the internal representation of the code quite closely. Then there's the various other modules like B::Terse which will show you exactly what opcodes you get. That should be enough to examine whether any particular construct gets optimized; of course it won't provide a full overview of all the possible optimizations.
Makeshifts last the longest.
 [reply] 