There's more than one way to do things PerlMonks

### Benchmark, -s versus schwartzian

by Darby (Sexton)
 on Aug 22, 2004 at 17:33 UTC Need Help??
Darby has asked for the wisdom of the Perl Monks concerning the following question:

I just got the Perl CD Bookshelf version 4,and I am going through the exercises for the "learning Perl Objects, References, and Modules book. Chapter 7 ex 2 has you use Benchmark to compare the Schwartzian transformation versus a straight sort using -s. According to the answer, using the Schwartzian transformation should be faster, but I'm getting the opposite results. The Relevant code is:
```use Benchmark qw(timethese);

timethese( -2, {
Ordinary => q{
my @results = sort { -s \$a <=> -s \$b } glob "/bin/*";
},
Schwartzian => q{
my @sorted =
map \$_->[0],
sort { \$a->[1] <=> \$b->[1] }
map [\$_, -s \$_],
glob "/bin/*";
},
});
[download]```
My results are:
```  Ordinary:  2 wallclock secs ( 0.61 usr +  1.64 sys =  2.25 CPU) @ 23
+7.78/s (n=535)
Schwartzian:  2 wallclock secs ( 1.52 usr +  0.76 sys =  2.28 CPU) @ 2
+02.19/s (n=461)
[download]```
If I do a 5000 iterations instead of 2 seconds, I get:
```Benchmark: timing 5000 iterations of Ordinary, Schwartzian...
Ordinary: 21 wallclock secs ( 6.46 usr + 14.78 sys = 21.24 CPU) @ 23
+5.40/s (n=5000)
Schwartzian: 26 wallclock secs (17.15 usr +  8.45 sys = 25.60 CPU) @ 1
+95.31/s (n=5000)
[download]```
The only reason I can think of is that I have a hyperthreading processor. Would that make the difference?
I'm running SuSE 9.1 with the perl version that came with it.
uname -a yields:
```Linux dell 2.6.5-7.104-smp #1 SMP Wed Jul 28 16:42:13 UTC 2004 i686 i6
+86 i386 GNU/Linux
[download]```
Perl -V yields:
```
Summary of my perl5 (revision 5.0 version 8 subversion 3) configuratio
+n:
Platform:
osname=linux, osvers=2.6.4, archname=i586-linux-thread-multi
uname='linux d209 2.6.4 #1 smp thu mar 11 17:56:49 utc 2004 i686 i
+686 i386 gnulinux '
config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinp
+erl
-Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Dopti
+mize=-O2 -march=i586
-mcpu=i686 -fmessage-length=0 -Wall -Wall -pipe'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemulti
+plicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
+-fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -march=i586 -mcpu=i686 -fmessage-length=0 -Wall -Wal
+l -pipe',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-stri
+ct-aliasing'
ccversion='', gccversion='3.3.3 (SuSE Linux)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=1
+2
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
+ lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =''
libpth=/lib /usr/lib /usr/local/lib
libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.3.3'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynami
+c -Wl,-rpath,/usr/lib/perl5/5.8.3/i586-linux-thread-multi/CORE'
cccdlflags='-fPIC', lddlflags='-shared'

Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL
+_IMPLICIT_CONTEXT
Built under linux
Compiled at Apr  3 2004 00:52:08
@INC:
/usr/lib/perl5/5.8.3/i586-linux-thread-multi
/usr/lib/perl5/5.8.3
/usr/lib/perl5/site_perl/5.8.3/i586-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.3/i586-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.3
/usr/lib/perl5/vendor_perl
[download]```
Thanks a lot.

Replies are listed 'Best First'.
•Re: Benchmark, -s versus schwartzian
by merlyn (Sage) on Aug 22, 2004 at 17:45 UTC
Aha! Moving the glob out of the loops (using Ambrus's code) gave the expected results:
```/etc/ contains 311 files
Benchmark: running Ordinary, Schwartzian, Strange for at least 5 CPU s
+econds...
Ordinary:  5 wallclock secs ( 1.47 usr +  4.09 sys =  5.56 CPU) @ 99
+.28/s (n=552)
Schwartzian:  5 wallclock secs ( 3.95 usr +  1.13 sys =  5.08 CPU) @ 3
+33.46/s (n=1694)
Strange:  5 wallclock secs ( 4.01 usr +  1.12 sys =  5.13 CPU) @ 35
+9.06/s (n=1842)
[download]```

```/bin/ contains 111 files
Benchmark: running Ordinary, Schwartzian, Strange for at least 5 CPU s
+econds...
Ordinary:  5 wallclock secs ( 1.30 usr +  3.98 sys =  5.28 CPU) @ 34
+8.86/s (n=1842)
Schwartzian:  6 wallclock secs ( 3.97 usr +  1.45 sys =  5.42 CPU) @ 9
+79.89/s (n=5311)
Strange:  5 wallclock secs ( 3.84 usr +  1.46 sys =  5.30 CPU) @ 11
+45.85/s (n=6073)
[download]```
One thing, If I used "/usr/bin" or any other path with a '/' in the middle it went haywire. Thanks a lot for all the help everybody!

This still does not explain the original results. I think that even with the glob in the loop, the Schwartzian transform should be faster. (I couldn't reproduce those results, the Schwartzian was faster for me even with your original code.)

One thing, If I used "/usr/bin" or any other path with a '/' in the middle it went haywire.

That's wierd.

Something to change in the next revision of the book? :-)

Makeshifts last the longest.

Well, I meant what I said in the answer. On my laptop, doing the glob inside the loop on 33 elements made the ST the winner.

But yes, suggesting that the filenames be captured once might make sense for the 2ed of the book.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: Benchmark, -s versus schwartzian
by ambrus (Abbot) on Aug 22, 2004 at 17:47 UTC

Update: Just before someone reads this, you might want to no that this does not answer the original question.

I've run the test, and got these results:

```Benchmark: running Ordinary, Schwartzian for at least 2 CPU seconds...
Ordinary:  2 wallclock secs ( 1.02 usr +  1.15 sys =  2.17 CPU) @ 55
+.30/s (n=120)
Schwartzian:  2 wallclock secs ( 1.81 usr +  0.36 sys =  2.17 CPU) @ 8
+8.02/s (n=191)
[download]```

So the Schwartzian transform is faster for me.

I have 170 files in the /bin directory.

Update:

I've moved the globbing out as merlyn has suggested.

I've also added a new sorting variant.

The code is:

```#!perl

use warnings;
use strict;
use Benchmark qw(timethese);

our(@all, @sorted, @results);

for my \$dir ("/bin/", "/usr/bin/") {

my \$D;
opendir \$D, \$dir;
@all = readdir \$D;
closedir \$D;
chdir \$dir;
print "\$dir contains ".@all." files\n";

sub sort_ord {
@sorted = sort { -s \$a <=> -s \$b } @all;
}
sub sort_sch {
@results =
map \$_->[0],
sort { \$a->[1] <=> \$b->[1] }
map [\$_, -s \$_],
@all;
}
sub sort_new {
my %h;
@h{@all} = map {-s \$_} @all;
@results = sort { \$h{\$a}<=>\$h{\$b} } @all;
}
sub cmp_them {
join("\n", @sorted) eq join("\n", @results) or
die "bad sort";
}

sort_ord;
sort_sch; cmp_them;
sort_new; cmp_them;

timethese -5, {
Ordinary => \&sort_ord,
Schwartzian => \&sort_sch,
Strange => \&sort_new,
};

}
[download]```

My results are:

```/bin/ contains 172 files
Benchmark: running Ordinary, Schwartzian, Strange for at least 5 CPU s
+econds...
Ordinary:  6 wallclock secs ( 2.18 usr +  3.08 sys =  5.26 CPU) @ 79
+.09/s (n=416)
Schwartzian:  5 wallclock secs ( 4.69 usr +  0.58 sys =  5.27 CPU) @ 1
+34.72/s (n=710)
Strange:  5 wallclock secs ( 4.60 usr +  0.64 sys =  5.24 CPU) @ 17
+8.44/s (n=935)
/usr/bin/ contains 1397 files
Benchmark: running Ordinary, Schwartzian, Strange for at least 5 CPU s
+econds...
Ordinary:  5 wallclock secs ( 1.91 usr +  3.15 sys =  5.06 CPU) @  6
+.32/s (n=32)
Schwartzian:  5 wallclock secs ( 4.73 usr +  0.52 sys =  5.25 CPU) @ 1
+2.95/s (n=68)
Strange:  5 wallclock secs ( 4.67 usr +  0.55 sys =  5.22 CPU) @ 15
+.33/s (n=80)
[download]```

I am surprised that Strange is a bit faster than Schwartzian, I thought it must be a little slower.

Btw, the two extra file in /bin are just . and .. that glob("/bin/*") do not return.

I don't see why you'd be surprised that creating a single hash would be faster than creating a ton of tiny anonymous arrays. (:

Yes, I realize you expected the hash lookups to be slower than the array lookups. But my first paragraph explains why the hash method will be faster in many cases. Then there is also the fact that a lexical hash avoids having to dereference a reference.

I'm curious which comparison routines get optimized. Unfortunately, the details on that appear to be undocumented (either not mentioned at all or simply glossed over as "many"). I guess I'll have to find and read source code when I'm at a better location for doing that.

- tye

Now that I think of it again, I see the flaw in my argument.

I thought that creating the hash and creating the arrays shouldn't matter in the overall time, as it is done much fewer times than the lookups. The flaw (which might be already apparent for you) is that the lookups are not done much more times than building the arrays, only log(n) times moer often if there are n elements.

The calculation yields these times: you do the difficult computation (-s here) n times, plus:

Schwartzian:
Allocating n small arrays, plus n*log(n) array lookups. This means about O(n*log(n)) time, supposing you have a fast malloc.
Strange:
Creating a hash of n elements, plus n*log(n) hash lookups. This means about O(n*log(n)*log(n)) time.

What I'd like to know now is whether my variant of the Schwartzian transform is generally faster than the original one, or does it happen only in this case? It's only a 15% gain here (Update: even fewer with the larger data set), so it might not mean anything.

Yet one more thing. I've just done some more benchmarking. I've added some other variants in creating the hash:

```sub sort_new {
my %h;
@h{@all} = map {-s \$_} @all;
@results = sort { \$h{\$a}<=>\$h{\$b} } @all;
}
sub sort_new2 {
my %h = map {\$_, -s \$_} @all;
@results = sort { \$h{\$a}<=>\$h{\$b} } @all;
}
sub sort_new3 {
my %h;
keys %h = @all;
@h{@all} = map {-s \$_} @all;
@results = sort { \$h{\$a}<=>\$h{\$b} } @all;
}
sub sort_new4 {
my %h;
keys %h = @all;
%h = map {\$_, -s \$_} @all;
@results = sort { \$h{\$a}<=>\$h{\$b} } @all;
}
[download]```
I am somewhat surprised on the results, I'd have thought that method 2 would be faster than method 1 but no:
I'm curious which comparison routines get optimized.
No block, \$a <=> \$b, \$b <=> \$a, \$a cmp \$b and \$b cmp \$a.
I don't see why you'd be surprised that creating a single hash would be faster than creating a ton of tiny anonymous arrays.
No lookup would even be faster. With a GRT you don't use block with sort, and hence, no array or hash lookup.
```use Benchmark "cmpthese";

our @files = glob "/bin/*";

our (@o, @s, @g);

cmpthese -1 => {
ordinary   => '@o = sort {-s \$a <=> -s \$b} @files',
st         => '@s = map   \$_ -> [0],
sort {\$a -> [1] <=> \$b -> [1]}
map  [\$_ => -s], @files',
grt        => '@g = map  substr (\$_, 8),
sort
map  sprintf ("%08d%s", -s, \$_), @files',
};

die unless "@o" eq "@s" && "@o" eq "@g";
__END__

Rate ordinary       st      grt
ordinary  684/s       --     -54%     -63%
st       1492/s     118%       --     -20%
grt      1864/s     172%      25%       --
[download]```

B::Deparse has the option -x\$LEVEL, which at level 7 reveals the internal representation of the code quite closely. Then there's the various other modules like B::Terse which will show you exactly what opcodes you get. That should be enough to examine whether any particular construct gets optimized; of course it won't provide a full overview of all the possible optimizations.

Makeshifts last the longest.

Re: Benchmark, -s versus schwartzian
by CountZero (Bishop) on Aug 22, 2004 at 21:21 UTC
On my Windows XP Perl 5.8 system the Schwartzian runs about 5 times as fast as the sort (with the glob inside the loop).

```Benchmark: running Ordinary, Schwartzian for at least 2 CPU seconds...
Ordinary:  2 wallclock secs ( 0.42 usr +  1.75 sys =  2.17 CPU) @  5
+.06/s (n=11)
Schwartzian:  2 wallclock secs ( 0.68 usr +  1.45 sys =  2.13 CPU) @ 2
+3.91/s (n=51)
[download]```

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Benchmark, -s versus schwartzian
by ambrus (Abbot) on Aug 22, 2004 at 20:09 UTC
You may want to put readmore tags around the output of perl -V, Thanks, will do.
Re: Benchmark, -s versus schwartzian
by ccn (Vicar) on Aug 22, 2004 at 20:20 UTC

May be it can depends on properties of the (glob "/bin/*") list. If on your system it comes almost sorted then schwartzian transform can be slower due to it's overhead.

Re: Benchmark, -s versus schwartzian
by Jenda (Abbot) on Aug 23, 2004 at 14:15 UTC

Schwartzian transformation is quicker only if the sort criteria is complex enough. ST makes sure the computation is only done N times instead of O(N*log(N)) times at the expense of creating a temporary array of two-item arrays containing the original values and the computed results. If the overhead of this AoA is bigger than the gain obtained by decreasing the number of "complex computations" ST is slower than the ordinary sort.

I think the -s is not a well chosen example for ST. Whether you actually gain or loose by using ST depends on too many things, the disk speed and cache size, the OS, the number of files in the directory, the speed of your CPU, the amount and speed of your memory, ...

IMHO of course, Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
-- Rick Osborne

Yes, the -s example is a simple example, but doesn't necessarily highlight the ST. The point of the section is to talk about the expense of the comparison function, and one possible solution. I chose -s, because it's easy to talk about, and the ST, because it's the most straightforward general solution. Certainly, in specialized cases, a simple array cache or hash cache or GRT might be better.

In a tutorial, I'm constantly weighing how much or how little to say to accomplish some level of understanding for the tasks at hand. It's trickier than it looks, as anyone who has written training materials can confirm.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

I'd say that in the general case, GRT will be the fastest solution. If the list is large enough, the bottleneck will be in the sort function, and since the GRT has the simplest sort function possible (namely, the default one), it will outperform anything else.

Log In?
 Username: Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://384954]
Approved by benn
Front-paged by ccn
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2018-05-27 05:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
World peace can best be achieved by:

Results (194 votes). Check out past polls.

Notices?