Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much

Re: Benchmark, -s versus schwartzian

by ambrus (Abbot)
on Aug 22, 2004 at 17:47 UTC ( #384956=note: print w/replies, xml ) Need Help??

in reply to Benchmark, -s versus schwartzian

Update: Just before someone reads this, you might want to no that this does not answer the original question.

I've run the test, and got these results:

Benchmark: running Ordinary, Schwartzian for at least 2 CPU seconds... Ordinary: 2 wallclock secs ( 1.02 usr + 1.15 sys = 2.17 CPU) @ 55 +.30/s (n=120) Schwartzian: 2 wallclock secs ( 1.81 usr + 0.36 sys = 2.17 CPU) @ 8 +8.02/s (n=191)

So the Schwartzian transform is faster for me.

I have 170 files in the /bin directory.


I've moved the globbing out as merlyn has suggested.

I've also added a new sorting variant.

The code is:

#!perl use warnings; use strict; use Benchmark qw(timethese); our(@all, @sorted, @results); for my $dir ("/bin/", "/usr/bin/") { my $D; opendir $D, $dir; @all = readdir $D; closedir $D; chdir $dir; print "$dir contains ".@all." files\n"; sub sort_ord { @sorted = sort { -s $a <=> -s $b } @all; } sub sort_sch { @results = map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, -s $_], @all; } sub sort_new { my %h; @h{@all} = map {-s $_} @all; @results = sort { $h{$a}<=>$h{$b} } @all; } sub cmp_them { join("\n", @sorted) eq join("\n", @results) or die "bad sort"; } sort_ord; sort_sch; cmp_them; sort_new; cmp_them; timethese -5, { Ordinary => \&sort_ord, Schwartzian => \&sort_sch, Strange => \&sort_new, }; }

My results are:

/bin/ contains 172 files Benchmark: running Ordinary, Schwartzian, Strange for at least 5 CPU s +econds... Ordinary: 6 wallclock secs ( 2.18 usr + 3.08 sys = 5.26 CPU) @ 79 +.09/s (n=416) Schwartzian: 5 wallclock secs ( 4.69 usr + 0.58 sys = 5.27 CPU) @ 1 +34.72/s (n=710) Strange: 5 wallclock secs ( 4.60 usr + 0.64 sys = 5.24 CPU) @ 17 +8.44/s (n=935) /usr/bin/ contains 1397 files Benchmark: running Ordinary, Schwartzian, Strange for at least 5 CPU s +econds... Ordinary: 5 wallclock secs ( 1.91 usr + 3.15 sys = 5.06 CPU) @ 6 +.32/s (n=32) Schwartzian: 5 wallclock secs ( 4.73 usr + 0.52 sys = 5.25 CPU) @ 1 +2.95/s (n=68) Strange: 5 wallclock secs ( 4.67 usr + 0.55 sys = 5.22 CPU) @ 15 +.33/s (n=80)

I am surprised that Strange is a bit faster than Schwartzian, I thought it must be a little slower.

Btw, the two extra file in /bin are just . and .. that glob("/bin/*") do not return.

Replies are listed 'Best First'.
Re^2: Benchmark, -s versus schwartzian (vs hash)
by tye (Sage) on Aug 23, 2004 at 02:26 UTC

    I don't see why you'd be surprised that creating a single hash would be faster than creating a ton of tiny anonymous arrays. (:

    Yes, I realize you expected the hash lookups to be slower than the array lookups. But my first paragraph explains why the hash method will be faster in many cases. Then there is also the fact that a lexical hash avoids having to dereference a reference.

    I'm curious which comparison routines get optimized. Unfortunately, the details on that appear to be undocumented (either not mentioned at all or simply glossed over as "many"). I guess I'll have to find and read source code when I'm at a better location for doing that.

    - tye        

      Now that I think of it again, I see the flaw in my argument.

      I thought that creating the hash and creating the arrays shouldn't matter in the overall time, as it is done much fewer times than the lookups. The flaw (which might be already apparent for you) is that the lookups are not done much more times than building the arrays, only log(n) times moer often if there are n elements.

      The calculation yields these times: you do the difficult computation (-s here) n times, plus:

      Allocating n small arrays, plus n*log(n) array lookups. This means about O(n*log(n)) time, supposing you have a fast malloc.
      Creating a hash of n elements, plus n*log(n) hash lookups. This means about O(n*log(n)*log(n)) time.

      What I'd like to know now is whether my variant of the Schwartzian transform is generally faster than the original one, or does it happen only in this case? It's only a 15% gain here (Update: even fewer with the larger data set), so it might not mean anything.

      Yet one more thing. I've just done some more benchmarking. I've added some other variants in creating the hash:

      sub sort_new { my %h; @h{@all} = map {-s $_} @all; @results = sort { $h{$a}<=>$h{$b} } @all; } sub sort_new2 { my %h = map {$_, -s $_} @all; @results = sort { $h{$a}<=>$h{$b} } @all; } sub sort_new3 { my %h; keys %h = @all; @h{@all} = map {-s $_} @all; @results = sort { $h{$a}<=>$h{$b} } @all; } sub sort_new4 { my %h; keys %h = @all; %h = map {$_, -s $_} @all; @results = sort { $h{$a}<=>$h{$b} } @all; }
      I am somewhat surprised on the results, I'd have thought that method 2 would be faster than method 1 but no:
      I'm curious which comparison routines get optimized.
      No block, $a <=> $b, $b <=> $a, $a cmp $b and $b cmp $a.
      I don't see why you'd be surprised that creating a single hash would be faster than creating a ton of tiny anonymous arrays.
      No lookup would even be faster. With a GRT you don't use block with sort, and hence, no array or hash lookup.
      use Benchmark "cmpthese"; our @files = glob "/bin/*"; our (@o, @s, @g); cmpthese -1 => { ordinary => '@o = sort {-s $a <=> -s $b} @files', st => '@s = map $_ -> [0], sort {$a -> [1] <=> $b -> [1]} map [$_ => -s], @files', grt => '@g = map substr ($_, 8), sort map sprintf ("%08d%s", -s, $_), @files', }; die unless "@o" eq "@s" && "@o" eq "@g"; __END__ Rate ordinary st grt ordinary 684/s -- -54% -63% st 1492/s 118% -- -20% grt 1864/s 172% 25% --

      B::Deparse has the option -x$LEVEL, which at level 7 reveals the internal representation of the code quite closely. Then there's the various other modules like B::Terse which will show you exactly what opcodes you get. That should be enough to examine whether any particular construct gets optimized; of course it won't provide a full overview of all the possible optimizations.

      Makeshifts last the longest.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://384956]
[stevieb]: nice! I just finished a GPS "take me home" device last week, and did a bunch of software updates to it yesterday. I also created a pseudo chip with an Arduino to simulate an IC, where it responds to register read/writes over the I2C bus...
[stevieb]: ...from an I2C master. It's ugly and there are many changes I'm going to make, but I had not done anything like it before. It's designed for my RPi:: automated test platform; a system that does CI on *all* my RPi modules.
[shmem]: pseudo chip?
[stevieb]: well, what happens is the Arduino 'listens' for requests r/w, and does the appropriate thing when it's interrupted based on the 'register' address sent in. It's ugly as it was my first attempt, but I've got great new ideas I'm just sitting.
[stevieb]: ...down to implement now. Here's the sketch as it currently sits
[shmem]: well I use I2C and SPI and stuff, but creating a pseudo chip looks to me like lot of indirection and memory clutter... not?
[choroba]: Are you going to use the device soon? Related to your comment about "not having much time to do a lot of coding"...
[stevieb]: sure, but I'm just learning ;) I consider it practice to get a good understanding of what goes on *after* an I2C/SPI request is made
[shmem]: ah ok. Gonna read that. but now....
shmem compiles himself into his template

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (9)
As of 2017-06-25 22:42 GMT
Find Nodes?
    Voting Booth?
    How many monitors do you use while coding?

    Results (572 votes). Check out past polls.