http://www.perlmonks.org?node_id=907074

punkish has asked for the wisdom of the Perl Monks concerning the following question:

In my quest to understand the speed differences between different techniques for a simple task, I wrote the following, very contrived, script.

    use Benchmark qw(:all);
    use PDL;

    my @runs = (
        [1, 10_000_000],
        [10_000, 1000],
        [1000, 10_000],
        [10_000_000, 1]
    );

    for my $run (@runs) {
        my $count = $run->[0];
        my $loops = $run->[1];

        print "Test with count: $count, loops: $loops\n" . "-" x 50 . "\n";

        timethese($count, {
            'for' => sub {
                my @out = ();
                for (1 .. $loops) { push @out, $_ * $_; }
            },
            'map' => sub {
                my @out = map { $_ * $_ } (1 .. $loops);
            },
            'pdl' => sub {
                my $in  = pdl(1 .. $loops);
                my $out = $in * $in;
            }
        });
    }

The script tries to compare the three techniques across a spectrum of tasks, from memory-intensive to CPU-intensive. The results are below.

    Test with count: 1, loops: 10000000
    --------------------------------------------------
    Benchmark: timing 1 iterations of for, map, pdl...
           for:  2 wallclock secs ( 2.12 usr +  0.29 sys =  2.41 CPU) @  0.41/s (n=1)
                (warning: too few iterations for a reliable count)
           map:  5 wallclock secs ( 3.86 usr +  1.15 sys =  5.01 CPU) @  0.20/s (n=1)
                (warning: too few iterations for a reliable count)
           pdl: 17 wallclock secs ( 2.83 usr +  0.70 sys =  3.53 CPU) @  0.28/s (n=1)
                (warning: too few iterations for a reliable count)

    Test with count: 10000, loops: 1000
    --------------------------------------------------
    Benchmark: timing 10000 iterations of for, map, pdl...
           for:  2 wallclock secs ( 1.86 usr +  0.00 sys =  1.86 CPU) @ 5376.34/s (n=10000)
           map:  3 wallclock secs ( 3.37 usr +  0.00 sys =  3.37 CPU) @ 2967.36/s (n=10000)
           pdl:  3 wallclock secs ( 2.15 usr +  0.01 sys =  2.16 CPU) @ 4629.63/s (n=10000)

    Test with count: 1000, loops: 10000
    --------------------------------------------------
    Benchmark: timing 1000 iterations of for, map, pdl...
           for:  1 wallclock secs ( 1.84 usr +  0.00 sys =  1.84 CPU) @ 543.48/s (n=1000)
           map:  4 wallclock secs ( 3.39 usr +  0.00 sys =  3.39 CPU) @ 294.99/s (n=1000)
           pdl:  2 wallclock secs ( 1.98 usr +  0.01 sys =  1.99 CPU) @ 502.51/s (n=1000)

    Test with count: 10000000, loops: 1
    --------------------------------------------------
    Benchmark: timing 10000000 iterations of for, map, pdl...
           for:  10 wallclock secs (  8.82 usr +  0.01 sys =   8.83 CPU) @ 1132502.83/s (n=10000000)
           map:   7 wallclock secs (  6.40 usr + -0.01 sys =   6.39 CPU) @ 1564945.23/s (n=10000000)
           pdl: 201 wallclock secs (198.35 usr +  0.28 sys = 198.63 CPU) @ 50344.86/s (n=10000000)

Some aspects of the above are not surprising. I am guessing there is considerable overhead in setting up a piddle, so setting up lots of small piddles many times is expensive. But I was expecting, perhaps wrongly, that map would always be cheaper than for. I don't know why I believed that explicit looping is always expensive. I wonder whether, under the hood, map really is a loop rather than just behaving like one.

Please explain, gently, if possible. Thanks in advance.

Bonus question -- what exactly is the difference between the usr and the sys seconds in the Benchmark reported timings?

Note: In the PDL test I don't actually end up with an @out array; I get a piddle instead. If I convert $out to @out using PDL's list, it just kills PDL's performance.
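
For clarity, here is a minimal sketch of the conversion I mean, using PDL's list method to copy the piddle's values back into a Perl array. The small value of $loops is only an assumption for illustration; it is not one of the sizes from the benchmark above, and this variant was not part of the timed runs.

```perl
use strict;
use warnings;
use PDL;

my $loops = 5;                 # hypothetical small size, for illustration only
my $in    = pdl(1 .. $loops);  # build a piddle from a Perl list
my $out   = $in * $in;         # element-wise square; the result is still a piddle
my @out   = $out->list;        # copy every element back into a Perl array

print "@out\n";
```

The last step has to create one Perl scalar per element, which is presumably where the extra cost comes from.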



when small people start casting long shadows, it is time to go to bed