for v. map v. pdl

punkish has asked for the wisdom of the Perl Monks concerning the following question:

In my quest to understand the speed differences between techniques for a simple task, I wrote the following, very contrived, script.

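use strict;
use warnings;
use Benchmark qw(:all);
use PDL;

# Each run is [count, loops]: count is how many times Benchmark calls each
# sub, loops is how many numbers each call squares.
my @runs = (
    [1, 10_000_000],
    [10_000, 1000],
    [1000, 10_000],
    [10_000_000, 1]
);

for my $run (@runs) {
    my ($count, $loops) = @$run;
    print "Test with count: $count, loops: $loops\n" . "-" x 50 . "\n";
    timethese($count, {
        # explicit loop, pushing each square onto @out
        'for' => sub { my @out = (); for (1 .. $loops) { push @out, $_ * $_; } },
        # map returns the list of squares, which is then assigned to @out
        'map' => sub { my @out = map { $_ * $_ } (1 .. $loops); },
        # build a piddle from a Perl list of $loops values, then square it element-wise
        'pdl' => sub { my $in = pdl(1 .. $loops); my $out = $in * $in; }
    });
}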

The script tries to compare the three techniques across a spectrum from memory-intensive to CPU-intensive tasks. The results are below.

Test with count: 1, loops: 10000000
--------------------------------------------------
Benchmark: timing 1 iterations of for, map, pdl...
       for:  2 wallclock secs ( 2.12 usr +  0.29 sys =  2.41 CPU) @  0.41/s (n=1)
            (warning: too few iterations for a reliable count)
       map:  5 wallclock secs ( 3.86 usr +  1.15 sys =  5.01 CPU) @  0.20/s (n=1)
            (warning: too few iterations for a reliable count)
       pdl: 17 wallclock secs ( 2.83 usr +  0.70 sys =  3.53 CPU) @  0.28/s (n=1)
            (warning: too few iterations for a reliable count)

Test with count: 10000, loops: 1000
--------------------------------------------------
Benchmark: timing 10000 iterations of for, map, pdl...
       for:  2 wallclock secs ( 1.86 usr +  0.00 sys =  1.86 CPU) @ 5376.34/s (n=10000)
       map:  3 wallclock secs ( 3.37 usr +  0.00 sys =  3.37 CPU) @ 2967.36/s (n=10000)
       pdl:  3 wallclock secs ( 2.15 usr +  0.01 sys =  2.16 CPU) @ 4629.63/s (n=10000)

Test with count: 1000, loops: 10000
--------------------------------------------------
Benchmark: timing 1000 iterations of for, map, pdl...
       for:  1 wallclock secs ( 1.84 usr +  0.00 sys =  1.84 CPU) @ 543.48/s (n=1000)
       map:  4 wallclock secs ( 3.39 usr +  0.00 sys =  3.39 CPU) @ 294.99/s (n=1000)
       pdl:  2 wallclock secs ( 1.98 usr +  0.01 sys =  1.99 CPU) @ 502.51/s (n=1000)

Test with count: 10000000, loops: 1
--------------------------------------------------
Benchmark: timing 10000000 iterations of for, map, pdl...
       for: 10 wallclock secs ( 8.82 usr +  0.01 sys =  8.83 CPU) @ 1132502.83/s (n=10000000)
       map:  7 wallclock secs ( 6.40 usr + -0.01 sys =  6.39 CPU) @ 1564945.23/s (n=10000000)
       pdl: 201 wallclock secs (198.35 usr +  0.28 sys = 198.63 CPU) @ 50344.86/s (n=10000000)
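With a count of 1, Benchmark itself warns "too few iterations for a reliable count". A minimal sketch, assuming only the core Benchmark module: passing a negative count to `cmpthese` asks Benchmark to run each sub for at least that many CPU seconds, and it then prints a table comparing their rates. The sub names `for_push` and `map_assign` and the `$loops` value are illustrative choices, and the PDL case is left out so the sketch runs without PDL installed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative size; the original script varies this from 1 to 10_000_000.
my $loops = 10_000;

# Same work as the original 'for' entry: push each square onto @out.
sub for_push {
    my @out;
    for (1 .. $loops) { push @out, $_ * $_; }
    return \@out;
}

# Same work as the original 'map' entry.
sub map_assign {
    my @out = map { $_ * $_ } 1 .. $loops;
    return \@out;
}

# A negative count means "run each sub for at least this many CPU seconds",
# so each sub gets enough iterations for a usable measurement.
cmpthese(-1, {
    'for' => \&for_push,
    'map' => \&map_assign,
});
```

Each cell of the printed table shows how much faster (positive percentage) or slower (negative) the row's sub ran than the column's.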

Some aspects of the above are not surprising. I am guessing there is considerable overhead in setting up a piddle, so setting up lots of small piddles many times is expensive. But I was expecting, perhaps wrongly, that map would always be cheaper than for. I don't know why I believed that an explicit loop is always expensive; I also wonder whether, under the hood, map really is a loop too.
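On the last point: map does visit its input one element at a time, much like for. A rough way to picture `my @out = map { ... } LIST` is the plain-Perl model below: evaluate the block for each element, collect every result into a temporary list, and only then let the list assignment copy those results into `@out`. The helper `map_like` is hypothetical and is a mental model only, not how perl actually implements map (perl works on its internal argument stack, not a Perl array). That collect-then-copy step is one plausible reason map can lose to `push` inside a for loop when the list is large, while for a one-element list the fixed cost of setting up the for loop may matter more.

```perl
use strict;
use warnings;

# Hypothetical helper: a plain-Perl model of what map does conceptually.
# It is NOT perl's real implementation of map.
sub map_like {
    my ($block, @list) = @_;
    my @tmp;                      # temporary list of results
    for (@list) {                 # $_ is aliased to each element in turn
        push @tmp, $block->();    # the block reads $_, just like map's block
    }
    return @tmp;                  # the caller's assignment then copies this list
}

my @out = map_like(sub { $_ * $_ }, 1 .. 5);
print "@out\n";    # prints "1 4 9 16 25"
```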

Please explain, gently, if possible. Thanks in advance.

Bonus question -- what exactly is the difference between the usr and the sys seconds in the Benchmark reported timings?
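On the bonus question: usr is CPU time spent in user mode (here, mostly the perl interpreter running your code), and sys is CPU time the kernel spent on behalf of the process, for example obtaining memory or doing I/O. That would fit sys being clearly non-zero mainly in the 10_000_000-element run, which has to allocate a lot of memory. The CPU figures Benchmark reports come from Perl's built-in `times`, and Benchmark subtracts the timing of an empty loop, which is how a small negative value such as `-0.01 sys` can appear. The sketch below reads `times` directly around a large allocation; the helper name `cpu_delta` and the 1_000_000 size are arbitrary choices for illustration, and the actual split between usr and sys depends on the operating system.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code and return the user and system CPU seconds
# it consumed, as measured by Perl's built-in times().
sub cpu_delta {
    my ($code) = @_;
    my ($u0, $s0) = times;    # user and system CPU seconds used so far
    $code->();
    my ($u1, $s1) = times;
    return ($u1 - $u0, $s1 - $s0);
}

# Squaring the numbers is user-mode work; getting the memory for a large
# array from the operating system may show up as sys time.
my ($usr, $sys) = cpu_delta(sub {
    my @big = map { $_ * $_ } 1 .. 1_000_000;
});
printf "usr: %.2f s, sys: %.2f s\n", $usr, $sys;
```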

Note: In the PDL test I am not really ending up with an @out array; I am getting a piddle instead. If I convert $out to @out using PDL's list, it just kills PDL.

when small people start casting long shadows, it is time to go to bed
