Thanks for great challenge, first of all (you were our taskmaster, right?) I didn't mention your solution, because it was kind of slow for comparisons. But now I see where it and challenge itself have originated. PDL is very TIMTOWTDI-ish, and range is ultimately important tool, to extract/address rectangular areas from ndarrays of any dimensions. But, eh-hm, don't you see that the POD you linked sings praises to wonders of broadcasting, which (i.e. broadcasting) you simply discarded? Broadcasting really only happens in this fragment:
...-> sumover-> sumover
which you replaced with
...-> clump(2)-> sumover
(Frankly, it's obvious I think that huge speed difference of Game-of-Life implementations in the linked tutorial is due to the way looping was generally performed rather than this "broadcasting" only, -- but perhaps that's how tutorials work.)
Consider:
sub sms_WxH_PDL_range ( $m, $w, $h ) {
my ( $W, $H ) = $m-> dims;
$m-> range( ndcoords( $W - $w + 1, $H - $h + 1 ), [ $w, $h ])
-> reorder( 2, 3, 0, 1 )
-> clump( 2 )
-> sumover
}
sub sms_WxH_PDL_range_b ( $m, $w, $h ) {
my ( $W, $H ) = $m-> dims;
$m-> range( ndcoords( $W - $w + 1, $H - $h + 1 ), [ $w, $h ])
-> reorder( 2, 3, 0, 1 )
-> sumover
-> sumover
}
__END__
Time (s) vs. N (NxN submatrix, PDL: Double D [300,300] matrix)
+-----------------------------------------------------------+
|+ + + + + + |
1.6 |-+ A +-|
| |
| |
1.4 |-+ +-|
| |
1.2 |-+ +-|
| A |
| |
1 |-+ +-|
| |
| B |
0.8 |-+ +-|
| A |
| |
0.6 |-+ B +-|
| |
0.4 |-+ A +-|
| B |
| A |
0.2 |-+ A B +-|
| A B B |
| A B B D D |
0 |-+ D D D D D D D D D C C +-|
|+ + + + + + |
+-----------------------------------------------------------+
0 5 10 15 20 25
sms_WxH_PDL_range A
sms_WxH_PDL_range_b B
sms_WxH_PDL_lags C
sms_WxH_PDL_naive D
+----+-------+-------+-------+-------+
| N | A | B | C | D |
+----+-------+-------+-------+-------+
| 2 | 0.015 | 0.008 | 0.000 | 0.000 |
| 3 | 0.021 | 0.018 | 0.000 | 0.000 |
| 4 | 0.044 | 0.021 | 0.000 | 0.000 |
| 5 | 0.073 | 0.047 | 0.000 | 0.003 |
| 6 | 0.101 | 0.060 | 0.000 | 0.000 |
| 8 | 0.193 | 0.104 | 0.000 | 0.005 |
| 10 | 0.294 | 0.138 | 0.000 | 0.005 |
| 12 | 0.435 | 0.232 | 0.000 | 0.010 |
| 16 | 0.711 | 0.344 | 0.000 | 0.015 |
| 20 | 1.115 | 0.549 | 0.000 | 0.026 |
| 25 | 1.573 | 0.828 | 0.000 | 0.047 |
+----+-------+-------+-------+-------+
(I took liberty to use couple of numbers as args to ndcoords instead of matrix/slice, which only serves as source of these 2 numbers). Note, the matrix is now smaller than in previous tests. Both A and B versions are very much slower than the so far slowest "naive" variant. Though ndcoords builds a relatively large ndarray to feed to range, I think range is simply not written with speed/performance as its goal.
It's actually tempting to try to improve Game of Life PDL implementation from the tutorial:
use strict;
use warnings;
use experimental qw/ say postderef signatures /;
use Time::HiRes 'time';
use PDL;
use PDL::NiceSlice;
use Test::PDL 'eq_pdl';
use constant STEPS => 100;
my $x = zeroes( 200, 200 );
# Put in a simple glider.
$x(1:3,1:3) .= pdl ( [1,1,1],
[0,0,1],
[0,1,0] );
my $backup = $x-> copy;
printf "Game of Life!\nMatrix: %s, %d generations\n",
$x-> info, STEPS;
# Tutorial
my $t = time;
my $ct = 0;
for ( 1 .. STEPS ) {
my $t_ = time;
# Calculate the number of neighbours per cell.
my $n = $x->range(ndcoords($x)-1,3,"periodic")->reorder(2,3,0,1);
$n = $n->sumover->sumover - $x;
$ct += time - $t_;
# Calculate the next generation.
$x = ((($n == 2) + ($n == 3))* $x) + (($n==3) * !$x);
}
printf "Tutorial: %0.3f s (core time: %0.3f)\n",
time - $t, $ct;
# "Lags"
my $m = $backup-> copy;
$t = time;
$ct = 0;
for ( 1 .. STEPS ) {
my $t_ = time;
# Calculate the number of neighbours per cell.
my $n = sms_GoL_lags( $m ) - $m;
$ct += time - $t_;
# Calculate the next generation.
$m = ((($n == 2) + ($n == 3))* $m) + (($n == 3) * !$m);
}
printf "\"lags\": %0.3f s (core time: %0.3f)\n",
time - $t, $ct;
die unless eq_pdl( $x, $m );
sub _do_dimension_GoL ( $m ) {
$m-> slice( -1 )-> glue( 0, $m, $m-> slice( 0 ))
-> lags( 0, 1, ( $m-> dims )[0] )
-> sumover
-> slice( '', '-1:0' )
-> xchg( 0, 1 )
}
sub sms_GoL_lags ( $m ) {
_do_dimension_GoL
_do_dimension_GoL $m
}
__END__
Game of Life!
Matrix: PDL: Double D [200,200], 100 generations
Tutorial: 1.016 s (core time: 0.835)
"lags": 0.283 s (core time: 0.108)
Sorry about crude profiling/tests; and improvement is somewhat far from what I expected. Even with "core time" singled out -- because next gen calculation is not very efficient (e.g. $n == 3 array is built twice), but that's another story -- which is "only" 8x better. Maybe all this glueing/appending to maintain constant matrix size and "wrap around" at edges takes its toll.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
|
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.