Yes, I have. His solution and my sms_2x2_PDL look almost the same, but there's a difference: he supplies a list of slices as arguments to the pdl() constructor. Such a list, regardless of its length, uses almost no resources, because slice, like many other functions (e.g. those in PDL::Slices), creates a virtual ndarray, i.e. a header pointing to the original data. The constructor, however, builds a fresh new ndarray, and it does so before any summation has begun. The temporary spike in memory usage is probably negligible (*) in the case of 2x2 submatrices. Using this approach for the WxH case is perhaps impractical and purely speculative, but suppose 1500x1500 data and a 10x10 frame: that means 100 slices of 1491x1491 each, i.e. a ~1.7 GB temporary monster (assuming 8-byte doubles).
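To make that estimate reproducible, here is a minimal sketch of the arithmetic in plain Perl (no PDL needed). The helper name temp_bytes is hypothetical, not something from the thread, and the 8 bytes per element is an assumption based on PDL's default double type.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): bytes needed when pdl() copies
# every shifted slice of an NxN matrix, for a WxH frame, into one new ndarray.
# Assumption: 8 bytes per element (double) unless told otherwise.
sub temp_bytes {
    my ( $n, $w, $h, $bytes_per_elem ) = @_;
    $bytes_per_elem //= 8;
    my $slices = $w * $h;                              # one slice per offset inside the frame
    my $elems  = ( $n - $w + 1 ) * ( $n - $h + 1 );    # elements in each slice
    return $slices * $elems * $bytes_per_elem;
}

# The 1500x1500 data, 10x10 frame case, reported in GiB
printf "%.2f GiB\n", temp_bytes( 1500, 10, 10 ) / 2**30;
```

Calling temp_bytes( $n, 2, 2 ) with the N values from the benchmark gives the size of the temporary in the 2x2 case.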
*: but why not measure its impact:
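sub sms_2x2_PDL_wlmb ( $m ) {
    # pdl() copies the four virtual slices into one new (N-1)x(N-1)x4 ndarray
    pdl(
        $m-> slice( '0:-2,0:-2' ),
        $m-> slice( '1:-1,0:-2' ),
        $m-> slice( '0:-2,1:-1' ),
        $m-> slice( '1:-1,1:-1' ),
    )-> mv( -1, 0 )-> sumover    # move the last dim to the front, then sum over it
}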
__END__
Time (s) vs. N (2x2 submatrix, NxN matrix)
[ASCII plot not reproduced: time in seconds (y axis, ticks at 0, 0.5, 1, 1.5) against N (x axis, 0 to 5000)]
A = sms_2x2_PDL
B = sms_2x2_PDL_wlmb
+------+-------+-------+
| N | A | B |
+------+-------+-------+
| 400 | 0.002 | 0.011 |
| 800 | 0.006 | 0.041 |
| 1200 | 0.023 | 0.098 |
| 1600 | 0.061 | 0.180 |
| 2000 | 0.109 | 0.273 |
| 3000 | 0.238 | 0.613 |
| 4000 | 0.433 | 1.148 |
| 5000 | 0.738 | 1.758 |
+------+-------+-------+