Thanks, nice find. By the way, perhaps I chose an imprecise word ("sliding"). Maybe in math/science any function applied to overlapping infixes is said to be applied in a sliding/moving/rolling manner. What I meant by the word was "applied using a computationally efficient/cheap algorithm".
It looks like conv2d does a very honest (and therefore not efficient/cheap) 4-nested-loops style calculation in PP/XS/C, which is OK for very small kernels. The plot below is overcrowded, but cases B and E alone would suffice to show they are the same breed.
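To make the "honest" versus "cheap" distinction concrete, here is a minimal pure-Perl sketch (no PDL) of both styles. The function names `window_sums_naive` and `window_sums_sat` are made up for this illustration, and the summed-area table is just one well-known way to make the cost independent of the window size; it is not claimed to be what any of the benchmarked subs actually do.

```perl
use strict;
use warnings;

# Honest approach: for every window position, add up all $w x $h cells.
# The work per output cell grows with the window area.
sub window_sums_naive {
    my ( $m, $w, $h ) = @_;    # $m: array of rows, $w: window width, $h: window height
    my $rows = @$m;
    my $cols = @{ $m->[0] };
    my @out;
    for my $r ( 0 .. $rows - $h ) {
        for my $c ( 0 .. $cols - $w ) {
            my $s = 0;
            for my $dr ( 0 .. $h - 1 ) {
                for my $dc ( 0 .. $w - 1 ) {
                    $s += $m->[ $r + $dr ][ $c + $dc ];
                }
            }
            $out[$r][$c] = $s;
        }
    }
    return \@out;
}

# Cheap approach: build a summed-area table once, then get every window
# sum from four table lookups, whatever the window size.
sub window_sums_sat {
    my ( $m, $w, $h ) = @_;
    my $rows = @$m;
    my $cols = @{ $m->[0] };

    # $sat[$r][$c] = sum of the first $r rows and first $c columns of $m
    my @sat = map { [ (0) x ( $cols + 1 ) ] } 0 .. $rows;
    for my $r ( 1 .. $rows ) {
        for my $c ( 1 .. $cols ) {
            $sat[$r][$c] = $m->[ $r - 1 ][ $c - 1 ]
                         + $sat[ $r - 1 ][$c]
                         + $sat[$r][ $c - 1 ]
                         - $sat[ $r - 1 ][ $c - 1 ];
        }
    }

    my @out;
    for my $r ( 0 .. $rows - $h ) {
        for my $c ( 0 .. $cols - $w ) {
            $out[$r][$c] = $sat[ $r + $h ][ $c + $w ]
                         - $sat[$r][ $c + $w ]
                         - $sat[ $r + $h ][$c]
                         + $sat[$r][$c];
        }
    }
    return \@out;
}

my $m     = [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ], [ 9, 10, 11, 12 ] ];
my $naive = window_sums_naive( $m, 2, 2 );
my $fast  = window_sums_sat( $m, 2, 2 );
print join( ' ', @$_ ), "\n" for @$fast;    # prints "14 18 22" then "30 34 38"
```

Both subs return the same sums; only the amount of work per output cell differs, which is the sense of "sliding" I had in mind.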
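# Sum over every $w x $h submatrix by convolving with a kernel of ones.
# Needs PDL loaded and subroutine signatures enabled.
sub sms_WxH_PDL_conv2d ( $m, $w, $h ) {
    $m -> conv2d( ones( $w, $h ))
       # conv2d returns a result the size of $m; trim the border so that
       # only windows lying fully inside the matrix remain
       -> slice(
            [floor(($w-1)/2),floor(-($w+1)/2)],
            [floor(($h-1)/2),floor(-($h+1)/2)] )
}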
__END__
Time (s) vs. N (NxN submatrix, PDL: Double D [1500,1500] matrix)
[ASCII plot: time from 0 to 1 s on the y-axis, N from 2 to 16 on the x-axis, one curve per case A to E]
sms_WxH_PDL_naive A
sms_WxH_pdlpp_4loops B
sms_WxH_PDL_lags C
sms_WxH_PDL_sliding D
sms_WxH_PDL_conv2d E
+----+-------+-------+-------+-------+-------+
| N | A | B | C | D | E |
+----+-------+-------+-------+-------+-------+
| 2 | 0.061 | 0.009 | 0.030 | 0.083 | 0.020 |
| 3 | 0.120 | 0.019 | 0.013 | 0.073 | 0.042 |
| 4 | 0.252 | 0.030 | 0.039 | 0.098 | 0.066 |
| 6 | 0.566 | 0.078 | 0.028 | 0.078 | 0.141 |
| 8 | 0.963 | 0.139 | 0.033 | 0.081 | 0.237 |
| 10 | | 0.248 | 0.031 | 0.078 | 0.366 |
| 12 | | 0.388 | 0.033 | 0.078 | 0.531 |
| 16 | | 0.728 | 0.041 | 0.069 | 0.928 |
+----+-------+-------+-------+-------+-------+
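A rough way to check the "same breed" claim against the table above is to divide each timing by N*N. Since the matrix size is fixed, a 4-nested-loops style calculation should give a roughly constant value, while an algorithm whose cost does not depend on the window size should give a steadily falling one. The sketch below does that arithmetic for cases B, D and E; the helper name `ms_per_kernel_cell` is made up for this illustration, and the numbers are copied from the table.

```perl
use strict;
use warnings;

# Window sizes and timings (seconds) copied from the table above.
my @n = ( 2, 3, 4, 6, 8, 10, 12, 16 );
my %t = (
    B => [ 0.009, 0.019, 0.030, 0.078, 0.139, 0.248, 0.388, 0.728 ],
    D => [ 0.083, 0.073, 0.098, 0.078, 0.081, 0.078, 0.078, 0.069 ],
    E => [ 0.020, 0.042, 0.066, 0.141, 0.237, 0.366, 0.531, 0.928 ],
);

# Milliseconds spent per kernel element (hypothetical helper).
sub ms_per_kernel_cell {
    my ( $seconds, $n ) = @_;
    return 1000 * $seconds / ( $n * $n );
}

for my $case ( sort keys %t ) {
    my @ratio = map { sprintf '%.2f', ms_per_kernel_cell( $t{$case}[$_], $n[$_] ) } 0 .. $#n;
    print "$case: @ratio\n";
}
```

For B and E the ratio stays within the same few milliseconds across the whole range of N, whereas for D it drops by almost two orders of magnitude. That is consistent with B and E doing work proportional to the kernel area, though it is only a back-of-the-envelope check on eight data points.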