I believe the most important difficulty faced here is in trying to mix n-dimension slice-type operations (ending "ND": whichND, indexND, whereND) with single-dimension slice-type ops (which, index, where). What you call "threading" (now known as "broadcasting", in line with other array-programming environments) isn't really at play here, other than the (quite correct for your application) use of n-dimensional ops.
It's possible to switch between the two modes of indexing an ndarray using https://metacpan.org/pod/PDL::Primitive#one2nd.
To answer your last question: depending on what kind of iterating you have in mind, your general approach of supplying what looks to me like a "kernel" (+/-1 on the various dims) to add to the results of a whichND looks sound. Doing all the things you want to happen in a single PDL operation (on data that holds all the operands) is what I believe is "the PDL way".