Unfortunately, for most algorithms, the nature of Perl 5 and the way it allocates memory pretty much preclude coding them to exploit the optimisations that large L1 and L2 caches make available. If you don't control the allocation of memory, you have little chance of utilising the benefits that can accrue from processor caches, except accidentally.
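To make that concrete, here is a minimal sketch in C rather than Perl, precisely because C lets you decide the layout. It assumes a single contiguous, row-major block; the names `ROWS`, `COLS`, `make_grid`, `sum_row_major` and `sum_col_major` are all made up for illustration. Both functions compute the same total, but the first walks memory sequentially while the second jumps a whole row's width on every step, which typically touches a different cache line each time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dimensions, chosen only for illustration. */
#define ROWS ((size_t)1024)
#define COLS ((size_t)1024)

/* Allocate the grid as ONE contiguous block in row-major order.
   Returns NULL if the allocation fails; the caller must free() it. */
int *make_grid(void)
{
    int *g = malloc(ROWS * COLS * sizeof *g);
    if (g == NULL)
        return NULL;
    for (size_t i = 0; i < ROWS * COLS; i++)
        g[i] = (int)(i % 7);   /* arbitrary fill values */
    return g;
}

/* Inner loop steps through adjacent addresses: sequential access. */
long sum_row_major(const int *g)
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += g[r * COLS + c];
    return total;
}

/* Inner loop strides COLS elements at a time: same result,
   but each access lands COLS * sizeof(int) bytes from the last. */
long sum_col_major(const int *g)
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += g[r * COLS + c];
    return total;
}
```

Timing the two on your own hardware is the only way to see how big the difference is; I'm not quoting figures here. The point is that the choice between them only exists because the code knows where each element lives, which is exactly the knowledge a Perl array of scalars doesn't give you.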
I've recently been rediscovering the art (and joy) of coding stuff using macro assembler. Full-blown Windows GUI applications in executables of less than 20k, with memory footprints smaller than typical by an even greater margin. Processing speeds that take your breath away, even on my lowly 233MHz. The one area that requires a completely new mindset from the last time I used assembler is trying to utilise caching and pipelining to good effect. It's a whole new art that simply didn't exist the last time I played with this stuff.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller