There's nothing to stop you from using Inline::C, with inline assembler, to make it all happen. The catch is that you have to construct an array to return to Perl, so it would be wise to read up on how XS works (perlxs and perlguts are the places to start). The thing is, though, that the cost of manufacturing that array vastly exceeds any speed gain you'd get from inline assembler. These days, division isn't nearly as expensive as it used to be, and memory access is often the real bottleneck.
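To show where that overhead comes from, here is a minimal sketch of an Inline::C function that hands two values back to Perl as a list. The `divmod` name and what it computes are hypothetical, chosen only for illustration, and it assumes Inline::C and a C compiler are installed. The arithmetic is trivial; most of the work is wrapping each result in a new Perl scalar and pushing it onto the stack.

```perl
use strict;
use warnings;

# Inline::C compiles this C code on the first run (needs a C compiler).
use Inline C => <<'END_C';
/* Hypothetical example: return quotient and remainder as a list.
   Note: C's / and % truncate toward zero, so results for negative
   operands differ from Perl's % operator. */
void divmod(IV a, IV b) {
    Inline_Stack_Vars;              /* set up access to the Perl stack */
    if (b == 0)
        croak("divmod: division by zero");
    Inline_Stack_Reset;             /* start writing return values */
    /* Each result must be boxed in a fresh (mortal) SV for Perl. */
    Inline_Stack_Push(sv_2mortal(newSViv(a / b)));
    Inline_Stack_Push(sv_2mortal(newSViv(a % b)));
    Inline_Stack_Done;              /* tell Perl how many values we pushed */
}
END_C

my ($q, $r) = divmod(17, 5);
print "$q $r\n";    # prints "3 2"
```

Returning an array reference instead (building an AV with `newAV` and `av_push`, then returning `newRV_noinc((SV*)av)`) has the same cost profile: a heap-allocated Perl value per element, which a single division instruction can't come close to outweighing.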
If you're that concerned about speed, I suspect you wouldn't be using Perl anyway. You'd be using C, or C++, or possibly even FORTRAN.