I am still not convinced you have to transpose the matrix but maybe I am having an off-day, i.e. to blind to see it;-) When I read to your post the first thing that jumps into mind is the use of a database., especially since your "reduce functions" are simple.
However, assuming you insist on transposing, you might be able to speed things up. The case N != M is more difficult then N = M. The approach you take right now is the straightforward approach but this can give poor performance (as you have probably noticed).
In general: depending on circumstances, things like how much storage you have available and more importantly how M relates to N (|N-M| small or gcd(N,M) is not small) you can improve by using a more refined algorithm. (This I have taken from the Wiki, already suggested in my earlier response.) You can find pointers there to articles on the subject of in-place matrix transposition. I don’t know of any Perl implementations but the pointers on the Wiki also contain source code so you should be able to rewrite that into Perl if needed.
HTH, dHarry