You can get a lot of useful work done with xargs -Pn, as shown above, if your version of Linux/Unix supports it. Perhaps most usefully, you can very quickly see whether parallelism will actually benefit you, without having to “write a complicated [Perl ...] program” to find out.
For this particular task, the runtime will be determined almost entirely by how fast the disk drives, channel subsystems and so forth can move the requisite amount of data past md5sum’s nose. The CPU time pales against the I/O time, and many filesystems already handle parallelism internally, on behalf of all comers, very well on their own. Lustre might give you faster and/or more scalable throughput on this particular task ... or not. Certainly you should experiment extensively with the xargs approach to find out how your particular hardware configuration responds: where the “sweet spot” number of processes is, whether it is actually greater than 1, and so on. Then decide for yourself whether a more elaborate approach is justified. (Likely it won’t be. In any case, a Perl script designed simply to be run this way via xargs is much easier to bang out than one that implements its own multithread controller, and it just might work as well or even better. If you possibly can, “Jest get ’er done.”)
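As a rough sketch of that kind of experiment, the shell function below times md5sum over one directory tree at several -P levels. The function name bench_md5, the batch size of 16 and the example path are my own illustrative choices, not anything from the original problem. It assumes an xargs that supports -P and -0 (GNU and BSD versions do) and a date that understands +%s.

```shell
# bench_md5 DIR P1 [P2 ...]
# Hypothetical helper: checksum every regular file under DIR once per
# parallelism level and report the wall-clock time in whole seconds.
bench_md5() {
    dir=$1
    shift
    for p in "$@"; do
        start=$(date +%s)
        # -print0 / -0 keep filenames with spaces or newlines intact.
        # -n 16 hands each md5sum a batch of files (an arbitrary batch size).
        # -P "$p" lets xargs keep up to $p md5sum processes running at once.
        find "$dir" -type f -print0 |
            xargs -0 -n 16 -P "$p" md5sum > /dev/null
        end=$(date +%s)
        echo "P=$p elapsed=$((end - start))s"
    done
}
```

You would call it as something like bench_md5 /path/to/data 1 2 4 8 and compare the elapsed times. Two cautions: the resolution is whole seconds, so use a data set big enough to take a while, and the first pass may leave file data in the operating system's cache, which makes later passes look faster than the disks really are. Testing on data larger than RAM, or on a different subtree per level, gives more honest numbers.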