|Don't ask to ask, just ask|
It's an interesting concept. If I understand (and please correct me if I'm wrong), you're using the unpredictability of the relationship between a few asynchronous threads as an entropy source, and then reducing that entropy to, in this case, a range of 0 .. 23, though it could just as easily be full bytes, or something else.
When I run it with threaded Perl 5.14.2 on Ubuntu 13.04 (the only threaded Perl I have built at the moment), my distributions are not nearly so nice as yours. I keep seeing things like:
I have no idea why there's a difference. I fiddled with different numbers of threads, but that didn't produce appreciable improvements toward getting a smoother distribution.
I also wonder if you need to consider modulo bias. Well, actually looking at the code again, I don't wonder, I'm pretty sure modulo bias can be an issue if $M is set to a number that doesn't divide evenly into 2^32.
Initially I intended to test your code with A Perl adaptation of the FIPS-140-2 test, and A randomness dispersion test, but it doesn't take that sharp of an instrument to spot clusters when there are only three numbers drawn out of a possible 24 given thousands of picks.
Any thoughts why this works for you and not for me?