Funny that not redirecting the output makes such a big difference on your machine. For me, your code takes ~0.06 seconds to run when redirecting to /dev/null, and ~0.11 seconds when not.
Be that as it may, thanks for the pointer to Algorithm::Combinatorics and the code snippet; this looks like a very useful module! And (redirecting to /dev/null, again) I'm getting running times of ~0.5s, ~2.9s, ~12.1s, ~40.6s for @data sizes of 4, ..., 7, which is very reasonable.
EDIT: Of course, what I was actually looking for was multisets, not ordered tuples (did you read my post?), but fortunately Algorithm::Combinatorics also offers a combinations_with_repetition function for that. Funny that I completely missed this module when looking at CPAN earlier today, too.
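
For anyone skimming the thread, here is a minimal sketch of the difference between the two, assuming Algorithm::Combinatorics is installed from CPAN. The three-element @data and $k = 2 are made-up values for illustration only, not the data from my original post.

```perl
use strict;
use warnings;
use Algorithm::Combinatorics
    qw(combinations_with_repetition variations_with_repetition);

# Hypothetical sample data; the real @data from the thread is not shown here.
my @data = qw(a b c);
my $k    = 2;

# Ordered tuples: (a,b) and (b,a) are counted separately.
my @tuples = variations_with_repetition(\@data, $k);

# Multisets: order is ignored, so only one of (a,b)/(b,a) appears.
my @multisets = combinations_with_repetition(\@data, $k);

printf "%d ordered tuples, %d multisets\n",
    scalar @tuples, scalar @multisets;

# Each result is an array reference holding $k elements of @data.
print join(' ', map { join('', @$_) } @multisets), "\n";
```

With these toy values that is 9 ordered tuples versus 6 multisets. Called in list context as above, both functions build the whole list at once; in scalar context they return an iterator instead, which is probably the better choice for larger @data.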