In wonder, how large must the arrays be for this to be faster in threads than in sequence? Say you split this into four - OS level - threads (because if it's just threads within the same process, you gain nothing at all), hoping it gets evaluated on four CPUs (or cores) in parallel. But that means three out of the four threads will be queued by the OS, and will have to wait their turn to get a slice of CPU time.
Now, if the process is only halfway its current time slice, and has to give up its timeslice because it needs to wait for the other three thread to join, now, that would be a shame.
I'm not saying that build in support for threading is a bad idea. But I do think it's a bad idea to do things in parallel without the programmers explicit permission. And it ain't going to be easy, as you need to cooperate with the OS, and Perl has to run on many OSses.