|Keep It Simple, Stupid|
Conceivably, EVEN a single addition of 100,000 variables on a 100,000 processor system could have a 100,000X speedup if thread creation overhead was 0.
Sorry, but that's just very naive.
A single addition of 2 numbers happens in 1 clock cycle. Spawning a new thread for each of 100,000 additions, even if you had 100,000 cores on your processor--which isn't going to happen any time in the next 10 years, if at all--will take far longer than just looping over all 100,000 on a single thread. And that's in C, never mind an interpreted language like Perl.
However, if we get back into the real world and consider processing those 100,000 additions on a 4, 16 or 64 core system, then by starting 1 thread per core and having each operate on 100,000/cores values, it's quite likely that you can achieve real (and realistic) economies.
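To make the 1-thread-per-core partitioning concrete, here is a minimal sketch in Python. The names `chunk_bounds` and `parallel_sum` are made up for illustration and come from no library. One loud caveat: in standard CPython the global interpreter lock stops threads from running CPU-bound work in parallel, so this only shows how the work is divided; for an actual speedup you'd want the same shape in C, or a process pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous, near-equal [start, end) slices."""
    base, extra = divmod(n, workers)
    bounds = []
    start = 0
    for i in range(workers):
        # The first `extra` chunks take one additional item each.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def parallel_sum(values, workers=None):
    """Sum `values` using one worker per core, each summing one slice."""
    workers = workers or os.cpu_count() or 1
    # Never start more workers than there are values to add.
    workers = max(1, min(workers, len(values) or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda b: sum(values[b[0]:b[1]]),
            chunk_bounds(len(values), workers),
        )
        # Combine the per-worker partial sums into the final total.
        return sum(partials)
```

With 100,000 values and 4 workers, that is 4 thread start-ups amortised over 25,000 additions each, rather than 100,000 start-ups for 1 addition apiece.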
Of course, the best approach today, and in the immediate future, would be to load the data onto a Tesla GPU and let its 448 cores loose on it in parallel.
Threads are a lot less useful on today's platforms due to their high overhead.
I'd love to know which platforms you're talking about where threading was cheaper.
The only ones I'm aware of that might qualify are things like Java 1.1's 'green' threads. Whilst they were cheap to spawn--as they ran completely in user space, so avoiding ring-level transitions--they were practically useless inasmuch as they only emulated true multi-tasking, by embedding a scheduler within the VM. As such they didn't scale across cores, so whenever any one green thread was running, all the others were stopped. So there was no performance gain for CPU-bound processing.
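As a rough analogy only (this is not how Java 1.1's VM was implemented), the sketch below emulates user-space scheduling with Python generators. The names `worker` and `run_round_robin` are hypothetical. Each task runs until it yields, and the scheduler then picks the next one, so exactly one task is ever executing and no second core can be put to use.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: do one unit of work, then yield to the scheduler."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back; nothing else runs until we do


def run_round_robin(tasks):
    """Run generator-based tasks one at a time, in turn, on a single OS thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; do not requeue it
        ready.append(task)      # still has work, so back of the queue


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)  # [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('B', 2)]
```

The interleaved log shows the tasks taking turns, not running simultaneously: the total work is still done serially, which is why this style of threading buys nothing for CPU-bound jobs.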
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.