What BrowserUK is saying about “only 4 at a time” is anything but “an aside.” It’s the key to the whole thing.
Consider what you see going on every day in a fast-food joint. There's a fixed number of workers, and all of them are working on a queue of incoming food orders. If 1,000 orders suddenly come pouring in, the queue will get very long, but the kitchen won't get overcrowded. The number of workers in the kitchen, and each of their assigned tasks, is set to maximize throughput, which means that all the workers are working as fast as they can and that none of them are competing with one another for resources. The restaurant doesn't lose the ability to do its job ... it just takes (predictably!) longer. (And they can tell you, within a reasonably accurate time-window, just how long it will take.)
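In code, that "fixed number of workers pulling from one queue" is just a worker pool. Here's a minimal sketch using Perl's threads and Thread::Queue; the names ($WORKERS, process_order) are mine, and process_order is a stand-in for whatever your real jobs do:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 4;                        # the "kitchen staff": a hard cap
my $queue   = Thread::Queue->new;

# A fixed pool of workers, each pulling orders off the shared queue.
my @pool = map {
    threads->create(sub {
        while (defined(my $order = $queue->dequeue)) {
            process_order($order);      # stand-in for the real work
        }
    });
} 1 .. $WORKERS;

# 1,000 orders pour in: the queue gets long, the kitchen doesn't.
$queue->enqueue($_) for 1 .. 1_000;

# One undef per worker says "no more orders."
$queue->enqueue(undef) for 1 .. $WORKERS;
$_->join for @pool;

sub process_order {
    my ($order) = @_;
    # ... fill the order ...
}
```

The point isn't the module. Any bounded pool (forked processes, a semaphore) gives you the same property: the load piles up in the queue, which is cheap, instead of in the workers, which is not.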
The loss of throughput, furthermore, isn't linear: no matter what the ruling constraint actually is, past a certain point the loss is catastrophic. If you plot average completion-time on the y-axis against "number of simultaneous processes" on the x-axis, the resulting curve has an elbow shape: it gradually gets worse, then, !!wham!! it "hits the wall" and goes to hell and never comes back. If you plot "number of seconds required to complete 1,000 requests" as the y, the lesson becomes even clearer. You will finish the workload faster ("you will complete the work, period ...") by controlling the number of simultaneous workers, whether they be processes or threads.
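You can see that elbow for yourself with a crude timing loop. This is a sketch, not a benchmark: the sqrt busy-work stands in for real jobs, and run_pool is my name for the same pool pattern as above. On a real, memory- or I/O-bound workload, the times stop improving and then get dramatically worse as the worker count climbs:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw(time);

# Run a fixed number of jobs through a pool of a given size.
sub run_pool {
    my ($workers, $jobs) = @_;
    my $q = Thread::Queue->new(1 .. $jobs);   # pre-load all the jobs
    $q->enqueue(undef) for 1 .. $workers;     # termination sentinels
    my @pool = map {
        threads->create(sub {
            while (defined(my $job = $q->dequeue)) {
                my $x = 0;
                $x += sqrt($_) for 1 .. 50_000;   # stand-in busy-work
            }
        });
    } 1 .. $workers;
    $_->join for @pool;
}

# Same 1,000 jobs at each pool size; plot these and look for the elbow.
for my $workers (1, 2, 4, 8, 16, 32, 64) {
    my $t0 = time;
    run_pool($workers, 1_000);
    printf "%3d workers: %6.2f s\n", $workers, time - $t0;
}
```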
The number-one resource of contention is almost always virtual memory. "It's the paging that gets ya," and we have a special word for what happens: "thrashing." Once the combined working set of your workers exceeds real memory, every page one worker touches can evict a page another worker is about to need, and the machine spends its time shuttling pages instead of doing work. But any ruling constraint can cause congestive collapse, with similarly catastrophic results.