In reply to: How did blocking I/O become such a problem?
The notion of “a request equals a thread” is what I refer to as the “flaming-arrow strategy.” (Take an arrow, light it, fire it into the air, and forget it.) Each thread is then supposed to fend for itself: lock what it needs, issue its requests for data, and wait for the responses. But soon there are problems, and they often creep into the designer’s field of vision much too late. Those problems are workflow dependencies: certain things must be done in a certain order. Bottlenecks develop as the various supposedly independent units of dispatchable work try to get things done, and where ordering matters, it suddenly becomes necessary to bolt on mutual-exclusion or counting-semaphore kludges. An “easy and intuitive” initial design does not scale up.
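A minimal sketch of that kludge, in Python for brevity (the handler names and the two-step dependency are invented for illustration): two “independent” request threads that turn out to have an ordering dependency, patched over with an event object.

```python
import threading

order = []
step_a_done = threading.Event()   # the ordering kludge bolted on after the fact

def handle_a():
    # Pretend this thread issued its blocking I/O and got its answer back.
    order.append("A")
    step_a_done.set()             # let anyone stuck behind A proceed

def handle_b():
    step_a_done.wait()            # B's whole thread parks here: a hidden bottleneck
    order.append("B")

threads = [threading.Thread(target=f) for f in (handle_b, handle_a)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# order is now ["A", "B"]: the "independent" threads were never independent.
```

Note that B’s thread contributes nothing while it waits; multiply that by thousands of requests and the thread pool is full of parked arrows.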
Borrowing an idea from a fast-food restaurant, it is much better to model the workload as a set of (perhaps independent) request objects, but to manage those using a tote-board of sorts. The worker-bees work on those units of work according to some heuristic, but they do not “wait for” anything ... ever. All I/O operations are asynchronous, and they are performed against requests that are sitting in some particular stage of the workflow. The number of requests known to the system, which is variable and perhaps extremely large, is completely distinct from the number of workers pursuing them. The entire life cycle of a request, and much of the outer request-handling heuristics, is most easily described using a finite-state machine (FSM) algorithm.
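The tote-board idea can be sketched as follows; this is a toy illustration, not any particular framework, and the state names, the two simulated actions, and the transition table are all assumptions. Requests carry their own state; a small, fixed number of workers each advance some request one step and put it back on the board, never blocking.

```python
from collections import deque

# States in a request's life cycle (invented names for illustration).
NEW, FETCHING, PROCESSING, DONE = "NEW", "FETCHING", "PROCESSING", "DONE"

def fetch(req):
    # Stand-in for an async fetch whose result has just arrived.
    req["data"] = f"payload-{req['id']}"

def process(req):
    req["result"] = req["data"].upper()

# FSM transition table: current state -> (next state, action to perform).
TRANSITIONS = {
    NEW:        (FETCHING,   fetch),
    FETCHING:   (PROCESSING, process),
    PROCESSING: (DONE,       lambda req: None),
}

def run(requests, workers=3):
    # The tote-board: every request the system knows about, at any stage.
    board = deque(requests)
    while board:
        # Each "worker" takes one step of one request, then moves on.
        for _ in range(min(workers, len(board))):
            req = board.popleft()
            next_state, action = TRANSITIONS[req["state"]]
            action(req)
            req["state"] = next_state
            if req["state"] != DONE:
                board.append(req)   # back on the board for a later step

reqs = [{"id": i, "state": NEW} for i in range(5)]
run(reqs)
```

The point of the shape: the board can hold five requests or five million, while `workers` stays small and fixed, and the ordering rules live in one declarative transition table instead of being scattered across semaphores.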
There are plenty of workflow-management scaffolds in Perl and elsewhere. With them, I have been able to turn many a recalcitrant and unstable application around, and to retrofit those systems to work efficiently in clustered environments.