“A thousand threads or processes on one box” is not a test case that is comparable to what such a server will actually face in production. There is only one machine, only one network card, and so on. Since that hardware interface is by definition a bottleneck, you will need many machines generating load, and they must do so from a network that can actually handle it.
The most important “testing” is going to consist of careful code review of the server architecture, because that is really where the bottlenecks you are looking for are going to be.
And since those bottlenecks live in the software, you can simulate the load very effectively. For example, make aggressive use of the "loopback" (software) device. One process on the machine (perhaps with affinity to a second CPU) pummels the server process with multiple random data streams, simulating 1,000 or more insane clients. The software device can present a much more aggressive load pattern than hardware can.
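A rough sketch of that loopback idea in Python follows. Everything in it is an assumption made for illustration: the echo handler merely stands in for the real server, the names `EchoHandler`, `LoopbackServer`, `client` and `run_load` are made up, and the clients are threads inside one process rather than a separate process pinned to another CPU.

```python
import os
import random
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Hypothetical stand-in for the server under test: echoes what it receives."""

    def handle(self):
        while True:
            data = self.request.recv(4096)
            if not data:
                break
            self.request.sendall(data)


class LoopbackServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True
    allow_reuse_address = True
    request_queue_size = 128  # larger accept backlog for bursts of connections


def client(address, messages, results, index):
    """One simulated client: send random payloads, count the bytes echoed back."""
    received = 0
    with socket.create_connection(address) as sock:
        for _ in range(messages):
            payload = os.urandom(random.randint(1, 1024))
            sock.sendall(payload)
            remaining = len(payload)
            while remaining:
                chunk = sock.recv(remaining)
                if not chunk:
                    raise ConnectionError("server closed the connection early")
                remaining -= len(chunk)
                received += len(chunk)
    results[index] = received


def run_load(num_clients=50, messages=20):
    """Start the stand-in server on 127.0.0.1 and hit it with concurrent clients."""
    server = LoopbackServer(("127.0.0.1", 0), EchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = [0] * num_clients
    threads = [
        threading.Thread(
            target=client, args=(server.server_address, messages, results, i)
        )
        for i in range(num_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    server.server_close()
    return results  # bytes echoed back to each client


if __name__ == "__main__":
    totals = run_load()
    print(f"{len(totals)} clients, {sum(totals)} bytes echoed over loopback")
```

To get closer to the setup described above, the clients would run in their own process against the real server's port. On Linux that process could be pinned with `os.sched_setaffinity`.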
Also look carefully for any network-testing software or hardware that might be available off the shelf...