Re^4: Socket hang. (Windows or Perl? Solutions?) (Updated)
by BrowserUk (Pope) on Apr 06, 2011 at 23:07 UTC
I had seen SO_LINGER, but chose to try SO_DONTLINGER as the arguments are simpler. I know from previous dealings with setting non-blocking mode on Windows sockets that there is no documentation on exactly how to supply the final argument to setsockopt, and that it is tricky. As the parameter for SO_DONTLINGER is a simple boolean, I reasoned that would be simpler than working out how to pass a struct containing two packed shorts in the right order. But I hadn't seen the documentation for SO_LINGER that suggests a setting of 1,0 might be the fastest option.
Anyway, I worked out how to do it:
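A minimal sketch of passing the struct from Perl (not necessarily the exact code used here). It assumes the Winsock declaration of struct linger, which is two unsigned shorts with l_onoff first and l_linger second; most POSIX systems declare the same two fields as ints. The helper names pack_linger and set_abortive_close are made up for illustration.

```perl
use strict;
use warnings;
use Socket qw(SOL_SOCKET SO_LINGER);

# Pack a struct linger for setsockopt.
# Field order is l_onoff, then l_linger (seconds).
# Winsock uses two unsigned shorts; most POSIX systems use two ints.
sub pack_linger {
    my ($onoff, $seconds) = @_;
    my $template = $^O eq 'MSWin32' ? 'SS' : 'ii';
    return pack($template, $onoff, $seconds);
}

# l_onoff = 1 with l_linger = 0 asks for an abortive close:
# unsent data is discarded and the connection is reset on close.
sub set_abortive_close {
    my ($sock) = @_;
    setsockopt($sock, SOL_SOCKET, SO_LINGER, pack_linger(1, 0))
        or die "setsockopt(SO_LINGER): $!";
    return 1;
}
```

Call set_abortive_close($sock) on a connected socket before closing it. Swapping the two arguments to pack_linger gives the other ordering.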
I'm not sure which way around the 1 & 0 should be, so I tried it both ways, and the upshot is, it made no perceivable difference :(.
Then much reading led me to revisit Corion's earlier post. I looked again at MaxUserPort and realised I'd typed MaxUserPorts. Correcting that, and setting a value of 65535, I have now made over a million connections without stalling:
I then found this:
"Normally, TCP does not release closed connections until the value of this entry expires. However, TCP can release connections before this value expires if it is running out of TCP control blocks (TCBs). The number of TCBs the system creates is specified by the value of the MaxFreeTcbs entry."
Which basically suggests that the only thing that was causing the stall was the artificial limitation on concurrent user ports, as without it, they get reused if needed regardless of the linger time. The generous view of that is that MS are conserving memory used by system data structures for the benefit of workstation users who will rarely use programs that need 65k concurrent connections. The less generous view is that they deliberately hamstring TCP/IP on workstations to persuade people to buy the more expensive server variants of Windows. And the truth is probably a little of both.
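A back-of-envelope sketch of why the port cap bites. This is rough arithmetic, not a measurement. It assumes every closed connection holds its local port for the full timed-wait delay (240 seconds, the documented default), that user ports start at 1025, and that the older default MaxUserPort was 5000. All three are assumptions, and max_connect_rate is a made-up helper name.

```perl
use strict;
use warnings;

# Rough ceiling on new outbound connections per second when each closed
# connection parks its local port for $time_wait seconds before reuse.
sub max_connect_rate {
    my ($max_user_port, $time_wait, $first_port) = @_;
    $first_port = 1025 unless defined $first_port;   # assumed start of the user port range
    my $ports = $max_user_port - $first_port + 1;    # number of usable local ports
    return $ports / $time_wait;
}

# Compare the assumed old default with the raised setting.
printf "MaxUserPort=%5d: ~%.0f connections/sec sustained\n",
    $_, max_connect_rate($_, 240)
    for 5000, 65535;
```

Under those assumptions the low cap sustains only a couple of dozen new connections per second before the ports run out, while the raised cap allows a few hundred, even before any early reuse of timed-wait connections.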
The most striking thing to come out of this for me is how the settings and defaults for TCP/IP seem to be throwbacks to a bygone era. The default linger time is defined as: "at least equal to twice the maximum segment lifetime (2MSL) of the network. By default, the MSL is defined to be 120 seconds, and the value of this entry is equal to two MSLs, or 4 minutes."
4-minute maybes in a world that has 100Gb/s transport fabrics are just ridiculous. Indeed, time-outs measured in whole seconds are pretty damn stupid. With all the different delays, time-outs, back-off algorithms and shit involved in every individual action in TCP/IP, it's no wonder that the HPC guys are going directly to the Ethernet layer to get throughput.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.