PerlMonks  

Re^3: Multithreaded Server Crashes under heavy load

by BrowserUk (Pope)
on Aug 29, 2012 at 17:40 UTC


in reply to Re^2: Multithreaded Server Crashes under heavy load
in thread Multithreaded Server Crashes under heavy load

Is there a tool that you could recommend that can capture all of that information on a Windows environment?

Yes. Type perfmon.exe at a command prompt.

There is a bit of a learning curve involved in using it, but it is a GUI with fairly extensive built-in help.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong


Re^4: Multithreaded Server Crashes under heavy load
by rmahin (Beadle) on Aug 29, 2012 at 20:40 UTC
    Ok, so here's the link to the comma-delimited CSV file perfmon generated. Did not see how to get it to generate socket information for a specific process though. Also, the log doesn't show that perl ever reached 100% CPU usage, but the task manager shows that it spawns multiple perl processes and that CPU usage is maxed out for a while. Eventually all but one perl process disappear from the task manager, CPU usage drops back down, and then the program is unresponsive.

    https://dl.dropbox.com/u/19686501/RXD_000006.csv

    Made this log by adding a new counter log, and then adding various process counters. If this was incorrect let me know.

      Posted anonymously because my account is being f**** with (permission denied) -- BrowserUk

      Did not see how to get it to generate socket information for a specific process though.

      If you go into the Performance Monitor view, click the green + icon, scroll the top-left list and expand the TCPv4 topic, you can select various counters relating to TCP. The most interesting counters are "Connections Active/Passive/Established/Reset". The trace is most useful if there is not too much other TCP traffic on the box, as it isn't process specific.

      Alternatively, you could grab TCPView which is a little more intuitive for viewing the instantaneous state, but not so good for tracing over an extended period.

      For an instantaneous snapshot in a textual form, try netstat -a -b and redirect to a file.

      The log you posted only covers a 3 minute period, and there are already 301 threads running when it starts. None of them apparently finish, so the trace essentially shows a static state. Not so useful. I can only assume that by the time that trace is started, everything is hung.

      Without seeing threads being started and ending it is pretty useless. You need to establish whether threads are ever ending (perhaps your debug log shows that?), and whether, once they end, they are actually going away.

      I suspect that either your connections are not being closed properly and are taking a long time to time out, and/or your threads are ending but not going away. I cannot (yet) see any reason from the source code however, and the log doesn't really tell me anything.
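      One way to check whether threads end and then actually go away is to ask the threads module which threads it still tracks. What follows is only a minimal sketch, not the server's code: the helper names thread_census and reap_finished are hypothetical, and it assumes a threads-enabled perl and that the worker threads are neither detached nor joined elsewhere (detached threads never appear in threads->list).

```perl
use strict;
use warnings;
use threads;

# Count the threads this interpreter still tracks.
# Detached threads are never listed, so they are not counted here.
sub thread_census {
    my $running  = () = threads->list(threads::running);   # still executing
    my $joinable = () = threads->list(threads::joinable);  # finished, not yet joined
    return { running => $running, joinable => $joinable };
}

# Join every finished thread so that its resources are released.
# Returns the number of threads that were joined.
sub reap_finished {
    my $reaped = 0;
    for my $thr (threads->list(threads::joinable)) {
        $thr->join;
        $reaped++;
    }
    return $reaped;
}
```

      Calling thread_census() from the accept loop and writing the two numbers to the debug log would show whether the joinable count keeps growing. If it does, threads are finishing but never being joined or detached.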

      The elapsed time suggests that this snapshot was taken 38 hours into a run? You really need to start PerfMon immediately the server is started and monitor its cpu/threads/memory/sockets at (say) 1 minute intervals until it appears to hang. Running netstat to a file with a similar interval over the same run would give the best information.
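      The periodic netstat capture could be scripted along these lines. This is a rough sketch under stated assumptions: the helper names count_tcp_states and log_snapshot, the 60 second interval, and the log file argument are illustrative choices, and it uses netstat -a -n -p tcp rather than -a -b because -b typically needs an elevated prompt. It records a per-state connection count on each pass instead of the full listing.

```perl
use strict;
use warnings;
use IO::Handle;

# Count TCP connections per state in the text printed by `netstat -a -n -p tcp`.
sub count_tcp_states {
    my ($text) = @_;
    my %count;
    for my $line (split /\r?\n/, $text) {
        # Data lines look like: "  TCP  0.0.0.0:135  0.0.0.0:0  LISTENING"
        next unless $line =~ /^\s*TCP\s+\S+\s+\S+\s+([A-Z_0-9]+)/;
        $count{$1}++;
    }
    return \%count;
}

# Append one timestamped summary line, e.g. "... ESTABLISHED=4 TIME_WAIT=120".
sub log_snapshot {
    my ($fh, $text) = @_;
    my $count   = count_tcp_states($text);
    my $summary = join ' ', map { "$_=$count->{$_}" } sort keys %$count;
    print {$fh} scalar(localtime), " $summary\n";
}

# The polling loop runs only when a log file name is given on the command line.
if (@ARGV) {
    my $logfile  = $ARGV[0];
    my $interval = 60;    # seconds between snapshots (assumed value)
    open my $fh, '>>', $logfile or die "Cannot open $logfile: $!";
    $fh->autoflush(1);    # keep the log current if the box hangs
    while (1) {
        log_snapshot($fh, scalar `netstat -a -n -p tcp`);
        sleep $interval;
    }
}
```

      Started alongside the server, for example as perl netstat_log.pl states.log (a hypothetical script name), this gives a timeline that can be lined up against the PerfMon counters. A steadily rising TIME_WAIT or CLOSE_WAIT count would point at connections not being closed cleanly.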

        Good luck with your account!

        Ok, collected the information again. Sorry, the other 300 threads were from another perl process running on the machine that was just not visible in my remote desktop session. I should have figured they would show up in the log though, my mistake. Ran this one on another machine, with no other users accessing it and no other perl programs running. The 38 hours you saw was also for that other process. Nonetheless, the log I posted only covered a 3 minute period because that's just how long it takes to crash under the stress testing I'm doing. We had seen it happen occasionally under our regular use, and the time it took to happen seemed entirely random, sometimes a day or two, so to flush out the problem I created some tests to make it happen quicker. This log also only lasts a few minutes, and I sampled at 10 second intervals. Some commands it executes are as simple as making a directory and take almost no time at all to process; others take longer.

        https://dl.dropbox.com/u/19686501/RXD_000004.csv

        Also, I just can't really see any reason why a thread would not be exiting properly. When I run the program in debug mode, it seems to crash almost instantly, which would lead me to believe it could have something to do with printing to the screen, but I have tried redirecting all output to a file with no change. What is printed with the debug messages shows that all commands that the server actually receives and executes run and close successfully. Here's the output of all the commands before it crashed: https://dl.dropbox.com/u/19686501/debug.log. It shows the "DEBUG -- (30) Closing connection" printed out by

        debug("Closing connection");
        shutdown($client, 2);
        close $client;
        threads->exit;

        at the end of the thread subroutine. So, yeah, I'm really at a loss.

        Again, thank you for your help
