PerlMonks  

Re: 'A' web server takes another "time out"

by jonadab (Parson)
on May 03, 2006 at 20:26 UTC ( [id://547258]=note )


in reply to 'A' web server takes another "time out"

See lots of 'httpd' processes appear

I'm not sure what the usual ratio is between available system resources and system resources needed to keep up with Perlmonks requests, but _if_ the ratio were to dip below a certain critical point, then the number of new processes would grow faster than the old processes could finish. If that were the case, the total number of running processes could be expected to increase steadily, further dividing the system resources (notably RAM) available to each, in a vicious cycle, which would explain the extremeness of the symptoms you describe.

However, that leaves open the question of what happens to trigger the event in the first place. If the available system resources were just barely adequate for handling normal (or normal peak) traffic, then a slightly-more-than-normal traffic spike could trigger it, but it seems like if the system were that close to maxed out all the time you'd probably already know it. Are there things users can do that cause substantially more activity on the server than a normal request? Too many Super Search queries at once, perhaps, or something along those lines?
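
A toy model can make the tipping point concrete. The sketch below is an illustration built on assumptions, not a measurement of Apache or FreeBSD: `simulate` is a hypothetical function, and the rule that throughput falls off as `capacity * 100 / (100 + n)` is invented purely to mimic many processes competing for RAM.

```shell
#!/bin/sh
# Toy model only: the throughput formula is an assumption for illustration.
simulate() {
    arrivals=$1   # new httpd processes started per tick
    capacity=$2   # requests an otherwise idle box can finish per tick
    n=$3          # processes already running (e.g. left over from a spike)
    ticks=$4
    t=0
    while [ "$t" -lt "$ticks" ]; do
        n=$((n + arrivals))
        # Assumed: the more processes share RAM, the fewer finish per tick.
        finished=$((capacity * 100 / (100 + n)))
        if [ "$finished" -gt "$n" ]; then
            finished=$n
        fi
        n=$((n - finished))
        t=$((t + 1))
    done
    echo "$n"
}

simulate 8 10 0 50    # normal load, no backlog
simulate 8 10 50 50   # same load, but starting with a backlog of 50
```

With these made-up numbers, the first call drains to zero every tick, while the second starts above the tipping point and keeps growing even though the arrival rate is unchanged.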

but 'top' isn't particularly flexible but is still the best tool I've found available on this system so far

My immediate thought here is to look for process-related stuff on the CPAN, looking for something that doesn't just shell out to ps, preferably something Unix-oriented and written in pure Perl. I don't have much experience working with process tables, though, beyond what can be done with ps and top.

Update: My second thought is that I'm sure you're already aware that some versions of top can show considerably more columns than they show by default. The version I have here (on FreeBSD) is quite impoverished, but ISTR that the version of top that I used on Mandrake 9 had rather a lot of optional columns, and a loose marble rolling around in the back of my head suggests it _may_ (it's been several months...) have had an option for showing the parent process. I mention this only on the off chance that you haven't already checked for it. Hit ? in top to see a list.


Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.

Replies are listed 'Best First'.
Re^2: 'A' web server takes another "time out" (root)
by tye (Sage) on May 04, 2006 at 17:24 UTC

    I'd be a bit disappointed if a mature system like FreeBSD contained this feedback loop in resource allocation. No system is perfect, but when memory becomes scarce I'd come to expect better behavior than a feedback loop that keeps making the problem worse: each part continues to fight to do its thing, so nothing at all gets done, and it takes a long time before the system finally gives up and reboots (or is it that the system never gives up and pair.com notices the lock-up and eventually cycles power?). I recall much older systems noticing a problem and selecting processes to be completely "swapped out" (different from "paging", a more accurate term for what is often mislabeled "swapping") so that they stop fighting and other, luckier processes get a chance to finish. Then the resource exhaustion might pass, or at least the system can get something done so that someone can "get in" to clean up "by hand". Note that when this happens to the 'A' web server, there is no hope of logging in to the system.

    But perhaps this is just a case of bad tuning such that Apache fights too hard and it takes a while for FreeBSD to overcome it... Perhaps that is why many processes go to "0K" resident memory usage, though I'd expect a state much different than "RUN" to be reported for a swapped-out process. This led me to notice again the angle brackets such as on "<httpd>", and searching "man top" for what those mean, I find "COMMAND is the name of the command that the process is currently running (if the process is swapped out, this column is marked '<swapped>')", which isn't completely clear but somewhat supports that interpretation.

    Since I don't have root access, I don't think trying to roll my own replacement for 'top' or 'sar' will be possible. At least, my assumption was that I'd not have access to what 'top' and 'ps' use to get all of that information about other processes. Indeed, I don't have any access to /proc (it is a symlink to /root/proc and I have no access to even /root). But I see that neither 'top' nor 'ps' is set-UID or set-GID, so I'm not sure how the security is arranged. 'man ps' mentions needing procfs mounted (and references /proc and /dev/kmem). So would a self-built 'top' on an unprivileged FreeBSD account be useful? If not, I think just adding "ps" output to the existing "top" output would be one of the next steps.

    - tye        

      I don't have any access to /proc [...] 'man ps' mentions needing procfs mounted [...] I think just adding "ps" output to the existing "top" output would be one of the next steps.

      If top took 5.5 minutes to show output between two of the snapshots above, I think adding ps won't improve the situation, because the ps data won't be correlated at all with top's. My bet would be to play with ps's o argument, which allows you to get the information top shows and more. Setting PERSONALITY to "bsd" on this Linux machine allows me to run ps as if I were on FreeBSD. I hope...

      $ PERSONALITY=bsd ps faxo pid,euid,egid,ni:2,vsz:6,rss:6,pcpu,pmem,stat:3=ST,tname:6,stime,bsdtime,args
        PID  EUID  EGID NI    VSZ    RSS %CPU %MEM ST  TTY    STIME   TIME COMMAND
          1     0     0  0   1924    652  0.0  0.0 S   ?      19:24   0:00 init [2]
          2     0     0 19      0      0  0.0  0.0 SN  ?      19:24   0:00 [ksoftirqd/0]
          3     0     0 -5      0      0  0.0  0.0 S<  ?      19:24   0:00 [events/0]
      [...]
       1368   111   111  0  26580    912  0.0  0.0 Ssl ?      19:26   0:00 /usr/sbin/ippl -c /var/run/ippl/ippl.conf
       1423     0     0  0   4800   1608  0.0  0.1 Ss  ?      19:26   0:00 /usr/lib/postfix/master
       1428   101   104  0   4812   1604  0.0  0.1 S   ?      19:26   0:00  \_ pickup -l -t fifo -u -c

      You can s/args$/comm/ in order not to show parameters of commands:

       1368   111   111  0  26580    912  0.0  0.0 Ssl ?      19:26   0:00 ippl
       1423     0     0  0   4800   1608  0.0  0.1 Ss  ?      19:26   0:00 master
       1428   101   104  0   4812   1604  0.0  0.1 S   ?      19:26   0:00  \_ pickup

      HTH.

      --
      David Serrano

        Heh, but that doesn't show me the one thing I'm interested in, the parent PID. The 'top' and 'ps' output don't have to be in sync; I just need a snapshot of 'ps' output at some point during the "bad time" in order to see who owns the newest 'httpd' processes.

        FYI, your hoping wasn't enough (:

        ps: euid: keyword not found
        ps: egid: keyword not found
        ps: ni:2: keyword not found
        ps: vsz:6: keyword not found
        ps: rss:6: keyword not found
        ps: stat:3: keyword not found
        ps: tname:6: keyword not found
        ps: stime: keyword not found
        ps: bsdtime: keyword not found
        ps: args: keyword not found
          PID %CPU %MEM
            0  0.0  0.0
            1  0.0  0.0
            2  0.0  0.0
        ...
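
For the parent-PID snapshot described above, a minimal sketch along these lines might do. It assumes the `pid`, `ppid` and `comm` keywords, which both the FreeBSD and Linux ps manuals list; `httpd_by_parent` and `snapshot_httpd_parents` are hypothetical helper names, not existing tools.

```shell
#!/bin/sh
# Count httpd processes per parent PID.
# Reads "PID PPID COMMAND" lines (with one header row) on stdin and
# prints "<parent pid> <number of httpd children>", sorted by parent PID.
httpd_by_parent() {
    awk 'NR > 1 && $3 ~ /httpd/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}

# Take one snapshot of the live process table (BSD-style options, no dash).
snapshot_httpd_parents() {
    ps axo pid,ppid,comm | httpd_by_parent
}
```

Calling `snapshot_httpd_parents` at some point during the "bad time" would print one line per parent, which should be enough to see who owns the newest 'httpd' processes.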

        - tye        

      I'd be a bit disappointed if a mature system like FreeBSD contained this feedback loop in resource allocation

      Oh, is the perlmonks server running FreeBSD? I didn't realize. In that case, top doesn't appear to show parent process IDs, unless I'm missing something. There are things I like about FreeBSD, but its version of top is not one of them. The ps that comes with FreeBSD is rather better, but in a scenario where you can't start a new process, top could be already running, and I don't know of a way to make ps do that (i.e., be already running and report output periodically).
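
One partial workaround is a shell loop that is started ahead of time and logs ps output periodically. This is only a sketch: `repeat_snapshot` is a hypothetical helper, and each iteration still has to fork a new ps, so it may fail in exactly the situation where no new process can be started.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# appending its output to LOGFILE under a UTC timestamp header.
repeat_snapshot() {
    count=$1
    interval=$2
    logfile=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            echo "=== $(date -u '+%Y-%m-%d %H:%M:%S') UTC ==="
            "$@"
        } >> "$logfile" 2>&1
        i=$((i + 1))
        # Skip the pause after the final snapshot.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `repeat_snapshot 120 30 "$HOME/ps.log" ps axo pid,ppid,comm` would log a snapshot every 30 seconds for about an hour; whatever made it into the log before the lock-up is at least available afterwards.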

      I recall much older systems noticing a problem and selecting processes to be completely "swapped out"

      I've observed on my desktop that FreeBSD will kill a process if it consumes too much RAM (in situations where Linux wouldn't, although Linux since circa 2.2 will also do this if the entire system is low on RAM, which is better than the Linux 2.0 behavior; but FreeBSD will kill a process for this even when there's unused swap space, if it surpasses some per-process memory usage quota). However, one process using lots of RAM is a very different scenario from many processes being spawned. I don't know what FreeBSD does with that. I could test that here with a forkbomb, I suppose...

      Indeed, I don't have any access to /proc

      That could make it hard to get a good look at the process tree.

      So would a self-built 'top' on an unprivileged FreeBSD account be useful?

      I don't know. It also seems like there _ought_ to be a tool designed to prepare a process ahead of time (preload it into RAM, go ahead and ask the operating system for a process table entry, and so forth) to be launched quickly, which might allow you to set up ps to run and then, when the problem is noticed, trigger it to go ahead. I do not, however, actually know of such a utility.

      I feel your pain. Having to work around the lack of root access to accomplish things that would be much easier _with_ root access is certainly something that can be annoying. (I can also understand why the hosting company doesn't want to hand out root access, of course, but that doesn't make your situation any less frustrating.)


      Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
