http://www.perlmonks.org?node_id=547234

You may have noticed PerlMonks becoming non-responsive from time to time. This is usually (of late) due to our 'A' web server taking a "time out" to indulge in some recreational "extreme swapping" for quite a few minutes. This appears to have started happening early this year, after pair.com upgraded the OS, Apache, etc.

After several attempts, I finally captured some output from 'top' that shows much about the problem. Anyone care to offer interpretations / insights regarding this output from 'top' trying to dump the list of processes every 60 seconds when PerlMonks' 'A' web server "goes away"? Notice how one update there takes 5 1/2 minutes, not "a bit over 60 seconds". See the load average climb. See lots of 'httpd' processes appear, starving lots of older httpd processes for real RAM.

I'll look at this more as I find more time, but I'd be interested in well-considered theories about possible sources for such behavior.

I wish 'top' would show who the parent of each process was so I could tell which process is creating all of these extra processes. But 'top' isn't particularly flexible, though it is still the best tool I've found available on this system so far (no, I don't have root access and doubt seriously that pair.com would give it to me). Perhaps the next iteration of this logging should add periodic "ps" output to the logs to get that parent/child information, though I bet cron trying to start up "ps" would take so long when the problem is happening that it'd miss seeing the problem, based on past iterations. ;)
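A minimal sketch of what such a resident "ps" logger could look like (the log path and the 60-second interval are just placeholders, and it assumes FreeBSD's ps -l output, which includes the PPID column):

  #!/usr/bin/perl
  # psmon.pl - stay resident and append timestamped ps output every minute,
  # so cron doesn't have to launch a fresh job while the box is thrashing.
  use strict;
  use warnings;

  my $log = '/home/monkads/log/psmon.log';    # hypothetical path
  open my $fh, '>>', $log or die "open $log: $!";
  select((select($fh), $| = 1)[0]);           # unbuffer the log handle

  while (1) {
      # FreeBSD's long listing (-l) includes PPID, the parent/child
      # information that top doesn't show.
      my $ps = `ps -axlww 2>&1`;
      print $fh "==== ", scalar localtime, " ====\n", $ps, "\n";
      sleep 60;
  }

It still has to fork a "ps" once a minute, of course, so it isn't entirely immune to the very thrashing it is meant to observe.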

One difference between the 'A' and 'B' web servers is that the 'A' web server gets quite a lot of traffic from search engine spiders indexing PerlMonks via "http://someotherhostname/~monkads/?...". I disabled this for msnbot as it was doing twice as many hits as the next-busiest robot and was doing hits for a lot of bizarre URLs. I may soon disable it for all robots since the problem continues.

The 'top' output is in <spoiler> tags as just <readmore> tags would make it impossible to view the whole thread of discussion w/o the data "in the way". So "reveal" spoilers to see the output.

Update: Looking at the http access_log for around the time that the problem appears to start has not revealed any "smoking gun" evil URLs that somehow cause the receiving httpd to become a fork bomb, but that haystack is rather large and the data recorded isn't ideal for finding such things. A more Everything-aware log of accesses is on my to-do list...

- tye        

Re: 'A' web server takes another "time out"
by samtregar (Abbot) on May 03, 2006 at 18:58 UTC
    I don't have any guesses as to what's causing so many httpds, but perhaps you can fix it by changing your Apache configuration? It seems like a well-considered MaxClients could prevent this kind of explosion.

    -sam

      Thanks for the pointer.

      It looks like I'd need read access to /usr/pair/apache/... in order to check that but I don't have it. An older copy of httpd.conf that I requested (before the upgrade) had MaxClients set to 100. I'll have to ask for a new copy...

      - tye        

        If PerlMonks is running under mod_perl you should be able to use the Apache API to examine the current setting. You might even be able to dynamically change it!
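        Under mod_perl 2 the parsed configuration tree is exposed through Apache2::Directive, so a rough sketch would look something like the following (it's an assumption that the server runs mod_perl 2 / Apache 2; mod_perl 1 doesn't expose the config tree this way):

        # Sketch only: read the configured MaxClients from inside a handler.
        use Apache2::Directive ();

        my $tree       = Apache2::Directive::conftree();
        my $maxclients = $tree->lookup('MaxClients');
        warn 'MaxClients is ', (defined $maxclients ? $maxclients : 'not set'), "\n";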

        -sam

      In my experience, getting these values right is very important:
      • MaxClients -> 100 is OK (it can be more)
      • MinSpareServers -> minimum idle instances; I suggest the same as StartServers
      • MaxSpareServers -> maximum idle instances; I suggest more than 60% of MaxClients
      • MaxKeepAliveRequests -> never all instances; I suggest 50% of MaxClients
      • MaxRequestsPerChild -> maximum requests before the process is killed; set if required...

      Some time ago I made a node, Monitor instances of Apache Web server, with a script to see how the Apache connections are being used online. To see historical usage, I suggest using it, or Apache-Tools (from Apache-Security).

      Evaluating your load averages, swap, and CPU states, in my opinion optimizing Apache would give good results... Look at the running time of your httpd processes:

      $ grep httpd 547234 | awk '{print $8}' | sort | uniq -c | head -n 5
        125 0:00
        110 0:01
         41 0:02
         14 0:03
          9 0:04
      But the really valuable info is "Parent Server Generation: XX" on server-status... You really need to enable this module ;)

      Current Time: Friday, 05-May-2006 10:56:47 PDT
      Restart Time: Tuesday, 02-May-2006 10:24:02 PDT
      Parent Server Generation: 3
      Server uptime: 3 days 32 minutes 45 seconds
      Total accesses: 16557075 - Total Traffic: 349.2 GB
      CPU Usage: u170.547 s310.375 cu0 cs0 - .184% CPU load
      63.4 requests/sec - 1.4 MB/second - 22.1 kB/request
      175 requests currently being processed, 81 idle workers
      CKWWCKK_K_K_CKC_KK_K___KKK__K_KKKKKKKC_K_CC_KWK__WKKK_K_WKKK__WK
      GGG.GG.G.GGG...GGGGG..GGG.GGG.GG..GG.G..GG..G.....GWGGGGGGGGGGGG
      .G...W.GG.....G.GG.G..G.GG.........GGWG.G..G.G.....G...WG....G.G
      _K__KKCKCKWCK_WK__KKK_K_KW_KC__W___KKKK_KKKCK_KKWKKC_KCKKWKCKKWC
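      For watching that over time without a browser, here is a rough sketch that polls the machine-readable ?auto view of server-status from cron (the URL is a placeholder, and the field names differ between Apache 1.3, which reports BusyServers/IdleServers, and Apache 2.x, which reports BusyWorkers/IdleWorkers):

      #!/usr/bin/perl
      # status-poll.pl - log busy/idle httpd counts from mod_status's ?auto view.
      use strict;
      use warnings;
      use LWP::Simple qw(get);

      my $url  = 'http://localhost/server-status?auto';    # placeholder URL
      my $page = get($url) or die "no response from $url\n";

      my %stat;
      for my $line (split /\n/, $page) {
          $stat{$1} = $2 if $line =~ /^([^:]+):\s*(.+)/;
      }

      my $busy = defined $stat{BusyWorkers} ? $stat{BusyWorkers} : $stat{BusyServers};
      my $idle = defined $stat{IdleWorkers} ? $stat{IdleWorkers} : $stat{IdleServers};
      printf "%s busy=%s idle=%s\n", scalar localtime,
          (defined $busy ? $busy : '?'), (defined $idle ? $idle : '?');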

      --
      Marco Antonio
      Rio-PM

        MaxClients of 100 seems pretty high to me. Commodity hardware isn't going to deal with 100 simultaneous mod_perl jobs very well! Even if you have the memory to handle that many jobs, you probably don't have the CPU.

        MaxClients can be high on a front-end server which serves static content and does a reverse proxy to the mod_perl backend. Those servers do much less work per-request and a given machine can run more of them simultaneously.

        -sam

Re: 'A' web server takes another "time out"
by jonadab (Parson) on May 03, 2006 at 20:26 UTC
    See lots of 'httpd' processes appear

    I'm not sure what the usual ratio is between available system resources and system resources needed to keep up with Perlmonks requests, but _if_ the ratio were to dip below a certain critical point, then the number of new processes would grow faster than the old processes could finish. If that were the case, the total number of running processes could be expected to increase steadily, further dividing the system resources (notably RAM) available to each, in a vicious cycle, which would explain the extremeness of the symptoms you describe.
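    (Purely illustrative numbers: if each mod_perl child holds, say, 25 MB resident and the box has 1 GB of RAM to give Apache, roughly 40 children fit before paging starts; once paging stretches each request out, Apache spawns still more children to cover the backlog, the new children steal pages from the old ones, and every request gets slower yet.)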

    However, that leaves open the question of what happens to trigger the event in the first place. If the available system resources were just barely adequate for handling normal (or normal peak) traffic, then a slightly-more-than-normal traffic spike could trigger it, but it seems like if the system were that close to maxed out all the time you'd probably already know it. Are there things users can do that cause substantially more activity on the server than a normal request? Too many Super Search queries at once, perhaps, or something along those lines?

    but 'top' isn't particularly flexible but is still the best tool I've found available on this system so far

    My immediate thought here is to look for process-related stuff on the CPAN, looking for something that doesn't just shell out to ps, preferably something Unix-oriented and written in pure Perl. I don't have much experience working with process tables, though, beyond what can be done with ps and top. Update: My second thought is that I'm sure you're already aware that some versions of top can show considerably more columns than they show by default. The version I have here (on FreeBSD) is quite impoverished, but ISTR that the version of top that I used on Mandrake 9 had rather a lot of optional columns, and a loose marble rolling around in the back of my head suggests it _may_ (it's been several months...) have had an option for showing the parent process. I mention this only on the off chance that you haven't already checked for it. Hit ? in top to see a list.
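    Proc::ProcessTable from the CPAN might fit the bill: it is XS rather than pure Perl, but it reads the kernel's process table directly instead of shelling out to ps. A rough sketch (field names vary by OS backend; pid/ppid/fname is what the Linux backend provides, and whether the FreeBSD backend can see other users' processes without extra privileges is a separate question):

    # Sketch: dump pid, parent pid and name for every visible process.
    use strict;
    use warnings;
    use Proc::ProcessTable;

    my $t = Proc::ProcessTable->new;
    for my $p (sort { $a->pid <=> $b->pid } @{ $t->table }) {
        printf "%6d %6d %s\n", $p->pid, $p->ppid, $p->fname;
    }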


    Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.

      I'd be a bit disappointed if a mature system like FreeBSD contained this feedback loop in resource allocation. No system is perfect, but I'd come to expect better behavior when memory becomes scarce than a feedback loop that keeps making the problem worse: trying to let each part continue to fight to do its thing means nothing at all can get done, and it takes a very long time before the system finally gives up and reboots (or is it that the system never gives up and pair.com notices the lock-up and eventually cycles power?). I recall much older systems noticing the problem and selecting processes to be completely "swapped out" (different from "paging", a more accurate term for what is often mislabeled "swapping") so that they stop fighting and other, luckier processes get a chance to finish. That way the resource exhaustion might pass, or at least the system remains capable of getting something done so that someone can "get in" and clean up "by hand". Note that when this happens to the 'A' web server, there is no hope of logging in to the system.

      But perhaps this is just a case of bad tuning such that Apache fights too hard and it takes a while for FreeBSD to overcome it... Perhaps that is why many processes go to "0K" resident memory usage, though I'd expect a state much different than "RUN" to be reported for a swapped-out process. This led me to notice again the angle brackets such as on "<httpd>", and searching "man top" for what those mean I find "COMMAND is the name of the command that the process is currently running (if the process is swapped out, this column is marked '<swapped>')", which isn't completely clear but somewhat supports that interpretation.

      Since I don't have root access, I don't think trying to roll my own replacement for 'top' or 'sar' will be possible. At least, my assumption was that I'd not have access to what 'top' and 'ps' use to get all of that information about other processes. Indeed, I don't have any access to /proc (symlink to /root/proc and I have no access to even /root). But I see that neither 'top' nor 'ps' are set-UID nor set-GID so I'm not sure how the security is arranged. 'man ps' mentions needing procfs mounted (and referencing /proc and /dev/kmem). So would a self-built 'top' on an unprivileged FreeBSD account be useful? If not, I think just adding "ps" output to the existing "top" output would be one of the next steps.

      - tye        

        I don't have any access to /proc [...] 'man ps' mentions needing procfs mounted [...] I think just adding "ps" output to the existing "top" output would be one of the next steps.

        If top took 5.5 minutes to show output between two given snapshots above, I think adding ps won't improve the situation, because the ps data won't be correlated at all with top's. My bet would be to play with ps's -o argument, which allows you to get the information top gives and more. Setting PERSONALITY to "bsd" on this Linux machine lets me run ps as if I were on FreeBSD. I hope...

        $ PERSONALITY=bsd ps faxo pid,euid,egid,ni:2,vsz:6,rss:6,pcpu,pmem,stat:3=ST,tname:6,stime,bsdtime,args
          PID  EUID  EGID NI    VSZ    RSS %CPU %MEM ST  TTY    STIME   TIME COMMAND
            1     0     0  0   1924    652  0.0  0.0 S   ?      19:24   0:00 init [2]
            2     0     0 19      0      0  0.0  0.0 SN  ?      19:24   0:00 [ksoftirqd/0]
            3     0     0 -5      0      0  0.0  0.0 S<  ?      19:24   0:00 [events/0]
        [...]
         1368   111   111  0  26580    912  0.0  0.0 Ssl ?      19:26   0:00 /usr/sbin/ippl -c /var/run/ippl/ippl.conf
         1423     0     0  0   4800   1608  0.0  0.1 Ss  ?      19:26   0:00 /usr/lib/postfix/master
         1428   101   104  0   4812   1604  0.0  0.1 S   ?      19:26   0:00  \_ pickup -l -t fifo -u -c

        You can s/args$/comm/ in order not to show parameters of commands:

         1368   111   111  0  26580    912  0.0  0.0 Ssl ?      19:26   0:00 ippl
         1423     0     0  0   4800   1608  0.0  0.1 Ss  ?      19:26   0:00 master
         1428   101   104  0   4812   1604  0.0  0.1 S   ?      19:26   0:00  \_ pickup

        HTH.

        --
        David Serrano

        I'd be a bit disappointed if a mature system like FreeBSD contained this feedback loop in resource allocation

        Oh, is the perlmonks server running FreeBSD? I didn't realize. In that case, top doesn't appear to show parent process IDs, unless I'm missing something. There are things I like about FreeBSD, but its version of top is not one of them. The ps that comes with FreeBSD is rather better, but in a scenario where you can't start a new process, top could be already running, and I don't know of a way to make ps do that (i.e., be already running and report output periodically).

        I recall much older systems noticing a problem and selecting processes to be completely "swapped out"

        I've observed on my desktop that FreeBSD will kill a process if it consumes too much RAM (in situations where Linux wouldn't, although Linux since circa 2.2 will also do this if the entire system is low on RAM, which is better than the Linux 2.0 behavior; but FreeBSD will kill a process for this even when there's unused swap space, if it surpasses some per-process memory usage quota). However, one process using lots of RAM is a very different scenario from many processes being spawned. I don't know what FreeBSD does with that. I could test that here with a forkbomb, I suppose...

        Indeed, I don't have any access to /proc

        That could make it hard to get a good look at the process tree.

        So would a self-built 'top' on an unprivileged FreeBSD account be useful?

        I don't know. It also seems like there _ought_ to be a tool designed to prepare a process ahead of time (preload it into RAM, go ahead and ask the operating system for a process table entry, and so forth) to be launched quickly, which might allow you to set up ps to run and then, when the problem is noticed, trigger it to go ahead. I do not, however, actually know of such a utility.
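        As a rough approximation of that idea in Perl (a sketch only): a process started ahead of time that keeps the process-table code loaded via Proc::ProcessTable and dumps a snapshot whenever it's poked with SIGUSR1, so nothing new has to be exec'd at the moment of crisis. (Its own pages can still get paged out under pressure, of course, so it's not a complete answer.)

        #!/usr/bin/perl
        # dump-on-signal.pl - stay resident; dump the process table on SIGUSR1.
        use strict;
        use warnings;
        use Proc::ProcessTable;

        my $table = Proc::ProcessTable->new;    # load everything up front
        my $dump  = 0;
        $SIG{USR1} = sub { $dump = 1 };

        print "kill -USR1 $$ to trigger a dump\n";
        while (1) {
            sleep 5;
            next unless $dump;
            $dump = 0;
            print scalar localtime, "\n";
            # field names vary by OS backend; fname is the Linux name
            printf "%6d %6d %s\n", $_->pid, $_->ppid, $_->fname
                for @{ $table->table };
        }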

        I feel your pain. Having to work around the lack of root access to accomplish things that would be much easier _with_ root access is certainly something that can be annoying. (I can also understand why the hosting company doesn't want to hand out root access, of course, but that doesn't make your situation any less frustrating.)


        Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
Re: 'A' web server takes another "time out"
by m.att (Pilgrim) on May 03, 2006 at 23:55 UTC
    If you're capturing regular sar data (with the sa1/sa2 scripts), this could provide a lot of useful information beyond what top provides. (You can profile the performance of a system quite extensively with good sar output.) It would be helpful if you could make the sar data from the last week or so available for download (if available).

    The files are usually located in /var/adm/sa and should be readable from userland. I've found that FreeBSD and Linux boxes don't usually have sar enabled (unlike a lot of commercial *NIXes), but it's worth a shot. Just tar 'em up and put them somewhere for download.

    If the data is indeed available but you don't feel comfortable sharing it, there are some utilities available to analyse the data directly, such as:

    Sadly it requires a commercial license and I can't think of any cost-free alternatives. Readers please chime in if you know of any similar analysis utilities.

    Hoping to help,

    m.att

      Yes, 'sar' was what I first reached for, realizing that it is far better to compactly collect all of the performance data so that after the fact you can view slices of it this way and that to try to figure out what the matter is...

      $ sar
      ksh: sar: not found
      $ ls -l /var/adm
      ls: adm: Permission denied
      $ ls -ld /var/adm
      drwxr-x---  3 root  wheel  512 Jan 22  2001 /var/adm/
      $

      And I'm certainly not 'root' nor in 'wheel'. (:

      - tye        

        Well, that's a bust.. too bad.

        How about capturing some regular snapshots with vmstat? Maybe

        vmstat 60

        and a

        vmstat -d 60

        piped to a file for a few days (or at least a good bit of time before, during and after the event in question). (These commands may require different syntax if you're on FreeBSD, which I can't test with -- we're basically looking for VM stuff and IO/disk stuff... also see iostat) Maybe also throw in a few vmstat -s's for good measure. This would at least provide a little bit more detail around swap in/out and IO.
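        Since raw vmstat output carries no timestamps, a tiny wrapper that prefixes each line with one makes it much easier to line the numbers up with the Apache logs afterwards. A sketch (the flags may need adjusting for FreeBSD's vmstat):

        #!/usr/bin/perl
        # vmstamp.pl - run "vmstat 60" forever, timestamping each line.
        # Usage: perl vmstamp.pl >> vmstat.log &
        use strict;
        use warnings;

        $| = 1;                                  # don't buffer the output
        open my $vm, '-|', 'vmstat', '60' or die "can't run vmstat: $!";
        while (my $line = <$vm>) {
            print scalar localtime, ' ', $line;
        }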

        m.att

Re: 'A' web server takes another "time out"
by eric256 (Parson) on May 03, 2006 at 21:50 UTC

    Is it simply possible that someone has some sort of scheduled DOS attack? I know it's a stupid, obvious question, but it's the first thing that comes to mind, and maybe no one asked simply because it was so obvious. I do wonder why MaxClients isn't set low enough to stop this from happening, though.


    ___________
    Eric Hodges
Re: 'A' web server takes another "time out"
by spiritway (Vicar) on May 04, 2006 at 03:10 UTC

    You might have a look at RLimitNPROC, as well. According to the Apache documentation, "Limits the number of processes that can be launched by processes launched by Apache children." It would be nice if you could get your hands on the httpd.conf file...

Re: 'A' web server takes another "time out"
by Ultra (Hermit) on May 04, 2006 at 11:56 UTC

    I guess you (or Pair in case you don't have access to access_log) should do some statistics to see if there's a significant hits/second ratio difference when everything is OK and when the forking occurs.
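    A quick first pass at that could be as simple as bucketing the access_log hits by minute, assuming the usual Common Log Format timestamps such as [03/May/2006:14:05:59 -0700] (sketch only):

    #!/usr/bin/perl
    # hits-per-minute.pl - count access_log hits per minute.
    # Usage: perl hits-per-minute.pl access_log | less
    use strict;
    use warnings;

    my %hits;
    while (<>) {
        $hits{$1}++ if m{\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})};   # dd/Mon/yyyy:hh:mm
    }
    # a lexical sort is fine within a single day's log
    printf "%s  %6d\n", $_, $hits{$_} for sort keys %hits;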

    Another point to check is whether the kernel version you are using has bugs concerning swap allocation.

    Also, while this wouldn't help to determine the exact nature of the problem, maybe it can help to avoid DoS - mod_evasive

    Of course, Pair should agree to install/use it ;)

    Dodge This!
Re: 'A' web server takes another "time out"
by wazoox (Prior) on May 04, 2006 at 15:50 UTC
    The system load is very high but the CPU is 40% idle; I've often seen this in I/O-bound situations. Is it possible that the system disk (especially the swap, or database disk) is abnormally slow? Perhaps the DMA isn't working properly?

      One of my prior working theories was a disk "going bad" (having much experience with the fact that the manufacturers of commonly-used disk drives, drivers, and controllers only took away half of the point of "fault tolerance"1, resulting in drives "going bad" extremely silently, the only "evidence" being a particular pattern of slow-down).

      But that was when I didn't see good evidence of lots of swapping going on. Of course, "lots" is a relative term so, anyone, please feel free to make some calculations of disk speed based on the amount of swapping reported above and let us know if, in order to explain the CPU idleness, we'd need to have an unusually slow disk in the mix as well.

      There is no database disk on this system.

      - tye        

      1 The major point of the fault tolerance movement was to prevent things from suddenly failing. The point was that you could spend more and more resources making things more and more reliable, probably reducing how frequently something just "falls down", but you'd still end up having things suddenly fail, likely at a very inconvenient time, and have to spend a lot of down time and running around in a panic trying to replace / repair what failed. A "better way" was seen: don't have single points of failure, so that when something fails, things can continue on and you can schedule to replace the failed part at a convenient time, perhaps without even requiring down time. And the key to this working is that someone must be notified that a failure happened! Unfortunately, so many common modern systems include features that are tolerant of faults but provide no means of notification and often even prevent you from ever being able to tell, no matter how hard you look, that a fault happened. Hard disks are a great example of this, in my experience.

      It used to be that a hard disk going bad would start recording faults in your syslog and the frequency of these reports would rise, very slowly at first but following a geometric curve, and you'd replace the disk before it catastrophically died. Now most disks start to fail by slowing down from time-to-time, more and more dramatically, eventually nearly locking up while the disk retries reading the sector that is going bad but eventually fails, then the driver/controller retries which causes the disk to do a whole nother round of retries, then the operating system multiplies the number of retries yet again with its own retries... and eventually we just get lucky and the CRC "passes" and no hard evidence that anything at all went wrong remains.

      I'd point you to a google search for the "S.M.A.R.T." acronym but google no longer treats searching for "s m a r t" differently from searching for "s-m-a-r-t" and so you'd just get a huge list of pages containing the word "smart". That system lets you query some internal counters kept nearly hidden inside the disk drive that likely includes a count of at least some types of retries. It is the only way I've been able to find any real evidence (usually still quite vague) that a disk is starting to fail. But note that most S.M.A.R.T. tools try to be "smart" and just figure out for you whether or not the disk is about to suddenly fail (making nearly the identical mistake mentioned above) and thus usually don't tell you a single thing until the disk is within minutes of failing (usually while you aren't using the computer, and often only after the failure has already become catastrophic). So you have to jump through hoops to look at the raw S.M.A.R.T. data and make guesses at what some of them mean... Which has a lot to do with why you've probably not heard of S.M.A.R.T. before (or only heard bad things about it).

      And then there is the other extreme: parity checking of memory. When your memory is working just fine 99.999% of the time but a single bit error is noticed and reported to you by virtue of the fact that your entire computer system has suddenly become a frozen brick displaying the notification on the console. Being blissfully unaware of the rare single-bit error starts to look good when compared to having all of the in-progress work, most (probably all) of which would be unaffected by that one bit, being sent to evaporate for the sake of providing notification of a fault...

      Yes, I understand that the plumbing of notifications is hard and that is why this plumbing of notifications is so often not done or is done so badly.

Re: 'A' web server takes another "time out"
by ambrus (Abbot) on May 20, 2006 at 17:48 UTC

    I believe the two webservers are 209.197.123.153 and 66.39.54.27, right? But which one of them is called 'A'? Or is this some other distinction?

    Update 2007 feb 27: tye said in the chatterbox that "the IPs are also in order for 'a' vs 'b'" so 66.39.54.27 is the A webserver and 209.197.123.153 is B.

Re: 'A' web server takes another "time out"
by starbolin (Hermit) on May 24, 2006 at 16:45 UTC

    Could you please post the output from:  uname -a

    There is a lot here that is not stock FreeBSD behavior. Looking at the code for top I cannot see where it would insert brackets around the process name. Top simply reports what it finds in the process record; so I can only assume Apache puts the brackets there.

    In a similar vein, man top asserts that swapped processes are marked as <swapped>, but I don't know how it can assert this, as this is OS-dependent behavior.

    The FreeBSD virtual memory manager usually operates very transparently. I wrote a Perl program to hog all the memory, then ran several instances. I saw none of the behavior exhibited in your listing. The swapping was transparent. The individual process lines showed normal running with full memory allocation; only the swap statistics showed the swap being used. When I ran out of swap, the process was killed. I am going to play with this more. It seems that Apache does its own VM. I can't confirm this.


    s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
      Could you please post the output from: uname -a
      FreeBSD $FQDN 4.8-STABLE FreeBSD 4.8-STABLE #0: Fri Apr 15 13:34:52 EDT 2005     $USER@$HOST:/usr/src/sys/compile/PAIRqsv  i386

      with 3 items replaced by Perl scalars for privacy reasons.

      Looking at the code for top I cannot see where it would insert brackets around the process name. Top simply reports what it finds in the process record; so I can only assume Apache puts the brackets there.

      Thanks for diving into the code. But I think your assumption above is less likely than my stated guess, and I think you even provide more evidence:

      In a similar vein, man top asserts that swapped processes are marked as <swapped>, but I don't know how it can assert this, as this is OS-dependent behavior.

      So it could certainly be the case that the OS, instead of replacing the program name with the literal string "<swapped>", puts angle brackets around the program name. This makes even more sense, as a literal "<swapped>" would leave you wondering what the heck got swapped out on you.

      The FreeBSD virtual memory manager usually operates very transparently. I wrote a Perl program to hog all the memory, then ran several instances. I saw none of the behavior exhibited in your listing. The swapping was transparent. The individual process lines showed normal running with full memory allocation; only the swap statistics showed the swap being used. When I ran out of swap, the process was killed.

      You were only hogging swap space. That causes much different problems than hogging real memory. Note that "the process was killed" has a body buried in it, as one can't blame a specific process for exhausting the swap space and so, on a good operating system, heuristics are involved (on a bad operating system, the process unlucky enough to be the first to try to grab more space after none is available gets killed -- early Ultrix comes to mind).

      In order to hog real memory, you have to keep using the pages of memory that you've allocated. See (tye)Re: one-liner hogs; the one labeled "Memory" just tests allocating lots of virtual memory, that is, tests using a lot of swap space. The one labeled "Swap" will cause a lot of swapping (more accurately, "paging" though swapping out would likely eventually happen if you ran enough of them) because it tries to use lots of real memory. (So, yes, the labels are backward, depending on how you look at it.)
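      (These aren't the one-liners from that node -- follow the link for those -- just an illustrative pair showing the distinction. The first grabs a big chunk of virtual memory, touches it once, and goes idle, so its pages soon go cold and can be paged out fairly harmlessly; the second keeps re-reading one byte per page across the whole chunk, so its working set stays hot and it competes for real RAM:)

      perl -e '$x = "x" x 100_000_000; sleep 3600'

      perl -e '$x = "x" x 100_000_000;
               while (1) { $n += ord substr $x, $_, 1 for map { $_ * 4096 } 0 .. 24_000 }'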

      It seems that Apache does it's own vm. I can't confirm this.

      I won't say that I know for sure that Apache does not, but I'd bet real money on it.

      - tye        

Re: 'A' web server takes another "time out"
by rootcho (Pilgrim) on Jun 14, 2007 at 23:59 UTC
    You could try to use "atop" if you can.
    It can also redirect the monitoring to a file, AFAIK.
    http://www.atconsultancy.nl/atop/