PerlMonks  

'A' web server takes another "time out"

by tye (Sage)
on May 03, 2006 at 18:17 UTC ( #547234=monkdiscuss )

You may have noticed PerlMonks becoming non-responsive from time to time. Of late, this is usually due to our 'A' web server taking a "time out" to indulge in some recreational "extreme swapping" for quite a few minutes. This appears to have started early this year, after pair.com upgraded the OS, Apache, etc.

After several attempts, I finally captured some output from 'top' that shows much about the problem. Anyone care to offer interpretations / insights regarding this output? It comes from 'top' trying to dump the list of processes every 60 seconds while PerlMonks' 'A' web server "goes away". Notice how one update there takes 5 1/2 minutes rather than "a bit over 60 seconds". See the load average climb. See lots of 'httpd' processes appear, starving lots of older httpd processes of real RAM.
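For what it's worth, the gaps between updates can be read straight off the clock times in the four 'top' header lines. Here is a minimal Python sketch of that arithmetic. The function names are mine, and the header strings are copied from the output in this post with the "+" line-wrap markers removed.

```python
import re
from datetime import datetime, timedelta

# Header lines from the four 'top' snapshots in this post,
# with the "+" line-wrap markers removed.
HEADERS = [
    "last pid: 96025; load averages: 4.81, 2.80, 2.16 up 6+23:19:50 10:08:06",
    "last pid: 96043; load averages: 13.13, 5.56, 3.24 up 6+23:21:32 10:09:48",
    "last pid: 96100; load averages: 12.36, 7.48, 4.29 up 6+23:26:59 10:15:15",
    "last pid: 96223; load averages: 17.47, 13.70, 8.74 up 6+23:28:32 10:16:48",
]

# Captures the three load averages and the wall-clock time at the end.
HEADER_RE = re.compile(
    r"load averages:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)\s+up\s+\S+\s+(\d\d:\d\d:\d\d)"
)


def parse_header(line):
    """Return (1-minute load average, wall-clock time) for one header line."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not a top header line: %r" % line)
    load1 = float(match.group(1))
    clock = datetime.strptime(match.group(4), "%H:%M:%S")
    return load1, clock


def snapshot_gaps(headers):
    """Return (seconds since previous snapshot, 1-minute load) per later snapshot."""
    parsed = [parse_header(h) for h in headers]
    gaps = []
    for (_, prev_clock), (load1, clock) in zip(parsed, parsed[1:]):
        delta = clock - prev_clock
        if delta < timedelta(0):
            # The clock wrapped past midnight between two snapshots.
            delta += timedelta(days=1)
        gaps.append((int(delta.total_seconds()), load1))
    return gaps


if __name__ == "__main__":
    for seconds, load1 in snapshot_gaps(HEADERS):
        print("gap: %3d s   1-min load: %.2f" % (seconds, load1))
```

With a requested interval of 60 seconds, the gaps come out as 102, 327 and 93 seconds, so even the "fast" updates are running well behind.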

I'll look at this more as I find more time, but I'd be interested in well-considered theories about possible sources for such behavior.

I wish 'top' would show the parent of each process so I could tell which process is creating all of these extra processes. 'top' isn't particularly flexible, but it is still the best tool I've found available on this system so far (no, I don't have root access, and I seriously doubt that pair.com would give it to me). Perhaps the next iteration of this logging should add periodic "ps" output to the logs to get that parent/child information. Based on past iterations, though, I bet cron trying to start up "ps" would take so long when the problem is happening that it'd miss seeing the problem. ;)
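If periodic "ps" output does end up in the logs, tallying children per parent is simple. Below is a rough Python sketch, assuming the log holds the output of something like `ps -axo pid,ppid,comm` (that exact invocation is my assumption about what is usable on this host). The function name and the sample rows are hypothetical; in particular, the PPID values are made up for illustration and are not from the real server.

```python
from collections import Counter


def children_per_parent(ps_text, command="httpd"):
    """Count processes named `command` under each parent PID.

    Expects a header row followed by PID, PPID and COMMAND columns.
    """
    counts = Counter()
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # blank or truncated line
        _pid, ppid, comm = fields
        if comm.strip() == command:
            counts[int(ppid)] += 1
    return counts


# Hypothetical sample: the parent/child relationships here are invented.
SAMPLE_PS = """\
  PID  PPID COMMAND
  153     1 httpd
84338   153 httpd
96019   153 httpd
96020 84338 httpd
95759 95752 top
"""

if __name__ == "__main__":
    for ppid, n in children_per_parent(SAMPLE_PS).most_common():
        print("parent %d has %d httpd children" % (ppid, n))
```

A parent other than the main root-owned httpd showing up with many children would be the interesting case.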

One difference between the 'A' and 'B' web servers is that the 'A' web server gets quite a lot of traffic from search engine spiders indexing PerlMonks via "http://someotherhostname/~monkads/?...". I disabled this for msnbot as it was doing twice as many hits as the next-busiest robot and was doing hits for a lot of bizarre URLs. I may soon disable it for all robots since the problem continues.
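Comparing robots by hit count amounts to a tally of the User-Agent field. Here is a small Python sketch, assuming the access log is in Apache's "combined" format, where the User-Agent is the last quoted field (I have not confirmed that this is the format in use here). The function name, the sample log lines and the "examplebot" agent are hypothetical.

```python
import re
from collections import Counter

# In the "combined" log format the User-Agent is the final quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def hits_per_agent(log_lines):
    """Count requests per User-Agent string."""
    counts = Counter()
    for line in log_lines:
        match = UA_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines, invented for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [03/May/2006:10:08:06 +0000] "GET /?node_id=1 HTTP/1.0" 200 512 "-" "msnbot/1.0"',
    '192.0.2.2 - - [03/May/2006:10:08:07 +0000] "GET /?node_id=2 HTTP/1.0" 200 512 "-" "examplebot/0.1"',
    '192.0.2.1 - - [03/May/2006:10:08:09 +0000] "GET /?node_id=3 HTTP/1.0" 200 512 "-" "msnbot/1.0"',
]

if __name__ == "__main__":
    for agent, hits in hits_per_agent(SAMPLE_LOG).most_common():
        print("%6d  %s" % (hits, agent))
```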

The 'top' output is in <spoiler> tags, since plain <readmore> tags would make it impossible to view the whole thread of discussion w/o the data "in the way". So "reveal" the spoilers to see the output.

last pid: 96025;  load averages:  4.81,  2.80,  2.16    up 6+23:19:50  10:08:06
107 processes: 1 running, 106 sleeping
CPU states: 29.6% user, 0.0% nice, 6.6% system, 2.3% interrupt, 61.5% idle
Mem: 806M Active, 75M Inact, 91M Wired, 27M Cache, 112M Buf, 4284K Free
Swap: 4096M Total, 317M Used, 3779M Free, 7% Inuse, 3088K In, 94M Out

  PID USERNAME PRI NICE   SIZE    RES STATE   TIME   WCPU    CPU COMMAND
84338 nobody     2  -10 49620K 32660K sbwait  6:13  0.00%  0.00% httpd
84049 nobody     2  -10 49396K 31396K sbwait  6:49  0.00%  0.00% httpd
84046 nobody     2  -10 48112K 31092K sbwait  6:55  0.00%  0.00% httpd
84044 nobody     2  -10 46144K 30724K sbwait  5:57  0.00%  0.00% httpd
84032 nobody     2  -10 45952K 31824K sbwait  6:01  0.00%  0.00% httpd
84041 nobody     2  -10 45220K 32836K sbwait  6:02  0.20%  0.20% httpd
84042 nobody     2  -10 44948K 30600K sbwait  6:39  0.00%  0.00% httpd
84036 nobody     2  -10 44776K 37852K sbwait  5:58  0.00%  0.00% httpd
84034 nobody     2  -10 44104K 30128K sbwait  6:26  0.00%  0.00% httpd
84051 nobody     2  -10 43768K 38004K sbwait  5:50  0.00%  0.00% httpd
84031 nobody     2  -10 43732K 29484K sbwait  5:24  0.00%  0.00% httpd
84048 nobody     2  -10 43732K 29248K sbwait  6:14  0.00%  0.00% httpd
84030 nobody     2  -10 43672K 29372K sbwait  5:16  0.00%  0.00% httpd
84202 nobody     2  -10 43516K 29040K sbwait  6:15  0.00%  0.00% httpd
84045 nobody     2  -10 43508K 16640K sbwait  5:33  0.00%  0.00% httpd
84047 nobody     2  -10 43444K 29488K sbwait  5:50  0.00%  0.00% httpd
84043 nobody     2  -10 43216K 21444K sbwait  6:27  0.00%  0.00% httpd
84037 nobody     2  -10 42288K 30280K sbwait  6:19  0.00%  0.00% httpd
84033 nobody     2  -10 41368K 29708K sbwait  6:24  0.00%  0.00% httpd
84339 nobody     2  -10 41156K 28720K sbwait  5:41  0.00%  0.00% httpd
84035 nobody     2  -10 40252K 28340K sbwait  5:40  0.00%  0.00% httpd
84039 nobody     2  -10 40232K 27472K sbwait  6:12  0.00%  0.00% httpd
84040 nobody     2  -10 40000K 27264K sbwait  6:08  0.00%  0.00% httpd
84038 nobody     2  -10 39632K 28464K sbwait  5:53  0.00%  0.00% httpd
95995 nobody     2  -10 32848K 31240K sbwait  0:01  0.82%  0.78% httpd
230 root 2 10 21324K 0K 
select 0:01 0.00% 0.00% <perl +5> 95899 nobody 2 -10 12952K 10860K sbwait 0:03 0.00% 0.00% httpd 95825 nobody 2 -10 12840K 9216K sbwait 0:04 0.00% 0.00% httpd 95820 nobody 2 -10 12832K 9472K sbwait 0:03 0.00% 0.00% httpd 95898 nobody 2 -10 12436K 8884K accept 0:03 0.00% 0.00% httpd 95848 nobody 2 -10 12228K 8996K sbwait 0:02 0.00% 0.00% httpd 95902 nobody 2 -10 12124K 8608K sbwait 0:02 0.00% 0.00% httpd 95900 nobody 2 -10 11708K 9780K sbwait 0:01 0.00% 0.00% httpd 95960 nobody 2 -10 10956K 9712K sbwait 0:01 0.00% 0.00% httpd 95962 nobody 2 -10 10828K 9420K sbwait 0:01 0.00% 0.00% httpd 95961 nobody 2 -10 10804K 9392K sbwait 0:01 0.00% 0.00% httpd 95963 nobody 2 -10 10720K 9448K sbwait 0:01 0.00% 0.00% httpd 95958 nobody 2 -10 10596K 9396K sbwait 0:01 0.00% 0.00% httpd 95975 nobody 2 -10 9848K 8164K sbwait 0:00 0.00% 0.00% httpd 95973 nobody 2 -10 9736K 8320K sbwait 0:00 0.00% 0.00% httpd 95996 nobody 2 -10 9512K 8324K sbwait 0:00 0.00% 0.00% httpd 95976 nobody 2 -10 9484K 7524K accept 0:00 0.00% 0.00% httpd 95974 nobody 2 -10 9416K 8016K sbwait 0:00 0.00% 0.00% httpd 95972 nobody 2 -10 9344K 7944K sbwait 0:01 0.00% 0.00% httpd 95993 nobody 2 -10 9336K 7548K sbwait 0:00 0.00% 0.00% httpd 95977 nobody 2 -10 8940K 7220K sbwait 0:00 0.00% 0.00% httpd 95957 nobody 2 -10 8868K 7592K sbwait 0:00 0.00% 0.00% httpd 95959 nobody 2 -10 8676K 7420K accept 0:00 0.00% 0.00% httpd 95980 nobody 2 -10 8576K 6728K sbwait 0:00 0.00% 0.00% httpd 95998 nobody 2 -10 8404K 7180K sbwait 0:00 0.00% 0.00% httpd 95971 nobody 2 -10 8380K 6992K sbwait 0:00 0.00% 0.00% httpd 95997 nobody 2 -10 8132K 6968K sbwait 0:00 0.00% 0.00% httpd 95994 nobody 2 -10 8056K 6696K sbwait 0:00 0.00% 0.00% httpd 183 root 10 10 5532K 648K nanslp 0:23 0.00% 0.00% perl 261 root 2 -15 5476K 1000K sbwait 3:31 0.00% 0.00% perl 96019 nobody 2 -10 4300K 2664K accept 0:00 0.00% 0.00% httpd 96022 nobody 2 -10 4300K 2664K accept 0:00 0.00% 0.00% httpd 96020 nobody 2 -10 4300K 2664K accept 0:00 0.00% 0.00% httpd 96021 
nobody 2 -10 4300K 2664K accept 0:00 0.00% 0.00% httpd 96023 nobody 2 -10 4300K 2664K accept 0:00 0.00% 0.00% httpd 96025 nobody 2 -10 4300K 2664K accept 0:00 0.00% 0.00% httpd 96024 nobody 2 -10 4300K 2664K accept 0:00 0.00% 0.00% httpd 153 root 2 -10 4300K 2100K select 0:20 0.00% 0.00% httpd 211 root 10 -20 3412K 1096K nanslp 3:17 0.00% 0.00% perl 366 root 2 4 3112K 0K poll 0:00 0.00% 0.00% <stun +nel-pops> 274 root 10 4 3044K 600K nanslp 0:07 0.00% 0.00% perl 270 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 268 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 273 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 234 root 2 0 2332K 0K select 0:06 0.00% 0.00% <sshd +> 191 root 10 4 2280K 1196K nanslp 0:13 0.00% 0.00% ncftp +d 77300 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 90242 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 48258 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 90215 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 75810 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 239 root 10 0 2088K 0K wait 0:00 0.00% 0.00% <perl +> 95759 monkads 31 0 2028K 380K RUN 0:02 16.00% 0.78% top 262 root 2 0 1356K 0K select 0:01 0.00% 0.00% <xine +td> 201 root -6 1 1264K 216K piperd 0:00 0.00% 0.00% ncftp +d 95 root 2 0 1056K 344K poll 0:06 0.00% 0.00% syslo +g-ng 197 root 2 6 1016K 0K accept 0:00 0.00% 0.00% <popa +3d> 95749 root -6 0 1008K 388K piperd 0:00 0.00% 0.00% cron 102 root 10 0 992K 236K nanslp 0:03 0.00% 0.00% cron 207 qmails 2 0 952K 0K select 0:04 0.00% 0.00% <qmai +l-send> 319 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 321 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 315 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 316 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 320 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 322 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 318 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 317 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 217 qmaill -6 
0 896K 160K piperd 0:00 0.00% 0.00% splog +ger 218 root 2 0 896K 0K select 0:00 0.00% 0.00% <qmai +l-lspawn> 219 qmailr 2 0 896K 0K select 0:00 0.00% 0.00% <qmai +l-rspawn> 220 qmailq -6 0 884K 128K piperd 0:00 0.00% 0.00% qmail +-clean 189 root 10 0 864K 0K wait 0:00 0.00% 0.00% <resp +awn> 95752 monkads 10 0 632K 0K wait 0:00 0.00% 0.00% <sh> 1 root 10 0 544K 0K wait 0:00 0.00% 0.00% init 23 root 18 0 212K 0K pause 0:00 0.00% 0.00% <adjk +erntz> 2 root -18 0 0K 0K psleep 16:17 1.81% 1.81% paged +aemon 5 root 18 0 0K 0K syncer 3:01 0.00% 0.00% synce +r 3 root 18 0 0K 0K psleep 1:00 0.00% 0.00% vmdae +mon 4 root -18 0 0K 0K psleep 0:02 0.00% 0.00% bufda +emon 6 root -2 0 0K 0K vlruwt 0:02 0.00% 0.00% vnlru 0 root -18 0 0K 0K sched 0:00 0.00% 0.00% swapp +er last pid: 96043; load averages: 13.13, 5.56, 3.24 up 6+23:21:32 + 10:09:48 124 processes: 34 running, 90 sleeping CPU states: 40.3% user, 0.0% nice, 15.0% system, 3.1% interrupt, 41. +6% idle Mem: 830M Active, 47M Inact, 92M Wired, 33M Cache, 112M Buf, 1664K Fre +e Swap: 4096M Total, 465M Used, 3630M Free, 11% Inuse, 28M In, 170M Out PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMA +ND 84338 nobody 2 -10 49620K 29096K accept 6:14 0.42% 0.39% httpd 84049 nobody -22 -10 49396K 26724K swread 6:49 0.00% 0.00% httpd 84046 nobody -22 -10 48112K 29184K swread 6:56 0.00% 0.00% httpd 84044 nobody 2 -10 46144K 264K accept 5:58 0.00% 0.00% httpd 84032 nobody 2 -10 45952K 0K RUN 6:02 0.00% 0.00% <http +d> 84041 nobody 2 -10 45220K 31024K sbwait 6:05 0.78% 0.78% httpd 84042 nobody 2 -10 44948K 0K RUN 6:39 0.00% 0.00% <http +d> 84036 nobody 2 -10 44776K 0K RUN 5:59 0.00% 0.00% <http +d> 84034 nobody -22 -10 44104K 26892K swread 6:27 0.00% 0.00% httpd 84051 nobody 2 -10 43768K 27180K sbwait 5:51 0.05% 0.05% httpd 84048 nobody 2 -10 43732K 260K accept 6:14 0.00% 0.00% httpd 84031 nobody 2 -10 43732K 0K RUN 5:24 0.00% 0.00% <http +d> 84030 nobody 2 -10 43672K 0K RUN 5:16 0.00% 0.00% <http +d> 84202 nobody 2 -10 43516K 
26348K sbwait 6:16 0.10% 0.10% httpd 84045 nobody -22 -10 43508K 15972K swread 5:33 0.00% 0.00% httpd 84047 nobody 2 -10 43444K 0K RUN 5:50 0.00% 0.00% <http +d> 84043 nobody 2 -10 43216K 0K RUN 6:28 0.00% 0.00% <http +d> 84037 nobody -22 -10 42288K 27340K swread 6:20 0.00% 0.00% httpd 84033 nobody 2 -10 41368K 0K RUN 6:25 0.00% 0.00% <http +d> 84339 nobody 2 -10 41156K 0K RUN 5:41 0.00% 0.00% <http +d> 84035 nobody 2 -10 40252K 0K RUN 5:40 0.00% 0.00% <http +d> 84039 nobody 2 -10 40232K 0K RUN 6:13 5.60% 0.78% <http +d> 84040 nobody 2 -10 40000K 0K RUN 6:08 0.00% 0.00% <http +d> 84038 nobody 2 -10 39640K 0K RUN 5:54 0.00% 0.00% <http +d> 95977 nobody -22 -10 34100K 10328K swread 0:02 0.00% 0.00% httpd 95995 nobody 2 -10 33700K 9128K sbwait 0:02 0.15% 0.15% httpd 96025 nobody -22 -10 30876K 18576K swread 0:01 1.27% 1.17% httpd 230 root 2 10 21324K 0K select 0:01 0.00% 0.00% <perl +5> 95962 nobody -22 -10 17952K 12460K swread 0:02 0.00% 0.00% httpd 95899 nobody -22 -10 13928K 2936K swread 0:03 0.00% 0.00% httpd 95825 nobody 2 -10 12984K 7552K sbwait 0:04 0.00% 0.00% httpd 95820 nobody 2 -10 12852K 0K RUN 0:03 0.00% 0.00% <http +d> 95900 nobody 2 -10 12784K 0K RUN 0:01 0.00% 0.00% <http +d> 95848 nobody 2 -10 12548K 0K RUN 0:03 0.00% 0.00% <http +d> 95898 nobody 2 -10 12436K 0K RUN 0:03 2.45% 0.34% <http +d> 95960 nobody 2 -10 12376K 0K RUN 0:02 1.05% 0.15% <http +d> 95902 nobody 2 -10 12124K 0K RUN 0:02 0.00% 0.00% <http +d> 95958 nobody -22 -10 11804K 6296K swread 0:01 0.15% 0.15% httpd 95963 nobody -22 -10 11672K 6144K swread 0:02 0.00% 0.00% httpd 95959 nobody 2 -10 11588K 7720K accept 0:01 0.00% 0.00% httpd 95961 nobody 2 -10 11192K 6116K sbwait 0:01 0.00% 0.00% httpd 95998 nobody 2 -10 11144K 0K RUN 0:01 2.80% 0.39% <http +d> 95997 nobody 2 -10 11140K 0K RUN 0:01 0.00% 0.00% <http +d> 95957 nobody -22 -10 11116K 5336K swread 0:01 0.00% 0.00% httpd 95972 nobody 2 -10 11072K 5936K sbwait 0:01 0.00% 0.00% httpd 95994 nobody 2 -10 11068K 3016K sbwait 0:01 0.00% 
0.00% httpd 96029 nobody 2 -10 11044K 0K RUN 0:01 0.00% 0.00% <http +d> 95996 nobody -22 -10 11020K 5648K swread 0:01 0.00% 0.00% httpd 96020 nobody 2 -10 10972K 0K RUN 0:01 0.00% 0.00% <http +d> 95973 nobody -22 -10 10956K 4208K swread 0:01 0.00% 0.00% httpd 95980 nobody 2 -10 10948K 0K RUN 0:01 0.00% 0.00% <http +d> 96021 nobody -22 -10 10932K 7396K swread 0:01 0.00% 0.00% httpd 95975 nobody 2 -10 10908K 6176K accept 0:01 0.00% 0.00% httpd 96026 nobody 2 -10 10868K 3820K sbwait 0:01 0.00% 0.00% httpd 96023 nobody 2 -10 10864K 7420K accept 0:01 0.00% 0.00% httpd 96022 nobody 2 -10 10852K 0K RUN 0:01 1.75% 0.24% <http +d> 95971 nobody 2 -10 10844K 0K RUN 0:01 0.00% 0.00% <http +d> 96030 nobody 2 -10 10840K 0K RUN 0:01 3.15% 0.44% <http +d> 96028 nobody 2 -10 10768K 0K RUN 0:01 0.00% 0.00% <http +d> 95993 nobody 2 -10 10744K 0K RUN 0:01 0.00% 0.00% <http +d> 96027 nobody 2 -10 10708K 0K RUN 0:01 0.00% 0.00% <http +d> 96024 nobody 2 -10 10668K 0K RUN 0:01 0.00% 0.00% <http +d> 96019 nobody -22 -10 9884K 6196K swread 0:01 0.00% 0.00% httpd 95976 nobody 2 -10 9724K 0K RUN 0:01 0.00% 0.00% <http +d> 95974 nobody -22 -10 9424K 3360K swread 0:00 0.00% 0.00% httpd 96045 nobody -5 -10 7528K 6180K sysctl 0:00 0.00% 0.00% httpd 96044 nobody -5 -10 7528K 6176K sysctl 0:00 0.00% 0.00% httpd 96036 nobody -5 -10 7524K 6188K sysctl 0:00 0.00% 0.00% httpd 96043 nobody -5 -10 7524K 5796K sysctl 0:00 0.00% 0.00% httpd 96047 nobody -5 -10 7508K 6160K sysctl 0:00 0.00% 0.00% httpd 96037 nobody -5 -10 7508K 6140K sysctl 0:00 0.00% 0.00% httpd 96038 nobody -5 -10 7508K 6140K sysctl 0:00 0.00% 0.00% httpd 96046 nobody -5 -10 7504K 6164K sysctl 0:00 0.00% 0.00% httpd 96048 nobody -5 -10 7504K 5716K sysctl 0:00 0.00% 0.00% httpd 96049 nobody -5 -10 7496K 6148K sysctl 0:00 0.00% 0.00% httpd 183 root 10 10 5532K 0K RUN 0:23 0.00% 0.00% <perl +> 261 root 2 -15 5476K 992K sbwait 3:31 0.00% 0.00% perl 96050 nobody -5 -10 4300K 2656K accept 0:00 0.00% 0.00% httpd 153 root 2 -10 4300K 2016K select 
0:20 0.00% 0.00% httpd 211 root -6 -20 3412K 808K piperd 3:17 0.00% 0.00% perl 366 root 2 4 3112K 0K poll 0:00 0.00% 0.00% <stun +nel-pops> 274 root -6 4 3044K 588K piperd 0:07 0.00% 0.00% perl 270 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 268 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 273 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 234 root 2 0 2332K 0K select 0:06 0.00% 0.00% <sshd +> 191 root 10 4 2280K 1080K nanslp 0:13 0.00% 0.00% ncftp +d 77300 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 90242 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 48258 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 90215 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 75810 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 239 root 10 0 2088K 0K wait 0:00 0.00% 0.00% <perl +> 95759 monkads 28 0 2048K 308K RUN 0:03 0.00% 0.00% top 262 root 2 0 1356K 0K select 0:01 0.00% 0.00% <xine +td> 201 root -6 1 1264K 168K piperd 0:00 0.00% 0.00% ncftp +d 95 root 2 0 1056K 0K poll 0:06 0.00% 0.00% <sysl +og-ng> 197 root 2 6 1016K 0K accept 0:00 0.00% 0.00% <popa +3d> 95749 root -6 0 1008K 236K piperd 0:00 0.00% 0.00% cron 96039 root -5 0 992K 356K sysctl 0:00 0.00% 0.00% cron 102 root 10 0 992K 0K nanslp 0:03 0.00% 0.00% <cron +> 207 qmails 2 0 952K 0K select 0:04 0.00% 0.00% <qmai +l-send> 319 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 321 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 315 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 316 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 320 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 322 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 318 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 317 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 217 qmaill -6 0 896K 116K piperd 0:00 0.00% 0.00% splog +ger 218 root 2 0 896K 0K select 0:00 0.00% 0.00% <qmai +l-lspawn> 219 qmailr 2 0 896K 0K select 0:00 0.00% 0.00% <qmai +l-rspawn> 220 qmailq -6 0 884K 88K piperd 0:00 0.00% 
0.00% qmail +-clean 189 root 10 0 864K 0K wait 0:00 0.00% 0.00% <resp +awn> 95752 monkads 10 0 632K 0K wait 0:00 0.00% 0.00% <sh> 1 root 10 0 544K 0K wait 0:00 0.00% 0.00% init 23 root 18 0 212K 0K pause 0:00 0.00% 0.00% <adjk +erntz> 2 root -18 0 0K 0K wswbuf 16:24 3.12% 3.12% paged +aemon 3 root 18 0 0K 0K psleep 1:02 1.61% 1.61% vmdae +mon 5 root 18 0 0K 0K syncer 3:01 0.00% 0.00% synce +r 4 root -18 0 0K 0K psleep 0:02 0.00% 0.00% bufda +emon 6 root -2 0 0K 0K vlruwt 0:02 0.00% 0.00% vnlru 0 root -22 0 0K 0K swread 0:00 0.00% 0.00% swapp +er last pid: 96100; load averages: 12.36, 7.48, 4.29 up 6+23:26:59 + 10:15:15 159 processes: 1 running, 158 sleeping CPU states: 28.4% user, 0.0% nice, 7.7% system, 3.0% interrupt, 60. +9% idle Mem: 880M Active, 17M Inact, 93M Wired, 12M Cache, 112M Buf, 1664K Fre +e Swap: 4096M Total, 683M Used, 3413M Free, 16% Inuse, 66M In, 241M Out PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMA +ND 84338 nobody -22 -10 49620K 21480K swread 6:14 0.00% 0.00% httpd 84049 nobody -22 -10 49396K 21332K swread 6:49 0.00% 0.00% httpd 84046 nobody -22 -10 48112K 22696K swread 6:56 0.00% 0.00% httpd 84044 nobody -22 -10 46144K 21812K swread 5:58 0.00% 0.00% httpd 84032 nobody -22 -10 45952K 24084K swread 6:02 0.00% 0.00% httpd 84041 nobody -22 -10 45220K 26016K swread 6:05 0.00% 0.00% httpd 84042 nobody -22 -10 44948K 21868K swread 6:40 0.00% 0.00% httpd 84036 nobody -22 -10 44776K 23204K swread 6:00 0.00% 0.00% httpd 84034 nobody -22 -10 44104K 24428K swread 6:27 0.00% 0.00% httpd 84051 nobody -22 -10 43768K 22316K swread 5:51 0.00% 0.00% httpd 84048 nobody -22 -10 43732K 16604K swread 6:15 0.00% 0.00% httpd 84031 nobody -22 -10 43732K 12844K swread 5:24 0.00% 0.00% httpd 84030 nobody 2 -10 43672K 22964K sbwait 5:17 0.00% 0.00% httpd 84202 nobody -22 -10 43516K 21788K swread 6:16 0.00% 0.00% httpd 84045 nobody -22 -10 43508K 13804K swread 5:33 0.00% 0.00% httpd 84047 nobody -22 -10 43444K 21072K swread 5:51 0.00% 0.00% httpd 84043 nobody 
-22 -10 43216K 21724K swread 6:28 0.00% 0.00% httpd 84037 nobody -22 -10 42288K 24248K swread 6:20 0.00% 0.00% httpd 84033 nobody -22 -10 41368K 21056K swread 6:25 0.00% 0.00% httpd 84339 nobody -22 -10 41156K 16192K swread 5:41 0.00% 0.00% httpd 84035 nobody -22 -10 40252K 22736K swread 5:40 0.00% 0.00% httpd 84039 nobody -22 -10 40232K 16004K swread 6:13 0.00% 0.00% httpd 84040 nobody -22 -10 40000K 16204K swread 6:08 0.00% 0.00% httpd 84038 nobody -22 -10 39640K 24008K swread 5:55 0.00% 0.00% httpd 96070 nobody -22 -10 37316K 32432K swread 0:06 0.59% 0.59% httpd 96069 nobody -22 -10 36860K 32116K swread 0:04 0.54% 0.54% httpd 95977 nobody -22 -10 34100K 6892K swread 0:02 0.00% 0.00% httpd 95995 nobody -22 -10 33740K 7176K swread 0:02 0.00% 0.00% httpd 96025 nobody -22 -10 32972K 3128K swread 0:02 0.00% 0.00% httpd 230 root 2 10 21324K 0K select 0:01 0.00% 0.00% <perl +5> 96082 nobody 2 -10 19172K 276K accept 0:05 0.00% 0.00% httpd 95962 nobody -22 -10 18144K 3616K swread 0:02 0.00% 0.00% httpd 95998 nobody -22 -10 17328K 5088K swread 0:01 0.00% 0.00% httpd 95994 nobody -22 -10 17284K 5212K swread 0:01 0.00% 0.00% httpd 96026 nobody -22 -10 17152K 6100K swread 0:01 0.00% 0.00% httpd 96050 nobody -22 -10 16908K 6072K swread 0:01 0.00% 0.00% httpd 96054 nobody -22 -10 14024K 7732K swread 0:03 0.00% 0.00% httpd 95899 nobody -22 -10 13884K 6792K swread 0:04 0.00% 0.00% httpd 96049 nobody -22 -10 13388K 8284K swread 0:02 0.34% 0.34% httpd 95825 nobody -22 -10 13152K 7640K swread 0:05 0.00% 0.00% httpd 95820 nobody -22 -10 12876K 7028K swread 0:03 0.00% 0.00% httpd 95960 nobody -22 -10 12840K 6724K swread 0:02 0.00% 0.00% httpd 95900 nobody -22 -10 12804K 5572K swread 0:02 0.00% 0.00% httpd 95898 nobody -22 -10 12604K 6360K swread 0:04 0.00% 0.00% httpd 95848 nobody 2 -10 12548K 6344K sbwait 0:03 0.00% 0.00% httpd 95971 nobody -22 -10 12516K 5708K swread 0:01 0.00% 0.00% httpd 96029 nobody -22 -10 12444K 5088K swread 0:01 0.00% 0.00% httpd 96066 nobody -22 -10 12392K 
6732K swread 0:01 0.00% 0.00% httpd 96080 nobody -22 -10 12284K 916K swread 0:02 0.00% 0.00% httpd 96044 nobody 2 -10 12260K 7152K sbwait 0:02 0.00% 0.00% httpd 96067 nobody -22 -10 12188K 7488K swread 0:02 0.00% 0.00% httpd 95972 nobody 2 -10 12132K 0K accept 0:02 0.00% 0.00% httpd 95902 nobody -22 -10 12124K 5216K swread 0:02 0.00% 0.00% httpd 95958 nobody -22 -10 12100K 6196K swread 0:01 0.00% 0.00% httpd 95957 nobody 2 -10 12028K 5672K sbwait 0:02 0.34% 0.34% httpd 96036 nobody -22 -10 11900K 8356K swread 0:01 0.00% 0.00% httpd 96020 nobody 2 -10 11804K 6160K sbwait 0:01 0.00% 0.00% httpd 96074 nobody -22 -10 11772K 1488K swread 0:01 0.00% 0.00% httpd 95997 nobody -22 -10 11720K 6036K swread 0:01 0.00% 0.00% httpd 95980 nobody -22 -10 11672K 5968K swread 0:02 0.00% 0.00% httpd 95963 nobody -22 -10 11672K 5056K swread 0:02 0.00% 0.00% httpd 96022 nobody -22 -10 11652K 6044K swread 0:01 0.00% 0.00% httpd 95996 nobody -22 -10 11648K 6500K swread 0:02 0.00% 0.00% httpd 95959 nobody -22 -10 11588K 5408K swread 0:01 0.00% 0.00% httpd 96055 nobody -22 -10 11564K 6880K swread 0:01 0.00% 0.00% httpd 95975 nobody -22 -10 11404K 6316K swread 0:01 0.00% 0.00% httpd 95993 nobody -22 -10 11348K 6736K swread 0:01 0.00% 0.00% httpd 96048 nobody -22 -10 11320K 6436K swread 0:01 0.00% 0.00% httpd 95961 nobody -22 -10 11320K 4200K swread 0:01 0.00% 0.00% httpd 96051 nobody -22 -10 11244K 6740K swread 0:01 0.00% 0.00% httpd 96043 nobody -22 -10 11228K 5988K swread 0:01 0.00% 0.00% httpd 96046 nobody -22 -10 11148K 6128K swread 0:01 0.00% 0.00% httpd 96021 nobody -22 -10 11128K 5560K swread 0:01 0.00% 0.00% httpd 96038 nobody -22 -10 11100K 6244K swread 0:01 0.00% 0.00% httpd 96045 nobody -22 -10 11052K 5876K swread 0:01 0.00% 0.00% httpd 96024 nobody -22 -10 11024K 4348K swread 0:01 0.00% 0.00% httpd 96028 nobody -22 -10 11000K 5292K swread 0:01 0.00% 0.00% httpd 95973 nobody -22 -10 10972K 5256K swread 0:01 0.00% 0.00% httpd 96030 nobody -22 -10 10924K 5460K swread 0:01 0.00% 
0.00% httpd 96027 nobody -22 -10 10868K 4608K swread 0:01 0.00% 0.00% httpd 96023 nobody -22 -10 10864K 5412K swread 0:01 0.00% 0.00% httpd 96019 nobody -22 -10 10648K 5228K swread 0:01 0.00% 0.00% httpd 95976 nobody -22 -10 10020K 4664K swread 0:01 0.00% 0.00% httpd 96052 nobody -22 -10 9720K 5980K swread 0:00 0.00% 0.00% httpd 96037 nobody -22 -10 9432K 4760K swread 0:00 0.00% 0.00% httpd 95974 nobody -22 -10 9424K 2924K swread 0:00 0.00% 0.00% httpd 96077 nobody -22 -10 9096K 5168K swread 0:00 0.00% 0.00% httpd 96071 nobody -22 -10 8708K 4636K swread 0:00 0.00% 0.00% httpd 96047 nobody -22 -10 8668K 776K swread 0:00 0.00% 0.00% httpd 96078 nobody -22 -10 8568K 5004K swread 0:00 0.00% 0.00% httpd 96053 nobody -22 -10 8396K 2956K swread 0:00 0.00% 0.00% httpd 96076 nobody -22 -10 8356K 3336K swread 0:00 0.00% 0.00% httpd 96065 nobody -22 -10 8048K 3004K swread 0:00 0.00% 0.00% httpd 96079 nobody -5 -10 7528K 3752K sysctl 0:00 0.00% 0.00% httpd 96096 nobody -5 -10 7528K 3624K sysctl 0:00 0.00% 0.00% httpd 96095 nobody -5 -10 7528K 3592K sysctl 0:00 0.00% 0.00% httpd 96087 nobody -5 -10 7524K 3608K sysctl 0:00 0.00% 0.00% httpd 96083 nobody -5 -10 7508K 3620K sysctl 0:00 0.00% 0.00% httpd 96094 nobody -5 -10 7508K 3584K sysctl 0:00 0.00% 0.00% httpd 96081 nobody -5 -10 7496K 3512K sysctl 0:00 0.00% 0.00% httpd 183 root 10 10 5532K 636K nanslp 0:23 0.00% 0.00% perl 261 root 2 -15 5476K 1108K sbwait 3:31 0.00% 0.00% perl 96101 root -5 -10 4300K 2072K pfault 0:00 0.00% 0.00% httpd 153 root 2 -10 4300K 1960K select 0:20 0.00% 0.00% httpd 211 root -6 -20 3412K 700K piperd 3:17 0.00% 0.00% perl 366 root 2 4 3112K 0K poll 0:00 0.00% 0.00% <stun +nel-pops> 274 root -6 4 3044K 412K piperd 0:07 0.00% 0.00% perl 270 root -6 4 3044K 324K piperd 0:01 0.00% 0.00% perl 268 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 273 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl +> 234 root 2 0 2332K 0K select 0:06 0.00% 0.00% <sshd +> 191 root 10 4 2280K 1072K nanslp 0:13 0.00% 
0.00% ncftp +d 77300 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 90242 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 48258 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 90215 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 75810 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncft +pd> 95759 monkads 28 0 2092K 300K RUN 0:03 0.00% 0.00% top 239 root 10 0 2088K 0K wait 0:00 0.00% 0.00% <perl +> 262 root 2 0 1356K 0K select 0:01 0.00% 0.00% <xine +td> 201 root -6 1 1264K 168K piperd 0:00 0.00% 0.00% ncftp +d 95 root -22 0 1056K 116K swread 0:06 0.00% 0.00% syslo +g-ng 197 root 2 6 1016K 0K accept 0:00 0.00% 0.00% <popa +3d> 96039 root -6 0 1008K 336K piperd 0:00 0.00% 0.00% cron 95749 root -6 0 1008K 184K piperd 0:00 0.00% 0.00% cron 96072 root -6 0 1008K 80K piperd 0:00 0.00% 0.00% cron 102 root 10 0 992K 0K nanslp 0:03 0.00% 0.00% <cron +> 207 qmails 2 0 952K 0K select 0:04 0.00% 0.00% <qmai +l-send> 321 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 319 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 315 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 316 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 320 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 322 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 318 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 317 root 3 0 952K 0K ttyin 0:00 0.00% 0.00% <gett +y> 217 qmaill -6 0 896K 124K piperd 0:00 0.00% 0.00% splog +ger 218 root 2 0 896K 0K select 0:00 0.00% 0.00% <qmai +l-lspawn> 219 qmailr 2 0 896K 0K select 0:00 0.00% 0.00% <qmai +l-rspawn> 220 qmailq -6 0 884K 88K piperd 0:00 0.00% 0.00% qmail +-clean 189 root 10 0 864K 0K wait 0:00 0.00% 0.00% <resp +awn> 96097 root -5 0 724K 16K sysctl 0:00 0.00% 0.00% hps 96084 root 10 0 632K 0K wait 0:00 0.00% 0.00% <sh> 96063 root 10 0 632K 0K wait 0:00 0.00% 0.00% <sh> 95752 monkads 10 0 632K 0K wait 0:00 0.00% 0.00% <sh> 96061 root 10 0 628K 0K wait 0:00 0.00% 0.00% <sh> 96073 root 10 0 628K 0K wait 0:00 0.00% 0.00% <sh> 96098 root 10 
4 628K 0K wait 0:00 0.00% 0.00% <sh> 96075 root 10 4 628K 0K wait 0:00 0.00% 0.00% <sh> 1 root 10 0 544K 0K wait 0:00 0.00% 0.00% init 96089 root -5 4 380K 8K sysctl 0:00 0.00% 0.00% ps 96090 root -5 4 228K 40K sysctl 0:00 0.00% 0.00% tail 23 root 18 0 212K 0K pause 0:00 0.00% 0.00% <adjk +erntz> 2 root -18 0 0K 0K wswbuf 16:33 1.86% 1.86% paged +aemon 5 root 18 0 0K 0K syncer 3:01 0.00% 0.00% synce +r 3 root 18 0 0K 0K psleep 1:03 0.00% 0.00% vmdae +mon 4 root -18 0 0K 0K psleep 0:02 0.00% 0.00% bufda +emon 6 root -2 0 0K 0K vlruwt 0:02 0.00% 0.00% vnlru 0 root -18 0 0K 0K sched 0:00 0.00% 0.00% swapp +er last pid: 96223; load averages: 17.47, 13.70, 8.74 up 6+23:28:32 + 10:16:48 255 processes: 36 running, 217 sleeping, 2 zombie CPU states: 14.7% user, 0.0% nice, 4.0% system, 2.4% interrupt, 78. +9% idle Mem: 872M Active, 27M Inact, 96M Wired, 6424K Cache, 112M Buf, 1664K F +ree Swap: 4096M Total, 1133M Used, 2963M Free, 27% Inuse, 190M In, 518M Ou +t PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMA +ND 84338 nobody -22 -10 49620K 18376K swread 6:14 0.00% 0.00% httpd 84049 nobody -22 -10 49396K 21124K swread 6:50 0.00% 0.00% httpd 84046 nobody -22 -10 48112K 21748K swread 6:56 0.00% 0.00% httpd 84044 nobody -22 -10 46144K 18480K swread 5:58 0.00% 0.00% httpd 84032 nobody -22 -10 45952K 22016K swread 6:02 0.00% 0.00% httpd 84041 nobody -22 -10 45220K 15280K swread 6:05 0.00% 0.00% httpd 84042 nobody -22 -10 44948K 18696K swread 6:40 0.00% 0.00% httpd 84036 nobody -22 -10 44776K 11600K swread 6:00 0.00% 0.00% httpd 84034 nobody -22 -10 44104K 21084K swread 6:27 0.00% 0.00% httpd 84051 nobody -22 -10 43768K 10052K swread 5:51 0.00% 0.00% httpd 84048 nobody -22 -10 43732K 13084K swread 6:15 0.00% 0.00% httpd 84031 nobody -22 -10 43732K 8996K swread 5:24 0.00% 0.00% httpd 84030 nobody -22 -10 43672K 12428K swread 5:17 0.00% 0.00% httpd 84202 nobody -22 -10 43516K 20212K swread 6:16 0.00% 0.00% httpd 84045 nobody -22 -10 43508K 9856K swread 5:33 0.00% 0.00% 
httpd
84047 nobody -22 -10 43444K 19036K swread 5:51 0.00% 0.00% httpd
84043 nobody -22 -10 43216K 19704K swread 6:28 0.00% 0.00% httpd
84037 nobody -22 -10 42288K 20356K swread 6:20 0.00% 0.00% httpd
84033 nobody -22 -10 41368K 17776K swread 6:25 0.00% 0.00% httpd
84339 nobody -22 -10 41156K 12856K swread 5:41 0.00% 0.00% httpd
84035 nobody -22 -10 40252K 20056K swread 5:40 0.00% 0.00% httpd
84039 nobody -22 -10 40232K 7588K swread 6:13 0.00% 0.00% httpd
84040 nobody -22 -10 40000K 13296K swread 6:08 0.00% 0.00% httpd
84038 nobody -22 -10 39640K 23008K swread 5:55 0.00% 0.00% httpd
96070 nobody -22 -10 37316K 16016K swread 0:06 0.00% 0.00% httpd
96069 nobody -22 -10 36860K 29684K swread 0:04 0.00% 0.00% httpd
95977 nobody -22 -10 34100K 6032K swread 0:02 0.00% 0.00% httpd
95995 nobody -22 -10 33740K 6692K swread 0:02 0.00% 0.00% httpd
96025 nobody -22 -10 32972K 2588K swread 0:02 0.00% 0.00% httpd
230 root 2 10 21324K 0K select 0:01 0.00% 0.00% <perl5>
96082 nobody -22 -10 19172K 6828K swread 0:05 0.00% 0.00% httpd
96049 nobody -22 -10 18932K 7140K swread 0:03 0.50% 0.49% httpd
95962 nobody -22 -10 18144K 4012K swread 0:02 0.00% 0.00% httpd
95998 nobody -22 -10 17328K 4856K swread 0:01 0.00% 0.00% httpd
95994 nobody -22 -10 17284K 4764K swread 0:01 0.00% 0.00% httpd
96026 nobody -22 -10 17152K 5684K swread 0:01 0.00% 0.00% httpd
96050 nobody -22 -10 16908K 5540K swread 0:01 0.00% 0.00% httpd
96044 nobody -22 -10 14156K 7476K swread 0:03 0.00% 0.00% httpd
96054 nobody -22 -10 14024K 6920K swread 0:03 0.00% 0.00% httpd
95899 nobody -22 -10 13884K 6256K swread 0:04 0.00% 0.00% httpd
95825 nobody -22 -10 13152K 6884K swread 0:05 0.00% 0.00% httpd
95820 nobody -22 -10 12876K 5660K swread 0:04 0.00% 0.00% httpd
95960 nobody -22 -10 12840K 6312K swread 0:02 0.00% 0.00% httpd
95900 nobody -22 -10 12804K 4056K swread 0:02 0.00% 0.00% httpd
95898 nobody -22 -10 12624K 6084K swread 0:04 0.00% 0.00% httpd
95848 nobody -22 -10 12548K 6152K swread 0:03 0.00% 0.00% httpd
95971 nobody -22 -10 12516K 5124K swread 0:01 0.00% 0.00% httpd
96067 nobody -22 -10 12468K 4880K swread 0:02 0.00% 0.00% httpd
96029 nobody -22 -10 12444K 4384K swread 0:01 0.00% 0.00% httpd
96174 nobody 2 -10 12408K 11060K sbwait 0:02 3.13% 3.12% httpd
96066 nobody -22 -10 12392K 6036K swread 0:01 0.00% 0.00% httpd
95958 nobody -22 -10 12288K 5760K swread 0:01 0.00% 0.00% httpd
96080 nobody -22 -10 12284K 3476K swread 0:02 0.00% 0.00% httpd
95957 nobody -22 -10 12168K 5676K swread 0:02 0.00% 0.00% httpd
95972 nobody -22 -10 12132K 2976K swread 0:02 0.00% 0.00% httpd
95902 nobody -22 -10 12124K 4908K swread 0:02 0.00% 0.00% httpd
96125 nobody 2 -10 12084K 10408K sbwait 0:01 1.95% 1.95% httpd
96052 nobody -22 -10 12032K 7244K swread 0:01 0.88% 0.88% httpd
96036 nobody -22 -10 11900K 6604K swread 0:01 0.00% 0.00% httpd
96180 nobody 2 -10 11880K 10600K sbwait 0:01 1.82% 1.81% httpd
96020 nobody -22 -10 11804K 5168K swread 0:02 0.00% 0.00% httpd
95980 nobody -22 -10 11772K 6960K swread 0:02 0.00% 0.00% httpd
96074 nobody -22 -10 11772K 3264K swread 0:01 0.00% 0.00% httpd
96055 nobody -22 -10 11740K 6416K swread 0:01 0.00% 0.00% httpd
95997 nobody -22 -10 11720K 5636K swread 0:01 0.00% 0.00% httpd
95963 nobody -22 -10 11672K 4784K swread 0:02 0.00% 0.00% httpd
96022 nobody -22 -10 11652K 4888K swread 0:02 0.00% 0.00% httpd
95996 nobody -22 -10 11648K 4864K swread 0:02 0.00% 0.00% httpd
95959 nobody -22 -10 11620K 5252K swread 0:01 0.00% 0.00% httpd
95976 nobody 2 -10 11588K 5648K sbwait 0:01 0.00% 0.00% httpd
95975 nobody -22 -10 11404K 6024K swread 0:01 0.00% 0.00% httpd
95993 nobody -22 -10 11400K 4532K swread 0:01 0.00% 0.00% httpd
96048 nobody -22 -10 11340K 5908K swread 0:01 0.00% 0.00% httpd
95961 nobody -22 -10 11328K 3808K swread 0:01 0.00% 0.00% httpd
96051 nobody -22 -10 11252K 4712K swread 0:01 0.00% 0.00% httpd
96043 nobody 2 -10 11228K 0K RUN 0:01 0.00% 0.00% <httpd>
96024 nobody -22 -10 11216K 5176K swread 0:01 0.00% 0.00% httpd
96045 nobody -22 -10 11196K 5528K swread 0:01 0.00% 0.00% httpd
96121 nobody 2 -10 11192K 9752K sbwait 0:01 2.39% 2.39% httpd
96046 nobody -22 -10 11148K 5368K swread 0:01 0.00% 0.00% httpd
96038 nobody -22 -10 11128K 5812K swread 0:01 0.00% 0.00% httpd
96021 nobody -22 -10 11128K 3928K swread 0:01 0.00% 0.00% httpd
96028 nobody -22 -10 11100K 4908K swread 0:01 0.00% 0.00% httpd
96167 nobody 2 -10 11076K 9616K sbwait 0:01 1.17% 1.17% httpd
96019 nobody -22 -10 10992K 5520K swread 0:01 0.00% 0.00% httpd
96186 nobody -14 -10 10976K 9784K inode 0:01 0.74% 0.73% httpd
95973 nobody -22 -10 10972K 4960K swread 0:01 0.00% 0.00% httpd
96030 nobody -22 -10 10924K 3700K swread 0:01 0.00% 0.00% httpd
96027 nobody -22 -10 10868K 4656K swread 0:01 0.00% 0.00% httpd
96023 nobody -22 -10 10864K 5756K swread 0:01 0.00% 0.00% httpd
96183 nobody 2 -10 10592K 9436K sbwait 0:01 0.39% 0.39% httpd
96179 nobody 2 -10 10512K 9152K sbwait 0:01 0.64% 0.63% httpd
96116 nobody 2 -10 10320K 8952K sbwait 0:01 0.98% 0.98% httpd
96117 nobody -18 -10 9632K 8320K spread 0:00 0.00% 0.00% httpd
96037 nobody -22 -10 9436K 4392K swread 0:00 0.00% 0.00% httpd
95974 nobody -22 -10 9424K 2624K swread 0:00 0.00% 0.00% httpd
96077 nobody -22 -10 9160K 4804K swread 0:00 0.00% 0.00% httpd
96187 nobody 2 -10 9080K 7668K sbwait 0:01 0.74% 0.73% httpd
96071 nobody -22 -10 8752K 4452K swread 0:00 0.00% 0.00% httpd
96182 nobody 2 -10 8716K 0K RUN 0:00 0.00% 0.00% <httpd>
96047 nobody -22 -10 8668K 1520K swread 0:00 0.00% 0.00% httpd
96078 nobody -22 -10 8616K 3724K swread 0:00 0.00% 0.00% httpd
96184 nobody 2 -10 8556K 0K RUN 0:00 0.00% 0.00% <httpd>
96053 nobody -22 -10 8396K 2424K swread 0:00 0.00% 0.00% httpd
96185 nobody 2 -10 8396K 0K RUN 0:00 0.00% 0.00% <httpd>
96157 nobody -22 -10 8376K 6284K swread 0:00 0.00% 0.00% httpd
96076 nobody -22 -10 8356K 3112K swread 0:00 0.00% 0.00% httpd
96065 nobody -22 -10 8048K 2884K swread 0:00 0.00% 0.00% httpd
96175 nobody 2 -10 7988K 0K RUN 0:00 0.00% 0.00% <httpd>
96159 nobody -22 -10 7596K 5240K swread 0:00 0.00% 0.00% httpd
96161 nobody -22 -10 7572K 5068K swread 0:00 0.00% 0.00% httpd
96122 nobody 2 -10 7536K 0K RUN 0:00 0.00% 0.00% <httpd>
96129 nobody 2 -10 7536K 0K RUN 0:00 0.00% 0.00% <httpd>
96160 nobody -22 -10 7532K 5068K swread 0:00 0.00% 0.00% httpd
96162 nobody -22 -10 7532K 5068K swread 0:00 0.00% 0.00% httpd
96134 nobody -22 -10 7532K 3240K swread 0:00 0.00% 0.00% httpd
96171 nobody 2 -10 7532K 0K RUN 0:00 0.00% 0.00% <httpd>
96126 nobody 2 -10 7532K 0K RUN 0:00 0.00% 0.00% <httpd>
96166 nobody 2 -10 7532K 0K RUN 0:00 0.00% 0.00% <httpd>
96163 nobody -22 -10 7528K 5100K swread 0:00 0.00% 0.00% httpd
96109 nobody -22 -10 7528K 3192K swread 0:00 0.00% 0.00% httpd
96105 nobody -22 -10 7528K 3188K swread 0:00 0.00% 0.00% httpd
96096 nobody -22 -10 7528K 3120K swread 0:00 0.00% 0.00% httpd
96132 nobody -22 -10 7528K 3116K swread 0:00 0.00% 0.00% httpd
96095 nobody -22 -10 7528K 3044K swread 0:00 0.00% 0.00% httpd
96133 nobody -22 -10 7528K 3040K swread 0:00 0.00% 0.00% httpd
96107 nobody -22 -10 7528K 2976K swread 0:00 0.00% 0.00% httpd
96114 nobody -22 -10 7528K 2720K swread 0:00 0.00% 0.00% httpd
96123 nobody -22 -10 7528K 2688K swread 0:00 0.00% 0.00% httpd
96079 nobody -22 -10 7528K 2588K swread 0:00 0.00% 0.00% httpd
96087 nobody -22 -10 7524K 3296K swread 0:00 0.00% 0.00% httpd
96128 nobody 2 -10 7524K 0K RUN 0:00 0.00% 0.00% <httpd>
96124 nobody 2 -10 7516K 0K RUN 0:00 0.00% 0.00% <httpd>
96176 nobody 2 -10 7516K 0K RUN 0:00 0.00% 0.00% <httpd>
96131 nobody -22 -10 7512K 4800K swread 0:00 0.00% 0.00% httpd
96106 nobody -22 -10 7512K 3116K swread 0:00 0.00% 0.00% httpd
96112 nobody -22 -10 7512K 2760K swread 0:00 0.00% 0.00% httpd
96130 nobody -18 -10 7508K 4932K spread 0:00 0.00% 0.00% httpd
96094 nobody -22 -10 7508K 3164K swread 0:00 0.00% 0.00% httpd
96083 nobody -22 -10 7508K 3144K swread 0:00 0.00% 0.00% httpd
96118 nobody -22 -10 7508K 2812K swread 0:00 0.00% 0.00% httpd
96113 nobody -22 -10 7504K 2780K swread 0:00 0.00% 0.00% httpd
96101 nobody -22 -10 7496K 3212K swread 0:00 0.00% 0.00% httpd
96102 nobody -22 -10 7496K 3180K swread 0:00 0.00% 0.00% httpd
96081 nobody -22 -10 7496K 3076K swread 0:00 0.00% 0.00% httpd
96111 nobody -22 -10 7496K 2712K swread 0:00 0.00% 0.00% httpd
96127 nobody 2 -10 7496K 0K RUN 0:00 0.00% 0.00% <httpd>
96164 nobody 2 -10 7496K 0K RUN 0:00 0.00% 0.00% <httpd>
96165 nobody 2 -10 7496K 0K RUN 0:00 0.00% 0.00% <httpd>
96158 nobody 2 -10 7496K 0K RUN 0:00 0.00% 0.00% <httpd>
96169 nobody 2 -10 7496K 0K RUN 0:00 0.00% 0.00% <httpd>
96170 nobody 2 -10 7496K 0K RUN 0:00 0.00% 0.00% <httpd>
96168 nobody 2 -10 7496K 0K RUN 0:00 0.00% 0.00% <httpd>
96120 nobody -22 -10 7492K 3064K swread 0:00 0.00% 0.00% httpd
96115 nobody -22 -10 7492K 2720K swread 0:00 0.00% 0.00% httpd
96103 nobody -22 -10 7112K 2308K swread 0:00 0.00% 0.00% httpd
183 root 10 10 5532K 0K RUN 0:23 0.00% 0.00% <perl>
261 root 2 -15 5476K 1004K sbwait 3:31 0.00% 0.00% perl
96190 nobody -18 -10 5100K 3656K spread 0:00 0.00% 0.00% httpd
96189 nobody -14 -10 5020K 3692K inode 0:00 0.00% 0.00% httpd
96192 nobody -6 -10 4968K 3536K biord 0:00 0.00% 0.00% httpd
96195 nobody -14 -10 4888K 3600K inode 0:00 0.00% 0.00% httpd
96193 nobody -6 -10 4888K 3596K biord 0:00 0.00% 0.00% httpd
96191 nobody -14 -10 4888K 3596K inode 0:00 0.00% 0.00% httpd
96119 nobody -14 -10 4888K 3372K inode 0:00 0.00% 0.00% httpd
153 root 2 -10 4300K 1964K select 0:20 0.00% 0.00% httpd
96188 nobody -22 -10 4300K 648K swread 0:00 0.00% 0.00% httpd
96194 nobody 2 -10 4300K 0K RUN 0:00 0.00% 0.00% <httpd>
96198 nobody 2 -10 4300K 0K RUN 0:00 0.00% 0.00% <httpd>
96196 nobody 2 -10 4300K 0K RUN 0:00 0.00% 0.00% <httpd>
96197 nobody 2 -10 4300K 0K RUN 0:00 0.00% 0.00% <httpd>
211 root -22 -20 3412K 784K swread 3:17 0.00% 0.00% perl
366 root 2 4 3112K 0K poll 0:00 0.00% 0.00% <stunnel-pops>
96216 root 36 4 3044K 716K RUN 0:00 0.00% 0.00% perl
273 root -6 4 3044K 532K piperd 0:01 0.00% 0.00% perl
270 root 36 4 3044K 432K RUN 0:01 0.00% 0.00% perl
274 root 36 4 3044K 412K RUN 0:07 0.00% 0.00% perl
268 root 10 4 3044K 0K nanslp 0:01 0.00% 0.00% <perl>
234 root 2 0 2332K 0K select 0:06 0.00% 0.00% <sshd>
191 root 10 4 2280K 0K RUN 0:13 0.00% 0.00% <ncftpd>
77300 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncftpd>
90242 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncftpd>
48258 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncftpd>
90215 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncftpd>
75810 root 2 4 2280K 0K accept 0:00 0.00% 0.00% <ncftpd>
95759 monkads 30 0 2196K 532K RUN 0:03 1.84% 0.83% top
239 root 10 0 2088K 0K wait 0:00 0.00% 0.00% <perl>
262 root 2 0 1356K 0K select 0:01 0.00% 0.00% <xinetd>
96209 root -18 0 1312K 548K spread 0:00 0.00% 0.00% perl
201 root -6 1 1264K 156K piperd 0:00 0.00% 0.00% ncftpd
96211 root -14 0 1212K 536K inode 0:00 0.00% 0.00% perl
95 root 2 0 1056K 0K poll 0:06 0.00% 0.00% <syslog-ng>
197 root 2 6 1016K 0K accept 0:00 0.00% 0.00% <popa3d>
96206 root -6 0 1008K 488K piperd 0:00 0.00% 0.00% cron
96039 root -6 0 1008K 232K piperd 0:00 0.00% 0.00% cron
95749 root -6 0 1008K 164K piperd 0:00 0.00% 0.00% cron
96072 root -6 0 1008K 80K piperd 0:00 0.00% 0.00% cron
102 root 10 0 992K 0K nanslp 0:03 0.00% 0.00% <cron>
96104 root 10 0 992K 0K RUN 0:00 0.00% 0.00% <cron>
96140 root 10 0 992K 0K RUN 0:00 0.00% 0.00% <cron>
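For anyone wanting to slice a snapshot like this one, here is a minimal sketch (not something that was run on the PerlMonks servers) that tallies processes by the STATE column. It assumes the snapshot has been saved one process per line in the usual FreeBSD top column order (PID, USERNAME, PRI, NICE, SIZE, RES, STATE, TIME, WCPU, CPU, COMMAND). The function name count_states and the file name in the usage comment are made up for illustration.

```sh
#!/bin/sh
# Tally processes per STATE in a saved 'top' snapshot (one process per line).
# Assumed columns: PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
# Lines that do not start with a numeric PID (headers, stray text) are skipped.
count_states() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 11 { n[$7]++ }
         END { for (s in n) print n[s], s }' | sort -k1,1nr -k2,2
}

# Usage (hypothetical file name):
#   count_states < top-snapshot.txt
```

A large swread count next to only a handful of sbwait rows would be consistent with most httpd children waiting on pages to come back from swap rather than doing useful work.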

Update: Looking at the http access_log for around the time that the problem appears to start has not revealed any "smoking gun" evil URLs that somehow cause the receiving httpd to become a fork bomb, but that haystack is rather large and the data recorded isn't ideal for finding such things. A more Everything-aware log of accesses is on my to-do list...

- tye        

Replies are listed 'Best First'.
Re: 'A' web server takes another "time out"
by samtregar (Abbot) on May 03, 2006 at 18:58 UTC
    I don't have any guesses as to what's causing so many httpds, but perhaps you can fix it by changing your Apache configuration? It seems like a well-considered MaxClients could prevent this kind of explosion.

    -sam

      Thanks for the pointer.

      It looks like I'd need read access to /usr/pair/apache/... in order to check that but I don't have it. An older copy of httpd.conf that I requested (before the upgrade) had MaxClients set to 100. I'll have to ask for a new copy...

      - tye        

        If PerlMonks is running under mod_perl you should be able to use the Apache API to examine the current setting. You might even be able to dynamically change it!

        -sam

      In my experience, it is very important to set correct values for:
      • MaxClients -> 100 is OK (it can be more)
      • MinSpareServers -> minimum idle instances; I suggest the same value as StartServers
      • MaxSpareServers -> maximum idle instances; I suggest more than 60% of MaxClients
      • MaxKeepAliveRequests -> never all instances; I suggest 50% of MaxClients
      • MaxRequestsPerChild -> maximum requests before the process is killed; set if required...

      Some time ago I wrote a node, Monitor instances of Apache Web server, with a script to see how Apache's web connections are being used live. To see historical usage, I suggest using it or Apache-Tools (from Apache-Security).

      Judging by your load averages, swap and CPU states, in my opinion tuning Apache would give good results... Look at the running times of your httpd processes:

      $ grep httpd 547234 | awk '{print $8}' | sort | uniq -c | head -n 5
          125 0:00
          110 0:01
           41 0:02
           14 0:03
            9 0:04
      But the really useful info is "Parent Server Generation: XX" on server-status... You really need to enable this module ;)

      Current Time: Friday, 05-May-2006 10:56:47 PDT
      Restart Time: Tuesday, 02-May-2006 10:24:02 PDT
      Parent Server Generation: 3
      Server uptime: 3 days 32 minutes 45 seconds
      Total accesses: 16557075 - Total Traffic: 349.2 GB
      CPU Usage: u170.547 s310.375 cu0 cs0 - .184% CPU load
      63.4 requests/sec - 1.4 MB/second - 22.1 kB/request
      175 requests currently being processed, 81 idle workers
      CKWWCKK_K_K_CKC_KK_K___KKK__K_KKKKKKKC_K_CC_KWK__WKKK_K_WKKK__WK
      GGG.GG.G.GGG...GGGGG..GGG.GGG.GG..GG.G..GG..G.....GWGGGGGGGGGGGG
      .G...W.GG.....G.GG.G..G.GG.........GGWG.G..G.G.....G...WG....G.G
      _K__KKCKCKWCK_WK__KKK_K_KW_KC__W___KKKK_KKKCK_KKWKKC_KCKKWKCKKWC

      --
      Marco Antonio
      Rio-PM

        MaxClients of 100 seems pretty high to me. Commodity hardware isn't going to deal with 100 simultaneous mod_perl jobs very well! Even if you have the memory to handle that many jobs, you probably don't have the CPU.

        MaxClients can be high on a front-end server which serves static content and does a reverse proxy to the mod_perl backend. Those servers do much less work per-request and a given machine can run more of them simultaneously.
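        A rough way to put numbers on the memory side, sketched under assumptions: the snapshot from the top of the thread is saved one process per line in FreeBSD top column order (RES in field 6, COMMAND in field 11). The function name httpd_fit and the 512 MB budget in the usage comment are hypothetical; the real RAM figure for the 'A' server is not given in this thread. Rows with 0K resident are skipped, and RES on a box that is already swapping understates what each child really wants, so treat the result as an optimistic upper bound.

```sh
#!/bin/sh
# Average the resident size (RES, field 6, in K) of httpd rows in a saved
# 'top' snapshot and estimate how many such processes fit in a RAM budget.
# Usage: httpd_fit BUDGET_MB < snapshot
httpd_fit() {
    awk -v budget_mb="$1" '
        $11 ~ /httpd/ && $6 ~ /^[0-9]+K$/ {
            res = $6 + 0                  # "19036K" converts to 19036
            if (res > 0) { total += res; n++ }
        }
        END {
            if (n == 0) { print "no resident httpd rows"; exit 1 }
            avg = total / n
            printf "%d httpd, %d K resident, avg %d K, ~%d fit in %d MB\n",
                   n, total, avg, (budget_mb * 1024) / avg, budget_mb
        }'
}

# Usage (hypothetical file name and budget):
#   httpd_fit 512 < top-snapshot.txt
```

        If the estimate comes out well below the configured MaxClients, that would support lowering it.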

        -sam

Re: 'A' web server takes another "time out"
by jonadab (Parson) on May 03, 2006 at 20:26 UTC
    See lots of 'httpd' processes appear

    I'm not sure what the usual ratio is between available system resources and system resources needed to keep up with Perlmonks requests, but _if_ the ratio were to dip below a certain critical point, then the number of new processes would grow faster than the old processes could finish. If that were the case, the total number of running processes could be expected to increase steadily, further dividing the system resources (notably RAM) available to each, in a vicious cycle, which would explain the extremeness of the symptoms you describe.

    However, that leaves open the question of what happens to trigger the event in the first place. If the available system resources were just barely adequate for handling normal (or normal peak) traffic, then a slightly-more-than-normal traffic spike could trigger it, but it seems like if the system were that close to maxed out all the time you'd probably already know it. Are there things users can do that cause substantially more activity on the server than a normal request? Too many Super Search queries at once, perhaps, or something along those lines?

    but 'top' isn't particularly flexible but is still the best tool I've found available on this system so far

    My immediate thought here is to look for process-related stuff on the CPAN, looking for something that doesn't just shell out to ps, preferably something Unix-oriented and written in pure Perl. I don't have much experience working with process tables, though, beyond what can be done with ps and top. update: My second thought is that I'm sure you're already aware some versions of top can show considerably more columns than they do show by default. The version I have here (on FreeBSD) is quite impoverished, but ISTR that the version of top that I used on Mandrake 9 had rather a lot of optional columns and a loose marble rolling around in the back of my head suggests it _may_ (it's been several months...) have had an option for showing the parent process. I mention this only on the off chance that you haven't already checked for it. Hit ? in top to see a list.


    Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.

      I'd be a bit disappointed if a mature system like FreeBSD contained this feedback loop in resource allocation. No system is perfect, but I'd come to expect better behavior when memory becomes scarce than such a feedback loop that makes the problem keep getting worse while trying to let each part continue to fight to do its thing such that nothing at all can get done and it takes so long before the system finally gives up and reboots (or is it that the system never gives up and pair.com notices the lock-up and eventually cycles power?). I recall much older systems noticing a problem and selecting processes to be completely "swapped out" (different from "paging", a more accurate term for what is often mislabeled "swapping") such that they stop fighting and other luckier processes get a chance to finish such that the resource exhaustion might pass or at least the system is capable of getting something done such that someone can "get in" in order to clean up "by hand". Note that when this happens to the 'A' web server, there is no hope of logging in to the system.

      But perhaps this is just a case of bad tuning such that Apache fights too hard and it takes a while for FreeBSD to overcome it... Perhaps that is why many processes go to "0K" resident memory usage, though I'd expect a state much different than "RUN" to be reported for a swapped-out process. This led me to notice again the angle brackets such as on "<httpd>", and searching "man top" for what those mean I find "COMMAND is the name of the command that the process is currently running (if the process is swapped out, this column is marked '<swapped>')", which isn't completely clear but somewhat supports that interpretation.

      Since I don't have root access, I don't think trying to roll my own replacement for 'top' or 'sar' will be possible. At least, my assumption was that I'd not have access to what 'top' and 'ps' use to get all of that information about other processes. Indeed, I don't have any access to /proc (symlink to /root/proc and I have no access to even /root). But I see that neither 'top' nor 'ps' are set-UID nor set-GID so I'm not sure how the security is arranged. 'man ps' mentions needing procfs mounted (and referencing /proc and /dev/kmem). So would a self-built 'top' on an unprivileged FreeBSD account be useful? If not, I think just adding "ps" output to the existing "top" output would be one of the next steps.
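      A sketch of what that "ps" step might look like, under assumptions: the keywords pid, ppid, state, rss, vsz and command are taken from the ps(1) keyword list and have not been checked on this particular 4.8-STABLE box, and the helper name children_per_parent is made up. The helper only reads text, so it can be run later against saved logs rather than while the machine is thrashing.

```sh
#!/bin/sh
# Count processes per parent PID from 'ps' output whose first two columns
# are PID and PPID. The header line is skipped. Prints the ten parents
# with the most children as "count ppid".
children_per_parent() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { kids[$2]++ }
         END { for (p in kids) print kids[p], p }' |
        sort -k1,1nr -k2,2n | head -n 10
}

# Usage (keyword list is an assumption, see above):
#   ps -axo pid,ppid,state,rss,vsz,command | children_per_parent
```

      If one httpd child turns out to be the parent of dozens of others, rather than everything hanging off the root-owned httpd, that would point at the fork-bomb theory.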

      - tye        

        I don't have any access to /proc [...] 'man ps' mentions needing procfs mounted [...] I think just adding "ps" output to the existing "top" output would be one of the next steps.

        If top took 5.5 minutes to show output between two of the snapshots above, I think adding ps won't improve the situation, because the ps data won't be correlated at all with top's. My bet would be to play with the ps o argument, which allows you to get the information top shows and more. Setting PERSONALITY to "bsd" on this Linux machine allows me to run ps as if I were on FreeBSD. I hope...

        $ PERSONALITY=bsd ps faxo pid,euid,egid,ni:2,vsz:6,rss:6,pcpu,pmem,stat:3=ST,tname:6,stime,bsdtime,args
          PID  EUID  EGID NI    VSZ    RSS %CPU %MEM ST  TTY    STIME   TIME COMMAND
            1     0     0  0   1924    652  0.0  0.0 S   ?      19:24   0:00 init [2]
            2     0     0 19      0      0  0.0  0.0 SN  ?      19:24   0:00 [ksoftirqd/0]
            3     0     0 -5      0      0  0.0  0.0 S<  ?      19:24   0:00 [events/0]
        [...]
         1368   111   111  0  26580    912  0.0  0.0 Ssl ?      19:26   0:00 /usr/sbin/ippl -c /var/run/ippl/ippl.conf
         1423     0     0  0   4800   1608  0.0  0.1 Ss  ?      19:26   0:00 /usr/lib/postfix/master
         1428   101   104  0   4812   1604  0.0  0.1 S   ?      19:26   0:00  \_ pickup -l -t fifo -u -c

        You can s/args$/comm/ in order not to show parameters of commands:

         1368   111   111  0  26580    912  0.0  0.0 Ssl ?      19:26   0:00 ippl
         1423     0     0  0   4800   1608  0.0  0.1 Ss  ?      19:26   0:00 master
         1428   101   104  0   4812   1604  0.0  0.1 S   ?      19:26   0:00  \_ pickup

        HTH.

        --
        David Serrano

        I'd be a bit disappointed if a mature system like FreeBSD contained this feedback loop in resource allocation

        Oh, is the perlmonks server running FreeBSD? I didn't realize. In that case, top doesn't appear to show parent process IDs, unless I'm missing something. There are things I like about FreeBSD, but its version of top is not one of them. The ps that comes with FreeBSD is rather better, but in a scenario where you can't start a new process, top could be already running, and I don't know of a way to make ps do that (i.e., be already running and report output periodically).

        I recall much older systems noticing a problem and selecting processes to be completely "swapped out"

        I've observed on my desktop that FreeBSD will kill a process if it consumes too much RAM (in situations where Linux wouldn't, although Linux since circa 2.2 will also do this if the entire system is low on RAM, which is better than the Linux 2.0 behavior; but FreeBSD will kill a process for this even when there's unused swap space, if it surpasses some per-process memory usage quota). However, one process using lots of RAM is a very different scenario from many processes being spawned. I don't know what FreeBSD does with that. I could test that here with a forkbomb, I suppose...

        Indeed, I don't have any access to /proc

        That could make it hard to get a good look at the process tree.

        So would a self-built 'top' on an unprivileged FreeBSD account be useful?

        I don't know. It also seems like there _ought_ to be a tool designed to prepare a process ahead of time (preload it into RAM, go ahead and ask the operating system for a process table entry, and so forth) to be launched quickly, which might allow you to set up ps to run and then, when the problem is noticed, trigger it to go ahead. I do not, however, actually know of such a utility.

        I feel your pain. Having to work around the lack of root access to accomplish things that would be much easier _with_ root access is certainly something that can be annoying. (I can also understand why the hosting company doesn't want to hand out root access, of course, but that doesn't make your situation any less frustrating.)


        Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
Re: 'A' web server takes another "time out"
by m.att (Pilgrim) on May 03, 2006 at 23:55 UTC
    If you're capturing regular sar data, (With the sa1/sa2 scripts) this could provide a lot of useful information beyond what top provides. (You can profile the performance on a system quite extensively with good sar output). It would be helpful if you could make the sar data from the last week or so available for download. (If available)

    The files are usually located in /var/adm/sa and should be readable from userland. I've found that FreeBSD or Linux boxes don't usually have sar enabled, (unlike a lot of commercial *NIXes) but it's worth a shot. Just tar 'em up and put them somewhere for download.

    If the data is indeed available but you don't feel comfortable sharing it, there are some utilities available to analyse the data directly, such as:

    Sadly it requires a commercial license and I can't think of any cost-free alternatives. Readers please chime in if you know of any similar analysis utilities.

    Hoping to help,

    m.att

      Yes, 'sar' was what I first reached for, realizing that it is far better to compactly collect all of the performance data so that after the fact you can view slices of it this way and that to try to figure out what the matter is...

      $ sar
      ksh: sar: not found
      $ ls -l /var/adm
      ls: adm: Permission denied
      $ ls -ld /var/adm
      drwxr-x--- 3 root wheel 512 Jan 22 2001 /var/adm/
      $

      And I'm certainly not 'root' nor in 'wheel'. (:

      - tye        

        Well, that's a bust.. too bad.

        How about capturing some regular snapshots with vmstat? Maybe

        vmstat 60

        and a

        vmstat -d 60

        piped to a file for a few days (or at least a good bit of time before, during and after the event in question). (These commands may require different syntax if you're on FreeBSD, which I can't test with -- we're basically looking for VM stuff and IO/disk stuff... also see iostat) Maybe also throw in a few vmstat -s's for good measure. This would at least provide a little bit more detail around swap in/out and IO.
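        To make those logs easy to line up against the top snapshots afterwards, each sample could be prefixed with a timestamp. A minimal sketch, where the helper name stamp_lines and the log file path are made up; note that it forks date once per line, which is cheap at one sample a minute but may itself stall while the box is swapping, so the stamps can lag during the event.

```sh
#!/bin/sh
# Prefix every line read from stdin with the current date and time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage (hypothetical log path), left running in the background:
#   vmstat 60 | stamp_lines >> "$HOME/vmstat.log" &
```

        Starting it well before the problem hits means the pipeline is already running and no new login is needed while the machine is unresponsive.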

        m.att

Re: 'A' web server takes another "time out"
by eric256 (Parson) on May 03, 2006 at 21:50 UTC

    Is it simply possible that someone has some sort of scheduled DoS attack? I know it's a stupid, obvious question, but it's the first thing that comes to mind and maybe no one asked simply because it was so obvious. I do wonder why MaxClients isn't set low enough to stop this from happening though.


    ___________
    Eric Hodges
Re: 'A' web server takes another "time out"
by spiritway (Vicar) on May 04, 2006 at 03:10 UTC

    You might have a look at RLimitNPROC, as well. According to the Apache documentation, "Limits the number of processes that can be launched by processes launched by Apache children." It would be nice if you could get your hands on the httpd.conf file...

Re: 'A' web server takes another "time out"
by Ultra (Hermit) on May 04, 2006 at 11:56 UTC

    I guess you (or Pair in case you don't have access to access_log) should do some statistics to see if there's a significant hits/second ratio difference when everything is OK and when the forking occurs.
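    As a sketch of such statistics, assuming the access_log uses the Common (or Combined) Log Format with a bracketed timestamp such as [03/May/2006:18:17:01 -0400]; the actual LogFormat on the pair.com servers is not known here, and the function name hits_per_minute is made up. It counts requests per minute in the order the minutes first appear in the log.

```sh
#!/bin/sh
# Count requests per minute from an Apache access log on stdin.
# Assumes a Common Log Format timestamp: [dd/Mon/yyyy:HH:MM:SS zone]
hits_per_minute() {
    awk '{
        # Locate the bracketed timestamp and keep dd/Mon/yyyy:HH:MM (17 chars).
        if (match($0, /\[[^]]+\]/)) {
            stamp = substr($0, RSTART + 1, 17)
            if (!(stamp in hits)) order[++n] = stamp
            hits[stamp]++
        }
    }
    END { for (i = 1; i <= n; i++) print hits[order[i]], order[i] }'
}

# Usage (hypothetical path):
#   hits_per_minute < access_log
```

    Keeping 20 characters instead of 17 would give per-second counts. Comparing the minutes just before an outage with a quiet period should show whether a traffic spike precedes the forking.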

    Another point to check is whether the kernel version you are using has bugs concerning swap allocation.

    Also, while this wouldn't help to determine the exact nature of the problem, maybe it can help to avoid DoS - mod_evasive

    Of course, Pair should agree to install/use it ;)

    Dodge This!
Re: 'A' web server takes another "time out"
by wazoox (Prior) on May 04, 2006 at 15:50 UTC
    The system load is very high but the CPU is 40% idle; I've often seen this in I/O-bound situations. Is it possible that the system disk (especially the swap or database disk) is abnormally slow? Perhaps DMA isn't working properly?

      One of my prior working theories was a disk "going bad" (having much experience with the fact that the manufacturers of commonly-used disk drives, drivers, and controllers only took away half of the point of "fault tolerance"1 resulting in drives "going bad" extremely silently, the only "evidence" being a particular pattern of slow-down).

      But that was when I didn't see good evidence of lots of swapping going on. Of course, "lots" is a relative term so, anyone, please feel free to make some calculations of disk speed based on the amount of swapping reported above and let us know if, in order to explain the CPU idleness, we'd need to have an unusually slow disk in the mix as well.

      There is no database disk on this system.

      - tye        

      1 The major point of the fault tolerance movement was to prevent things from suddenly failing. The point was that you could spend more and more resources making things more and more reliable, probably reducing how frequently something just "falls down" but you'd still end up having things suddenly fail, likely at a very inconvenient time and have to spend a lot of down time and running around in a panic trying to replace / repair what failed. A "better way" was seen: Don't have single points of failure so that when something fails, things can continue on and you can schedule to replace the failed part at a convenient time, perhaps without even requiring down time. And the key to this working is that someone must be notified that a failure happened! Unfortunately, so many common modern systems include features that are tolerant of faults but provide no means of notification and often even prevent you from ever being able to tell, no matter how hard you look, that a fault happened. Hard disks are a great example of this, in my experience.

      It used to be that a hard disk going bad would start recording faults in your syslog and the frequency of these reports would rise, very slowly at first but following a geometric curve, and you'd replace the disk before it catastrophically died. Now most disks start to fail by slowing down from time-to-time, more and more dramatically, eventually nearly locking up while the disk retries reading the sector that is going bad but eventually fails, then the driver/controller retries which causes the disk to do a whole nother round of retries, then the operating system multiplies the number of retries yet again with its own retries... and eventually we just get lucky and the CRC "passes" and no hard evidence that anything at all went wrong remains.

      I'd point you to a google search for the "S.M.A.R.T." acronym but google no longer treats searching for "s m a r t" differently from searching for "s-m-a-r-t" and so you'd just get a huge list of pages containing the word "smart". That system lets you query some internal counters kept nearly hidden inside the disk drive that likely includes a count of at least some types of retries. It is the only way I've been able to find any real evidence (usually still quite vague) that a disk is starting to fail. But note that most S.M.A.R.T. tools try to be "smart" and just figure out for you whether or not the disk is about to suddenly fail (making nearly the identical mistake mentioned above) and thus usually don't tell you a single thing until the disk is within minutes of failing (usually while you aren't using the computer, and often only after the failure has already become catastrophic). So you have to jump through hoops to look at the raw S.M.A.R.T. data and make guesses at what some of them mean... Which has a lot to do with why you've probably not heard of S.M.A.R.T. before (or only heard bad things about it).

      And then there is the other extreme: parity checking of memory. When your memory is working just fine 99.999% of the time but a single bit error is noticed and reported to you by virtue of the fact that your entire computer system has suddenly become a frozen brick displaying the notification on the console. Being blissfully unaware of the rare single-bit error starts to look good when compared to having all of the in-progress work, most (probably all) of which would be unaffected by that one bit, being sent to evaporate for the sake of providing notification of a fault...

      Yes, I understand that the plumbing of notifications is hard and that is why this plumbing of notifications is so often not done or is done so badly.

Re: 'A' web server takes another "time out"
by ambrus (Abbot) on May 20, 2006 at 17:48 UTC

    I believe the two webservers are 209.197.123.153 and 66.39.54.27, right? But which one of them is called 'A'? Or is this some other distinction?

    Update 2007 feb 27: tye said in the chatterbox that "the IPs are also in order for 'a' vs 'b'" so 66.39.54.27 is the A webserver and 209.197.123.153 is B.

Re: 'A' web server takes another "time out"
by starbolin (Hermit) on May 24, 2006 at 16:45 UTC

    Could you please post the output from:  uname -a

    There is a lot here that is not stock FreeBSD behavior. Looking at the code for top I cannot see where it would insert brackets around the process name. Top simply reports what it finds in the process record; so I can only assume Apache puts the brackets there.

    In a similar vein, man top asserts that swapped processes are marked as <swapped>, but I don't know how it can assert this, as this is OS-dependent behavior.

    The FreeBSD virtual memory manager usually operates very transparently. I wrote a perl program to hog all the memory, then ran several instances. I saw none of the behavior exhibited in your listing. The swapping was transparent. The individual process lines showed normal running with full memory allocation. Only the swap statistics showed the swap being used. When I ran out of swap the process was killed. I am going to play with this more. It seems that Apache does its own VM. I can't confirm this.



    s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
      Could you please post the output from: uname -a
      FreeBSD $FQDN 4.8-STABLE FreeBSD 4.8-STABLE #0: Fri Apr 15 13:34:52 EDT 2005 $USER@$HOST:/usr/src/sys/compile/PAIRqsv i386

      with 3 items replaced by Perl scalars for privacy reasons.

      Looking at the code for top I cannot see where it would insert brackets around the process name. Top simply reports what it finds in the process record; so I can only assume Apache puts the brackets there.

      Thanks for diving into the code. But I think your assumption above is less likely than my stated guess, and I think you even provide more evidence:

      In a similar vein, man top asserts that swapped processes are marked as <swapped>, but I don't know how it can assert this, as this is OS-dependent behavior.

      So it could certainly be the case that the OS, instead of replacing the program name with the literal string "<swapped>", puts angle brackets around the program name. This makes even more sense, as a literal "<swapped>" would leave you wondering what the heck got swapped out on you.

      The FreeBSD virtual memory manager usually operates very transparently. I wrote a perl program to hog all the memory, then ran several instances. I saw none of the behavior exhibited in your listing. The swapping was transparent. The individual process lines showed normal running with full memory allocation. Only the swap statistics showed the swap being used. When I ran out of swap the process was killed.

      You were only hogging swap space. That causes much different problems than hogging real memory. Note that "the process was killed" has a body buried in it, as one can't blame a specific process for exhausting the swap space and so, on a good operating system, heuristics are involved (on a bad operating system, the process unlucky enough to be the first to try to grab more space after none is available gets killed -- early Ultrix comes to mind).

      In order to hog real memory, you have to keep using the pages of memory that you've allocated. See (tye)Re: one-liner hogs; the one labeled "Memory" just tests allocating lots of virtual memory, that is, tests using a lot of swap space. The one labeled "Swap" will cause a lot of swapping (more accurately, "paging" though swapping out would likely eventually happen if you ran enough of them) because it tries to use lots of real memory. (So, yes, the labels are backward, depending on how you look at it.)

      It seems that Apache does its own VM. I can't confirm this.

      I won't say that I know for sure that Apache does not, but I'd bet real money on it.

      - tye        

Re: 'A' web server takes another "time out"
by rootcho (Pilgrim) on Jun 14, 2007 at 23:59 UTC
    You could try to use "atop" if you can.
    It can also redirect the monitoring to file afaik.
    http://www.atconsultancy.nl/atop/
