Tracking down a bottleneck that is killing my business

I would like to ask for suggestions on tracking down a bottleneck that has crippled my system for about a month now. It's the strangest problem I have ever dealt with, and it has cost me a small fortune.

THE PROBLEM



THE SYSTEM



THE SITE



CURRENT GRAPHS




TOP

SERVER 1

Code:
15:06:28  up 8 days, 21:59,  1 user,  load average: 0.06, 0.06, 0.01
252 processes: 251 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    1.6%    2.8%    0.8%   0.0%     0.8%    0.0%  393.2%
           cpu00    0.0%    0.0%    0.0%   0.0%     0.9%    0.0%   99.0%
           cpu01    1.9%    1.9%    0.9%   0.0%     0.0%    0.0%   95.1%
           cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
           cpu03    0.0%    0.9%    0.0%   0.0%     0.0%    0.0%   99.0%
Mem:  2055364k av, 2032760k used,   22604k free,       0k shrd,  124140k buff
                   1430188k actv,  268556k in_d,   33936k in_c
Swap: 2040212k av,  341328k used, 1698884k free                 1216112k cached
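A note on reading the Mem line above: on Linux, top's "used" figure normally includes buffers and page cache, so the 22604k free does not by itself mean the box is out of RAM. The sketch below subtracts the two, assuming this version of top reports "used" that way (the same arithmetic as the "-/+ buffers/cache" line of free). The function name app_used_kb is made up for illustration.

```python
def app_used_kb(used_kb, buffers_kb, cached_kb):
    """Memory held by processes, assuming 'used' includes buffers and page cache."""
    return used_kb - buffers_kb - cached_kb


# Figures copied from the SERVER 1 top output above (all in kB).
total_kb = 2055364
server1_app_kb = app_used_kb(used_kb=2032760, buffers_kb=124140, cached_kb=1216112)
server1_app_pct = 100.0 * server1_app_kb / total_kb

print(f"server 1: {server1_app_kb} kB used by processes "
      f"({server1_app_pct:.1f}% of RAM)")
```

Under that assumption, roughly a third of RAM on server 1 is held by processes and the rest is cache the kernel can reclaim. The 341328k of swap in use is a separate figure and is not covered by this calculation.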

SERVER 2

Code:
15:04:45  up 8 days, 22:05,  2 users,  load average: 0.04, 0.05, 0.06
249 processes: 248 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.8%    0.8%    1.6%   0.0%     0.8%   20.0%  374.8%
           cpu00    0.0%    0.0%    0.0%   0.0%     0.9%    5.7%   93.2%
           cpu01    0.0%    0.9%    0.0%   0.0%     0.0%    3.8%   95.1%
           cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    5.8%   94.1%
           cpu03    0.9%    0.0%    1.9%   0.0%     0.0%    4.8%   92.3%
Mem:  1025308k av, 1002940k used,   22368k free,       0k shrd,   68164k buff
                    647336k actv,  119312k in_d,   14824k in_c
Swap: 1052248k av,  230396k used,  821852k free                  442612k cached
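One thing that is easy to misread in these dumps: the "total" row is a sum over all four CPUs, so it runs to 400%, not 100%. A minimal sketch of the normalization, using the SERVER 2 numbers above (the helper name per_cpu_average is hypothetical):

```python
def per_cpu_average(total_percent, cpu_count):
    """Convert top's summed 'total' row into an average per CPU."""
    return total_percent / cpu_count


CPU_COUNT = 4  # cpu00 through cpu03 in the output above

server2_iowait = per_cpu_average(20.0, CPU_COUNT)
server2_idle = per_cpu_average(374.8, CPU_COUNT)

print(f"server 2 iowait: {server2_iowait:.1f}% per CPU")
print(f"server 2 idle:   {server2_idle:.1f}% per CPU")
```

So the 20.0% iowait on server 2 is about 5% per CPU in this snapshot, which matches the per-CPU rows (5.7, 3.8, 5.8, 4.8). Server 1 shows 0.0% iowait at the moment its snapshot was taken.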

APACHE STATUS

Code:
Current Time: Wednesday, 02-Feb-2005 15:09:26 EST
Restart Time: Tuesday, 01-Feb-2005 23:24:58 EST
Parent Server Generation: 0 
Server uptime: 15 hours 44 minutes 28 seconds
Total accesses: 490366 - Total Traffic: 18.7 GB
CPU Usage: u3590.36 s419.53 cu23.09 cs1.5 - 7.12% CPU load
8.65 requests/sec - 345.9 kB/second - 40.0 kB/request
108 requests currently being processed, 63 idle servers
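For anyone who wants to sanity-check the Apache numbers, the averages can be re-derived from the raw counters. This is only a sketch: it assumes "CPU load" is (u + s + cu + cs) divided by uptime, which is how I understand mod_status computes it, and the variable names are mine.

```python
# Counters copied from the status page above.
uptime_s = 15 * 3600 + 44 * 60 + 28          # 15 hours 44 minutes 28 seconds
total_accesses = 490366
cpu_seconds = 3590.36 + 419.53 + 23.09 + 1.5  # u + s + cu + cs
busy_workers, idle_workers = 108, 63

requests_per_sec = total_accesses / uptime_s
cpu_load_pct = 100.0 * cpu_seconds / uptime_s
busy_pct = 100.0 * busy_workers / (busy_workers + idle_workers)

print(f"uptime:       {uptime_s} s")
print(f"requests/sec: {requests_per_sec:.2f}")
print(f"CPU load:     {cpu_load_pct:.2f}%")
print(f"busy workers: {busy_pct:.1f}% of {busy_workers + idle_workers}")
```

Both derived figures agree with what the page reports (8.65 requests/sec and 7.12% CPU load). Note these are averages since the restart at 23:24:58, so they say nothing about peaks. The 108 busy workers is an instantaneous count.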
