CPU Usage Problem - PLEASE HELP!!
I have a server with the following spec:
Dual Pentium III 1.2 GHz processors
RAID5 - comprised of 4 x 36 GB SCSI disks
1.5 GB of memory
Red Hat Linux 7.1
WHM/Cpanel
I have had the following recurring problem for the last 4 weeks. Everything is working normally until suddenly the server becomes very slow. This gets worse and worse over a period of minutes until the server is no longer reachable via the network or keyboard input. I get normal readings from the "w" command:
3:53pm up 1:39, 1 user, load average: 0.11, 0.09, 0.02
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 whereever 3:47pm 0.00s 0.05s 0.01s w
and my memory:
             total       used       free     shared    buffers     cached
Mem:       1543892     395848    1148044          0      45552     238448
-/+ buffers/cache:     111848    1432044
Swap:       265032          0     265032
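As a sanity check on why I call the memory reading normal: the "-/+ buffers/cache" line is just the Mem: figures with buffers and page cache counted as available rather than used. A minimal Python sketch of that arithmetic (the function name is made up for illustration; the numbers are the ones from my output above):

```python
def buffers_cache_adjusted(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' line of free from the 'Mem:' line.

    Buffers and cache can be reclaimed by the kernel, so they are
    subtracted from 'used' and added to 'free'. All values are in KB.
    """
    adjusted_used = used - buffers - cached
    adjusted_free = free + buffers + cached
    return adjusted_used, adjusted_free


# Values copied from the Mem: line of the free output.
used_adj, free_adj = buffers_cache_adjusted(
    used=395848, free=1148044, buffers=45552, cached=238448
)
print(used_adj, free_adj)  # 111848 1432044
```

So applications are really using only about 110 MB of the 1.5 GB, and swap is untouched.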
and df:
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/rd/c0d0p1         2071384    575076   1391084  30% /
/dev/rd/c0d0p7       102032604  35914868  60934768  38% /home
/dev/rd/c0d0p5         1548096    808520    660860  56% /usr
However, "top" shows something like this (I have simulated it here):
3:54pm up 1:41, 1 user, load average: 0.06, 0.08, 0.02
94 processes: 92 sleeping, 1 running, 1 zombie, 0 stopped
CPU0 states: 0.0% user, 0.1% system, 0.0% nice, 99.0% idle
CPU1 states: 0.0% user, 100.0% system, 0.0% nice, 0.0% idle
Mem: 1543892K av, 398744K used, 1145148K free, 0K shrd, 45572K buff
Swap: 265032K av, 0K used, 265032K free 239060K cached
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 9249 root      17   0  1040 1040   812 R     0.9  0.0   0:00 top
    1 root       0   0   516  516   448 S     0.0  0.0   0:04 init
    2 root       8   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root       9   0     0    0     0 SW    0.0  0.0   0:00 kswapd
(and many more processes, but nothing CPU-intensive at all). The problem is on "CPU1": the system is using 100% of that processor, yet top shows no CPU-intensive process at all. If I catch the server quickly enough I can reboot it, and it comes back up fine until the same thing happens again at some random point a few days later. I am not running any automatic backups or anything else that would suddenly require massive CPU usage.
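Because the box usually becomes unreachable before I can look around, one thing I could do is log the per-CPU system share at intervals so there is a record leading up to the next hang. Below is a rough Python sketch of that idea, not something I have running. It assumes a Python 3 interpreter (which a stock Red Hat 7.1 install does not have) and that the per-CPU lines in /proc/stat start with user, nice, system and idle counters in that order; the function names are my own.

```python
import time


def parse_cpu_lines(stat_text):
    """Return {cpu_name: [counter, ...]} for the per-CPU lines of /proc/stat."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines are named cpu0, cpu1, ...; the bare "cpu" line is the total.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(value) for value in fields[1:]]
    return counters


def system_share(before, after):
    """Percentage of elapsed ticks each CPU spent in system mode between two samples."""
    shares = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        total = sum(deltas)
        # Assumed field order: user, nice, system, idle, so index 2 is system time.
        shares[name] = 100.0 * deltas[2] / total if total else 0.0
    return shares


def sample(path="/proc/stat", interval=5.0):
    """Take two snapshots `interval` seconds apart and return the system share per CPU."""
    with open(path) as stat_file:
        first = parse_cpu_lines(stat_file.read())
    time.sleep(interval)
    with open(path) as stat_file:
        second = parse_cpu_lines(stat_file.read())
    return system_share(first, second)
```

Calling sample() from a loop or from cron and appending the result to a file on disk would at least show when CPU1 starts climbing, even if I am not logged in at the time.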
Can anyone suggest how I can stop this happening? I recently had to do a clean OS install (due to a RAID controller error and OS corruption), but this has not fixed it, so I do not believe it is a hacker or an OS problem. I can't see how it could be a user's script, because anything like that would show up as a process.
Please help!!