Any idea why the server stops responding like this?
This has happened several times this week, and the only fix is to call the DC and have them reboot the server. Cron is not running, the console is not recording logs, and nothing is responding to the outside world. I would say the server has hung or crashed for some specific reason.
I've written a shell script that runs every 10 seconds, in the hope that it will give me some clues the next time the server crashes.
Code:
#!/bin/sh
# Replace gateway_ip with the address of the default gateway.
# Redirect stdout first, then point stderr at it, so ping stays silent.
if ping -nc 1 -w 1 localhost > /dev/null 2>&1
then
    # localhost is up
    if ping -nc 1 -w 1 gateway_ip > /dev/null 2>&1
    then
        # gateway is up
        echo "$(date) ok"
        exit 0
    fi
fi
# handle the trouble
# show system stats
echo "$(date) failure(s) detected"
ping -c 5 -w 5 localhost
ping -c 5 -w 5 gateway_ip
ifconfig
df -k
vmstat 1 5
netstat
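A note on the scheduling: cron cannot start a job more often than once a minute, so a 10-second interval needs a small wrapper. The following is only a sketch of one way to do it. The function name `run_every` and the paths in the usage comment are made up for illustration and are not part of my actual setup.

```shell
#!/bin/sh
# run_every INTERVAL COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, sleeping INTERVAL seconds between runs.
# (Hypothetical helper name; adjust to taste.)
run_every() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
        # Do not sleep after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example use from a script that cron starts once a minute
# (both paths are assumptions, not real ones):
#   run_every 10 6 /usr/local/bin/netcheck.sh >> /var/log/netcheck.log 2>&1
```

Six runs spaced 10 seconds apart take a little over 50 seconds plus the run time of the check itself, so if the failure branch fires (it pings for several seconds) the loop may overlap the next minute's cron start.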
Sat Apr 19 07:17:01 CDT 2003 ok
Sat Apr 19 07:18:01 CDT 2003 ok
Sat Apr 19 07:19:00 CDT 2003 ok
Sat Apr 19 09:14:55 CDT 2003 ok
Sat Apr 19 09:16:00 CDT 2003 ok
Sat Apr 19 09:17:00 CDT 2003 ok
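The log jumps from 07:19:00 straight to 09:14:55, which presumably brackets the hang. To spot such holes without reading the whole file, something like the sketch below could be used. It assumes every status line starts with plain `date` output (time of day in the fourth field) and that all entries are from the same day. The name `find_gaps` and the 120-second default threshold are my own choices for illustration.

```shell
#!/bin/sh
# find_gaps LOGFILE [MAX_GAP_SECONDS]
# Prints each pair of consecutive timestamped lines that are more than
# MAX_GAP_SECONDS apart (default 120). Lines whose fourth field is not
# HH:MM:SS (e.g. ifconfig or vmstat output) are skipped.
find_gaps() {
    awk -v max="${2:-120}" '
    $4 ~ /^[0-9]+:[0-9]+:[0-9]+$/ {
        split($4, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen && now - prev > max)
            printf "gap of %d s between \"%s\" and \"%s\"\n", now - prev, prevline, $0
        prev = now
        prevline = $0
        seen = 1
    }' "$1"
}
```

Called as `find_gaps /path/to/logfile 120`, it reports only the places where the checks stopped for longer than two minutes. It does not handle a gap that crosses midnight.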
Any ideas, other than broken RAM crashing the virtual shared memory?

