Any idea why the server stops responding like this?

This has happened several times this week, and the only thing that resolves it is calling the DC and having them reboot the server.

Cron is not running, the console is not recording logs, and nothing responds to the outside world. I would say the server has hung or crashed for some specific reason.

I've written a shell script that runs every 10 seconds, in the hope that it will give me some clues the next time the server crashes.

Code:
#!/bin/sh

# Discard ping output: send stdout to /dev/null first, then stderr to the same place
if ping -nc 1 -w 1 localhost > /dev/null 2>&1
then
        # localhost is up
        if ping -nc 1 -w 1 gateway_ip > /dev/null 2>&1
        then
                # gateway is up
                echo "$(date) ok"
                exit 0
        fi
fi

# handle the trouble
# show system stats
echo "$(date) failure(s) detected"
ping -c 5 -w 5 localhost
ping -c 5 -w 5 gateway_ip
ifconfig
df -k
vmstat 1 5
netstat
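
A side note on the schedule: cron fires at most once per minute, so a 10-second cadence needs a small loop around the check. Below is a minimal sketch of such a wrapper. The name `run_every` and its arguments are made up for illustration; they are not part of the script above.

```sh
#!/bin/sh
# run_every: hypothetical helper (not part of the original script).
# Runs a command COUNT times, sleeping INTERVAL seconds between runs.
# usage: run_every INTERVAL COUNT command [args...]
run_every() {
        interval=$1
        count=$2
        shift 2
        i=0
        while [ "$i" -lt "$count" ]; do
                # run the command exactly as given
                "$@"
                i=$((i + 1))
                # no need to sleep after the final run
                if [ "$i" -lt "$count" ]; then
                        sleep "$interval"
                fi
        done
        return 0
}
```

Started once a minute from cron with something like `run_every 10 6 /path/to/check.sh` (the path is a placeholder), this would give roughly one check every 10 seconds.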

The server went down around 7:00 AM. Here are the logs:

Sat Apr 19 07:17:01 CDT 2003 ok
Sat Apr 19 07:18:01 CDT 2003 ok
Sat Apr 19 07:19:00 CDT 2003 ok
Sat Apr 19 09:14:55 CDT 2003 ok
Sat Apr 19 09:16:00 CDT 2003 ok
Sat Apr 19 09:17:00 CDT 2003 ok
It shows that cron didn't run while the server was down. I checked all the console logs and they show the same thing: nothing is logged during the outage, although boot.log clearly shows when the server was rebooted.
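
To spot such holes without reading the log by eye, something like the following could work. It is only a rough sketch: `find_gaps` is a made-up name, it assumes every log line carries the time as its fourth field (as in the `date` output above), and it ignores gaps that cross midnight.

```sh
#!/bin/sh
# find_gaps: hypothetical helper. Reads log lines on stdin and reports
# any pause longer than MAX seconds (default 120) between two timestamps.
# Assumes field 4 is HH:MM:SS and that the log does not cross midnight.
# usage: find_gaps [MAX] < logfile
find_gaps() {
        awk -v max="${1:-120}" '
                # only look at lines whose 4th field is a time of day
                $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
                        split($4, t, ":")
                        now = t[1] * 3600 + t[2] * 60 + t[3]
                        if (seen && now - prev > max)
                                printf "gap of %d s between %s and %s\n", now - prev, prevts, $4
                        prev = now
                        prevts = $4
                        seen = 1
                }'
}
```

Fed the six log lines above, it reports a single gap of 6955 seconds between 07:19:00 and 09:14:55, which matches the outage window.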

Any ideas, other than faulty RAM corrupting the virtual shared memory?