Linux Stability (2.6, high load) - how many crashes are 'normal'?
I am running several boxes with Linux 2.6.[6-8] at TP/total control, all very heavily loaded: 80-99% disk I/O, 80-100% CPU all day long, and several thousand connections per box (up to 35k in conntrack). Unfortunately they crash quite often:
- the dedicated MySQL boxes have never crashed (so far)
- the boxes with Apache/PHP crash about once every 4-6 weeks on average. For example, yesterday eth0 suddenly stopped working, last week there was a kernel oops in ext3, and there were several other mysterious, unexplained crashes before that.
I tried the standard 2.4 kernels; the Apache servers still crashed (a little less often, but still). Disabling HT made the boxes much slower and is not an option.
Is this crash rate normal? Can anybody give me advice on how to get a more stable setup?