Load Balancing, Redundancy & Failover

OK, the company I work for hosts a couple of managed Compaq servers (a DL360 and a DL380) at Energis UK (we are UK based). At the moment the DL360 runs all of our main sites, logging etc., and the DL380 handles the CPU-intensive e-commerce sites, which need a little more CPU power and redundancy (RAID 5, for example).

At the moment we lease these servers as FMS (Facility Managed), so we have instant support if a component fails and a high guaranteed SLA.

We have reached nearly 100% uptime (minus the 5 minutes every week for patch installation, which Energis do at night). Recently, however, one of the fans died in the DL360's PSU, causing the system to auto-shutdown. Energis use both Compaq Remote Insight and BMC Patrol to monitor the system, but there seemed to be a problem with them getting on with replacing the PSU/fan quickly (they said there was a pending restore job, which was not true). So instead of taking, say, 20 minutes to replace the PSU, it ended up taking around 3 hours.

Our SLA is 99.995%, which I had taken to mean a total of 2 1/2 days of downtime (strictly, 99.995% over a year works out to only about 26 minutes; 2 1/2 days a year would be closer to 99.3%). Either way, a number of our high-profile clients are on our case and asking for failover. For now the clients do not wish to spend extra on complete site mirroring; they just want a failover page (a maintenance page, for example), which we have created.
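For anyone wanting to sanity-check an SLA figure like the one above, here is a rough sketch of the arithmetic. It assumes the SLA is measured over a 365-day year, which is my assumption (I have not confirmed which measurement period Energis actually use), and the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60

def allowed_downtime_minutes(sla_percent: float, period_days: float = 365) -> float:
    """Downtime budget in minutes for a given availability percentage.

    period_days=365 is an assumption; a provider may measure per month or quarter.
    """
    unavailable_fraction = (100.0 - sla_percent) / 100.0
    return unavailable_fraction * period_days * MINUTES_PER_DAY

def availability_percent(downtime_days: float, period_days: float = 365) -> float:
    """Availability percentage implied by a given amount of downtime."""
    return 100.0 * (1.0 - downtime_days / period_days)

if __name__ == "__main__":
    print(f"99.995% over a year allows {allowed_downtime_minutes(99.995):.2f} minutes")
    print(f"2.5 days of downtime a year is {availability_percent(2.5):.3f}% availability")
```

On those assumptions a single 3-hour outage (180 minutes) is several times the yearly budget for 99.995%, so it is worth checking the exact wording of the contract.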

However, at present I do not know which technology or application to use to implement failover.

At the moment the servers sit in a rack with a bunch of other FMS systems, on a Cisco switch. It has to be said that although it's possible for us to install, for example, a load-balancing switch (for failover), if the link to the Cisco switch were to fail, the load-balancing switch and anything else in place behind it is going to be of no use.

--

I was looking into the intermediate level, assuming that the link to the Cisco switch stays up (at the moment I am unsure whether the Cisco switch has redundant connections; it was a 2924 when I was last in the datacentre). The idea is to employ, for example, an Alteon 180 server switch/load balancer and set up a maintenance site on our DL380 (call it our secondary box), so that if the primary host (the DL360) were to fail, the Alteon would automatically sense this and send traffic to the DL380 instead. The benefit of the Alteon is not only that we can do this (I assume we can; I am looking into the docs right now), but also that if the client later wishes to pay for complete site mirroring, we can use the Alteon to load balance between the hosts. (I will not get into mirroring at this stage, though.)
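To make the primary/secondary idea concrete, here is a minimal sketch of the kind of decision logic I have in mind. This is not how the Alteon is configured or implemented (I have not got that far in the docs); the class name, the thresholds and the host labels are all hypothetical. It only shows the general pattern: fail over after several consecutive failed health checks, and fail back after several consecutive good ones, so that one dropped check does not flip traffic back and forth.

```python
from dataclasses import dataclass

@dataclass
class FailoverMonitor:
    """Decides which host should serve traffic, based on primary health checks."""
    primary: str
    secondary: str
    fail_threshold: int = 3   # consecutive failed checks before failing over (assumed value)
    rise_threshold: int = 2   # consecutive good checks before failing back (assumed value)
    _fails: int = 0
    _rises: int = 0
    _primary_up: bool = True

    def record_check(self, ok: bool) -> str:
        """Record one health-check result for the primary; return the active host."""
        if ok:
            self._fails = 0
            self._rises += 1
            if not self._primary_up and self._rises >= self.rise_threshold:
                self._primary_up = True
        else:
            self._rises = 0
            self._fails += 1
            if self._primary_up and self._fails >= self.fail_threshold:
                self._primary_up = False
        return self.primary if self._primary_up else self.secondary

if __name__ == "__main__":
    monitor = FailoverMonitor(primary="dl360", secondary="dl380")
    for ok in [True, False, False, False, True, True]:
        print(ok, "->", monitor.record_check(ok))
```

With the default thresholds, traffic stays on the DL360 through the first two failed checks, moves to the DL380 on the third, and returns to the DL360 only after two good checks in a row.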

However, a more long-term solution is to host a maintenance page with an entirely different ISP/host, but I assume this involves some funky DNS trickery? At the moment the DNS records for one of our clients are not under our control (it may be possible for them to be). If we were to move them to our current DNS provider (ZoneEdit), we could use one of the host failover services that they offer, meaning that if the primary host is down, traffic is directed to the next host in the list. This would give us what we are looking for, but we would also like to explore the installed-hardware alternative.
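As I understand it, the "next host in the list" behaviour boils down to something like the sketch below: probe each host in priority order and use the first one that answers. This is only my illustration of the idea, not ZoneEdit's actual implementation; the function names and the example hostnames are hypothetical, and the probe is a plain TCP connect to port 80 rather than a full HTTP check.

```python
import socket
from typing import Callable, Optional, Sequence

def tcp_probe(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False

def first_healthy(hosts: Sequence[str],
                  probe: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Walk the hosts in priority order and return the first one that passes the probe."""
    for host in hosts:
        if probe(host):
            return host
    return None  # nothing answered; the caller decides what to do

if __name__ == "__main__":
    # Hypothetical names: the primary site first, the off-site maintenance page second.
    priority_list = ["www.example.com", "maintenance.example.net"]
    print(first_healthy(priority_list))
```

A DNS-based service would then publish the address of whichever host this returns, so how quickly clients notice the change depends on the TTL of the record.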

--

Help is very much appreciated! Btw, the company in my sig is my own company (I work for both myself and another company).

 

 

 

 
