An IDC survey of line-of-business managers (“Services & Software Leading Indicators”) reveals their concern about the reliability of IT support for business-critical systems. Customers complain because the ATM is down, the terminal at the retail store is offline, they can’t book a flight online, or, worse, doctors cannot access patient files. Yet the IT directors can prove that the server was up 99.999% of the time.
So what's causing the downtime? How do you determine the “weakest link” among the myriad factors affecting the continuous availability of your entire end-to-end IT infrastructure?

Nick D. There's no "weakest link". The issues change so frequently that it is virtually impossible to predict what is going to go wrong unless you have some type of performance monitoring tool. The best option we've used is to monitor the applications, database, and network and then respond as quickly as possible to reduce the downtime. How do you prevent downtime vs. minimize it?
There are so many components to the problem that it practically requires a PhD to determine the cause of the downtime! We start by considering the firewall and router availability levels. If they look good, then, assuming the application travels over a WAN or LAN, we factor in the SLA from the telecom provider (typically they won’t go above 99%, which means about 88 hours of downtime a year). Next, we have to include the availability of our application and of the data feeds from the database it accesses; that’s how you derive “unplanned” downtime. Of course, we can’t forget about the “planned” downtime due to change management, system maintenance, etc. When you add everything up, we could be looking at 98% availability, or 175 hours of downtime per year!
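
To make that arithmetic concrete, here is a rough sketch of how the numbers stack up. It assumes every component sits in series (the service is up only when all of them are up) and that failures are independent, which is a simplification. The per-component figures in `chain` are made up for illustration and are not from the survey or from any real SLA; only the 99% telecom figure and the 99.999% server figure echo numbers mentioned in this thread.

```python
from typing import Mapping

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours(availability: float) -> float:
    """Expected downtime per year, in hours, for an availability fraction (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def serial_availability(components: Mapping[str, float]) -> float:
    """End-to-end availability when every component must be up.

    Assumes independent failures, so the availabilities simply multiply.
    """
    total = 1.0
    for availability in components.values():
        total *= availability
    return total


# Hypothetical component availabilities, for illustration only.
chain = {
    "firewall": 0.9995,
    "router": 0.9995,
    "telecom_link": 0.99,   # the "won't go above 99%" SLA
    "application": 0.998,
    "database": 0.998,
    "server": 0.99999,      # the "five nines" server
}

end_to_end = serial_availability(chain)

for name, availability in chain.items():
    print(f"{name:13s} {availability:8.3%}  {downtime_hours(availability):7.1f} h/year")
print(f"{'end-to-end':13s} {end_to_end:8.3%}  {downtime_hours(end_to_end):7.1f} h/year")
```

The point of the sketch is that the end-to-end figure is always lower than the weakest single component, so a five-nines server tells you very little about what the customer at the ATM actually sees. Planned downtime is not modeled here; if you track it in hours, you would add it on top of the unplanned total.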