Perhaps the most timely proof that fault tolerance is relevant in today’s world is the fact that many high-tech companies boldly profess to offer five 9s availability in some flavor (see IBM’s recent webcast), and IT departments still pursue uptime perfection because their clients expect nothing less. Good-enough technology just is not good enough for some business operations.
What's your point of view?

Re: the discussion above. Is it the industry or the application that determines availability needs? (Actually, it's the user who determines importance!) Here is an interesting example. Beijing constructed an underground roadway to ease surface congestion, connect parking lots, and improve access to 2008 Olympic event venues. Abundant data collection devices and video cameras report back to a control center to enable rapid response to conditions and mishaps within the 5.5 km loop. The environment defines this traffic management and control software as mission-critical, even if only for the duration of the event. Similar considerations can affect a virtualized infrastructure when, at certain times during the day, week or month, a particular application becomes mission-critical. For that period of time, the application is migrated to a server resource pool that is fault-tolerant.
I think virtualization could be driving more mainstream consideration of continuous availability and fault tolerance. Running day-to-day apps in a virtual environment alongside a number of other virtual environments on one machine tends to make that machine a mission-critical piece of the business. So, the calculus of how reliable is reliable enough begins to change, IMHO.
More IT managers need to realize, ideally before they have a crisis, that the physical platform can be a single point of failure (along with the virtualization layer itself). Regular servers and clusters are not up to the task.
I wonder if fault tolerance is really only relevant in obvious industries (the banks, the airports and so on). I'd be surprised if some standard enterprise would go fault-tolerant just to support day-to-day apps.
I would suggest that FT is relevant by application more so than by industry, and that application value is defined not only by the cost of downtime but by the value users of the service attach to it. Exchange software is a perfect example; more than a few users and businesses will define its availability as essential to their operations.
Compared to alternative solutions like clusters, FT carries a pretty hefty premium. Even basic servers are incredibly reliable, so I have to wonder if fault tolerance is really worth that price difference.
Alternative solutions definitely have their place in the availability hierarchy. But consider Gartner’s estimate that the average cost of downtime is $108K per hour (and that’s only hard dollars), and the price differential between a 99.95% solution and a 99.999% solution is rendered meaningless. System price is only one data point and, when factored in with the cost of administering, managing and servicing a system over the lifecycle of an app, a relatively minor one.
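To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. It assumes a year of 8,760 hours, that downtime cost scales linearly with hours down, and that the $108K per hour figure quoted above applies to your application; the function names are purely illustrative, and your own cost per hour may be very different.

```python
# Rough annual downtime and cost for a given availability percentage.
# Assumptions (not from any vendor or analyst model): a non-leap year of
# 8,760 hours and a flat hourly downtime cost.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_pct: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float,
                         cost_per_hour: float = 108_000) -> float:
    """Expected yearly downtime cost, using the hourly figure cited above."""
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    for pct in (99.95, 99.999):
        hours = annual_downtime_hours(pct)
        cost = annual_downtime_cost(pct)
        print(f"{pct}% available: {hours:.4f} h/year down, about ${cost:,.0f}/year")
```

Under those assumptions, 99.95% works out to a little under four and a half hours of downtime a year, while 99.999% is roughly five minutes, so the yearly gap in hard-dollar downtime cost is in the hundreds of thousands of dollars.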