The failover system mechanism increases reliability and availability by using established clustering technology to provide redundant implementations of software programs. The failover system is configured to automatically switch over to a redundant or standby IT resource instance whenever the currently active IT resource becomes unavailable.
Failover systems are commonly used for mission-critical programs or for reusable services that can introduce a single point of failure for multiple applications. A failover system can span more than one geographical region so that each location hosts one or more redundant implementations of the same IT resource.
This mechanism may rely on the resource replication mechanism to supply the redundant IT resource instances, which are then actively monitored to detect errors and unavailability conditions.
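The monitoring of redundant instances can be illustrated with a heartbeat check. The following is a minimal sketch that assumes each replica periodically reports that it is alive and is considered unavailable once its last report is older than a fixed timeout. The class `HeartbeatMonitor`, the replica names, and the 5-second timeout are hypothetical and are used for illustration only. A simulated clock keeps the example deterministic.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical monitor: marks a redundant instance unavailable when it
    has not sent a heartbeat within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the example can use a simulated clock
        self.last_seen: Dict[str, float] = {}

    def register(self, instance_id: str) -> None:
        # A newly supplied replica is treated as alive from the moment it registers.
        self.last_seen[instance_id] = self.clock()

    def heartbeat(self, instance_id: str) -> None:
        # Each replica (or a probe acting on its behalf) reports that it is alive.
        self.last_seen[instance_id] = self.clock()

    def unavailable(self) -> List[str]:
        # Instances whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(i for i, t in self.last_seen.items() if now - t > self.timeout)


class SimulatedClock:
    """Manually advanced clock used only for this example."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = SimulatedClock()
monitor = HeartbeatMonitor(timeout=5.0, clock=clock)
monitor.register("replica-1")
monitor.register("replica-2")
clock.now = 4.0
monitor.heartbeat("replica-1")  # replica-2 stays silent
clock.now = 7.0
print(monitor.unavailable())  # ['replica-2']
```

A heartbeat timeout is only one possible detection strategy. Active health probes or error-rate thresholds would slot into the same `unavailable()` check.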
Failover systems come in two basic configurations:
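- Active-Active - Redundant implementations of the IT resource actively serve the workload synchronously, with the workload distributed among them. When a failure is detected, the failed instance is excluded from the workload distribution and the implementations that remain operational take over its processing.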
- Active-Passive - A standby or inactive implementation is activated to take over the processing from the IT resource that has become unavailable, and the corresponding workload is redirected to it.
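The difference between the two configurations can be sketched in a few lines. This is a simplified model under stated assumptions. The `Instance`, `ActiveActive`, and `ActivePassive` classes are hypothetical, round-robin is assumed as the distribution policy for the active-active case, and a failure is "detected" simply when a request raises `ConnectionError` rather than through a separate monitoring component.

```python
from typing import List


class Instance:
    """Hypothetical stand-in for one redundant implementation of an IT resource."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled {request}"


class ActiveActive:
    """Every instance in rotation serves part of the workload (round-robin)."""

    def __init__(self, instances: List[Instance]):
        self.rotation = list(instances)
        self._cursor = 0

    def handle(self, request: str) -> str:
        while self.rotation:
            inst = self.rotation[self._cursor % len(self.rotation)]
            try:
                result = inst.handle(request)
                self._cursor += 1
                return result
            except ConnectionError:
                # Failure detected: exclude the instance; the others absorb its share.
                self.rotation.remove(inst)
        raise RuntimeError("no operational instance left")


class ActivePassive:
    """Only the active instance serves; standbys wait until a failure occurs."""

    def __init__(self, active: Instance, standbys: List[Instance]):
        self.active = active
        self.standbys = list(standbys)

    def handle(self, request: str) -> str:
        while True:
            try:
                return self.active.handle(request)
            except ConnectionError:
                if not self.standbys:
                    raise RuntimeError("no standby instance left")
                # Failure detected: activate the next standby and redirect to it.
                self.active = self.standbys.pop(0)


# Active-active: both instances share the load until one fails.
a, b = Instance("a"), Instance("b")
pool = ActiveActive([a, b])
r1 = pool.handle("r1")  # "a handled r1"
r2 = pool.handle("r2")  # "b handled r2"
a.healthy = False
r3 = pool.handle("r3")  # "b handled r3"; "a" has been removed from rotation

# Active-passive: the standby only starts serving after the primary fails.
primary, standby = Instance("primary"), Instance("standby")
pair = ActivePassive(primary, [standby])
r4 = pair.handle("r4")  # "primary handled r4"
primary.healthy = False
r5 = pair.handle("r5")  # "standby handled r5"
```

In the active-active case, the surviving instances are already running, so the switch costs nothing beyond removing the failed one from rotation. In the active-passive case, the standby sits idle until it is needed, which trades spare capacity for a takeover step.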
Figure 1 compares the active-active and active-passive failover system models.
Figure 1 - An active-active configuration of a failover system (top) and an active-passive configuration (bottom).
Some failover systems redirect workloads to the remaining active IT resources by relying on specialized load balancers that detect failure conditions and exclude failed IT resource instances from the workload distribution. This type of failover system is suitable for IT resources that provide stateless processing capabilities and therefore do not require execution state management.
In technology architectures that are typically based on clustering and virtualization technologies, the redundant or standby IT resource implementations are also required to share their state and execution context. This allows a complex task that was being executed on a failed IT resource to remain operational in one of its redundant implementations.
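Sharing state and execution context can be illustrated with checkpointing. The following is a minimal sketch in which each implementation records its progress in a shared store after every step, so that a standby can resume a task where the failed instance stopped. The `SharedStateStore` and `Worker` classes, the example task (summing a list), and the in-memory dictionary used as the "shared" store are all hypothetical. A real cluster would typically replicate this state through shared storage or memory replication.

```python
from typing import Dict, List, Optional


class SharedStateStore:
    """Hypothetical stand-in for state shared by redundant implementations."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def save(self, task_id: str, state: dict) -> None:
        self._data[task_id] = dict(state)  # keep a copy, as a replica would

    def load(self, task_id: str) -> dict:
        # A task that has never run starts from an empty checkpoint.
        return dict(self._data.get(task_id, {"next_index": 0, "total": 0}))


class Worker:
    """One implementation of the IT resource; `fail_after` simulates a crash."""

    def __init__(self, name: str, store: SharedStateStore, fail_after: Optional[int] = None):
        self.name = name
        self.store = store
        self.fail_after = fail_after

    def run(self, task_id: str, items: List[int]) -> int:
        state = self.store.load(task_id)  # resume from the last shared checkpoint
        steps = 0
        for i in range(state["next_index"], len(items)):
            if self.fail_after is not None and steps == self.fail_after:
                raise ConnectionError(f"{self.name} failed mid-task")
            state["total"] += items[i]
            state["next_index"] = i + 1
            self.store.save(task_id, state)  # share progress after every step
            steps += 1
        return state["total"]


store = SharedStateStore()
items = [1, 2, 3, 4, 5]
primary = Worker("primary", store, fail_after=2)
standby = Worker("standby", store)

try:
    primary.run("job-1", items)
except ConnectionError:
    pass  # the failover system would detect this and activate the standby

checkpoint = store.load("job-1")  # {'next_index': 2, 'total': 3}
result = standby.run("job-1", items)  # 15: items 3, 4, 5 added to the checkpointed total
```

Without the shared checkpoint, the standby would have to restart the task from the beginning. That is acceptable for stateless processing but not for long-running, stateful work.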