Zero Downtime (Erl, Naserpour)
How can zero downtime be achieved when both virtual and physical server failures occur?
Problem: It is challenging to provide zero downtime guarantees when a physical host acts as a single point of failure for virtual servers.
Solution: A fault tolerance system is established so that when a physical server fails, virtual servers are migrated to another physical server.
Application: A combination of virtual server fault tolerance, replication, clustering, and load balancing is applied, and all virtual servers are stored on a shared volume, allowing different physical hosts to access their files.
Mechanisms: Audit Monitor, Cloud Storage Device, Cloud Usage Monitor, Failover System, Hypervisor, Logical Network Perimeter, Resource Cluster, Resource Replication, Virtual Server
Compound Patterns: Burst In, Burst Out to Private Cloud, Burst Out to Public Cloud, Elastic Environment, Infrastructure-as-a-Service (IaaS), Multitenant Environment, Platform-as-a-Service (PaaS), Private Cloud, Public Cloud, Resilient Environment, Software-as-a-Service (SaaS)
A physical server naturally acts as a single point of failure for the virtual servers it hosts. As a result, when the physical server fails or is compromised, the availability of any (or all) hosted virtual servers can be affected. This makes the issuance of zero downtime guarantees by a cloud provider to cloud consumers challenging.
A failover system is established so that virtual servers are dynamically moved to a different physical server host if their original physical server host fails.
Figure 1 - Physical Server A fails, triggering the live VM migration program to dynamically move Virtual Server A to Physical Server B.
Multiple physical servers are assembled into a group that is controlled by a fault tolerance system capable of switching activity from one physical server to another without interruption. Resource cluster and live VM migration components are commonly part of this form of high-availability cloud architecture.
The resulting fault tolerance ensures that, if a physical server fails, its hosted virtual servers are migrated to a secondary physical server. All virtual servers are stored on a shared volume (as per the Persistent Virtual Network Configuration pattern) so that other physical server hosts in the same group can access their files.
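The failover behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class and method names (`PhysicalServer`, `ResourceCluster`, `place`, `fail_over`), the per-host capacity limit, and the `/shared/...` file paths are hypothetical and do not correspond to any particular hypervisor or cloud product. It shows one plausible way a fault tolerance system might reassign virtual servers to a healthy host in the same group, relying on the shared volume so that no virtual server files need to be copied.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalServer:
    """Hypothetical model of a physical host in the group."""
    name: str
    capacity: int  # assumed limit on virtual servers per host
    healthy: bool = True
    virtual_servers: list = field(default_factory=list)

    def has_room(self) -> bool:
        return self.healthy and len(self.virtual_servers) < self.capacity


@dataclass
class ResourceCluster:
    """Hypothetical group of hosts that share one storage volume."""
    hosts: list
    # Shared volume: virtual server name -> file path visible to every host.
    shared_volume: dict = field(default_factory=dict)

    def place(self, vm: str, host: PhysicalServer) -> None:
        # Files live on the shared volume, not on the host itself.
        self.shared_volume.setdefault(vm, f"/shared/{vm}.img")
        host.virtual_servers.append(vm)

    def fail_over(self, failed: PhysicalServer) -> dict:
        """Move every virtual server off a failed host.

        Returns a mapping of virtual server name -> new host name.
        """
        failed.healthy = False
        moves = {}
        for vm in list(failed.virtual_servers):
            # Pick the first healthy host that still has capacity.
            target = next((h for h in self.hosts if h.has_room()), None)
            if target is None:
                raise RuntimeError(f"no healthy host has capacity for {vm}")
            failed.virtual_servers.remove(vm)
            target.virtual_servers.append(vm)
            moves[vm] = target.name
        return moves


# Scenario from Figure 1: Physical Server A fails and
# Virtual Server A is moved to Physical Server B.
server_a = PhysicalServer("Physical Server A", capacity=2)
server_b = PhysicalServer("Physical Server B", capacity=2)
cluster = ResourceCluster([server_a, server_b])
cluster.place("Virtual Server A", server_a)

moves = cluster.fail_over(server_a)
print(moves)
```

Because the virtual server's files stay on the shared volume, the simulated failover only changes which host runs the virtual server; the entry in `shared_volume` is untouched. A real implementation would also need failure detection and in-memory state transfer, which this sketch omits.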
Live storage replication can additionally be used to keep virtual server files and hard disks available via secondary storage devices.
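As a rough illustration of the replication idea, the following sketch mirrors every write to a secondary storage device and falls back to it for reads when the primary is unavailable. The names `StorageDevice` and `ReplicatedVolume`, the in-memory `blocks` dictionary, and the file name `virtual-server-a.vmdk` are assumptions made for the example. Resynchronizing a device after it recovers is deliberately left out.

```python
class StorageDevice:
    """Hypothetical storage device holding virtual server files in memory."""

    def __init__(self, name: str):
        self.name = name
        self.available = True
        self.blocks = {}  # file path -> file contents


class ReplicatedVolume:
    """Hypothetical volume that mirrors writes across two devices."""

    def __init__(self, primary: StorageDevice, secondary: StorageDevice):
        self.devices = (primary, secondary)

    def write(self, path: str, data: bytes) -> None:
        # Mirror the write to every device that is currently reachable.
        targets = [d for d in self.devices if d.available]
        if not targets:
            raise IOError("no storage device available for write")
        for device in targets:
            device.blocks[path] = data

    def read(self, path: str) -> bytes:
        # Prefer the primary; fall back to the secondary if needed.
        for device in self.devices:
            if device.available and path in device.blocks:
                return device.blocks[path]
        raise IOError(f"{path} is not available on any storage device")


primary = StorageDevice("primary")
secondary = StorageDevice("secondary")
volume = ReplicatedVolume(primary, secondary)

volume.write("virtual-server-a.vmdk", b"disk contents")
primary.available = False  # simulate a primary storage failure
data = volume.read("virtual-server-a.vmdk")
print(data)
```

The read still succeeds after the simulated primary failure because the secondary device received the same write.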
NIST Reference Architecture Mapping
This pattern relates to the highlighted parts of the NIST reference architecture, as follows: