Zero Downtime

Zero Downtime (Erl, Naserpour)

How can zero downtime be achieved when both virtual and physical server failures occur?

Problem

It is challenging to provide zero downtime guarantees when a physical host acts as a single point of failure for virtual servers.

Solution

A fault tolerance system is established so that when a physical server fails, virtual servers are migrated to another physical server.

Application

A combination of virtual server fault tolerance, replication, clustering, and load balancing is applied, and all virtual servers are stored on a shared volume so that different physical hosts can access their files.

Problem

A physical server naturally acts as a single point of failure for the virtual servers it hosts. As a result, when the physical server fails or is compromised, the availability of any (or all) hosted virtual servers can be affected. This makes the issuance of zero downtime guarantees by a cloud provider to cloud consumers challenging.

Solution

A failover system is established so that virtual servers are dynamically moved to a different physical server host if their original physical server host fails.

Figure 1 - Physical Server A fails, triggering the live VM migration program to dynamically move Virtual Server A to Physical Server B.

Application

Multiple physical servers are assembled into a group that is controlled by a fault tolerance system capable of switching activity from one physical server to another without interruption. Resource cluster and live VM migration components are commonly part of this form of high-availability cloud architecture.

The resulting fault tolerance ensures that, if a physical server fails, its hosted virtual servers are migrated to a secondary physical server. All virtual servers are stored on a shared volume (as per the Persistent Virtual Network Configuration pattern) so that other physical server hosts in the same group can access their files.
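
The failover behavior described above can be illustrated with a small sketch. This is a simplified in-memory model, not the API of any real hypervisor or live VM migration product: the names PhysicalServer, FailoverGroup, and handle_failure are hypothetical, and the least-loaded placement policy is an assumption made for the example. Because the virtual server files are assumed to live on a shared volume, "migration" here only changes which host owns the virtual server.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalServer:
    """Hypothetical physical host; virtual server files live on a shared volume."""
    name: str
    healthy: bool = True
    virtual_servers: List[str] = field(default_factory=list)


class FailoverGroup:
    """Hypothetical group of physical servers managed as one failover unit."""

    def __init__(self, servers: List[PhysicalServer]) -> None:
        self.servers = {s.name: s for s in servers}

    def handle_failure(self, failed_name: str) -> Dict[str, str]:
        """Mark a host as failed and move its virtual servers to healthy hosts.

        Returns a mapping of virtual server name -> new host name.
        """
        failed = self.servers[failed_name]
        failed.healthy = False
        moves: Dict[str, str] = {}
        # Iterate over a copy because the list is mutated inside the loop.
        for vm in list(failed.virtual_servers):
            candidates = [s for s in self.servers.values() if s.healthy]
            if not candidates:
                raise RuntimeError("no healthy physical server left in the group")
            # Assumed placement policy: pick the least-loaded healthy host.
            target = min(candidates, key=lambda s: len(s.virtual_servers))
            failed.virtual_servers.remove(vm)
            target.virtual_servers.append(vm)
            moves[vm] = target.name
        return moves


if __name__ == "__main__":
    a = PhysicalServer("Physical Server A", virtual_servers=["Virtual Server A"])
    b = PhysicalServer("Physical Server B")
    group = FailoverGroup([a, b])
    print(group.handle_failure("Physical Server A"))
    # prints {'Virtual Server A': 'Physical Server B'}
```

The example mirrors the scenario in Figure 1. A real fault tolerance system would also need failure detection and state synchronization to avoid interruption, which this sketch leaves out.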

Live storage replication can further be utilized to guarantee that virtual server files and hard disks remain available via secondary storage devices.
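
As a rough illustration of how replication keeps virtual server files reachable, the sketch below models a volume that writes to a primary and a secondary storage device and reads from whichever one is still available. The classes StorageDevice and ReplicatedVolume are hypothetical, and synchronous replication of every write is an assumption of the example rather than something the pattern prescribes.

```python
class StorageDevice:
    """Hypothetical storage device holding virtual server files in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self.files = {}


class ReplicatedVolume:
    """Hypothetical volume that mirrors writes to a secondary device."""

    def __init__(self, primary: StorageDevice, secondary: StorageDevice) -> None:
        self.devices = (primary, secondary)

    def write(self, path: str, data: bytes) -> None:
        # Assumption: replication is synchronous, so each write goes to
        # every device that is currently available.
        for device in self.devices:
            if device.available:
                device.files[path] = data

    def read(self, path: str) -> bytes:
        # Try the primary first, then fall back to the secondary.
        for device in self.devices:
            if device.available and path in device.files:
                return device.files[path]
        raise FileNotFoundError(path)


if __name__ == "__main__":
    primary = StorageDevice("primary")
    secondary = StorageDevice("secondary")
    volume = ReplicatedVolume(primary, secondary)
    volume.write("virtual-server-a.disk", b"disk image bytes")
    primary.available = False  # simulate failure of the primary device
    print(volume.read("virtual-server-a.disk"))
    # prints b'disk image bytes'
```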

NIST Reference Architecture Mapping

This pattern relates to the highlighted parts of the NIST reference architecture, as follows:

[Figure: NIST Reference Architecture Mapping for Zero Downtime]

This pattern is covered in CCP Module 5: Advanced Cloud Architecture.

For more information regarding the Cloud Certified Professional (CCP) curriculum, visit www.cloudschool.com.