Synchronized Operating State (Erl, Naserpour)
How can the availability and reliability of virtual servers be ensured when high availability and clustering technology is unavailable?
Problem: A cloud consumer may be prevented from utilizing high availability and clustering technology for its virtual servers or operating systems, thereby making them more vulnerable to failure.
Solution: A composite failover system is created that does not rely on clustering or high availability features but instead uses heartbeat messages to synchronize virtual servers.
Application: The heartbeat messages are processed by a specialized service agent and are exchanged between hypervisors, between each hypervisor and its virtual servers, and between each hypervisor and the VIM.
Mechanisms: Cloud Storage Device, Failover System, Hypervisor, Resource Replication, State Management Database, Virtual Server
Compound Patterns: Burst In, Burst Out to Private Cloud, Burst Out to Public Cloud, Elastic Environment, Infrastructure-as-a-Service (IaaS), Multitenant Environment, Platform-as-a-Service (PaaS), Private Cloud, Public Cloud, Resilient Environment, Software-as-a-Service (SaaS)
Technical restrictions, licensing restrictions, or other reasons may prevent a cloud consumer from taking advantage of clustering and high availability technology and products. This can seriously jeopardize the availability and scalability of its cloud services and applications.
A system that consists of a set of mechanisms and relies on heartbeat messages is established to emulate select features of clustering and high availability IT resources.
Figure 1 - Special heartbeat agents are employed to monitor heartbeat messages exchanged between the servers.
Heartbeat messages are processed by heartbeat monitor agents and are exchanged between:
- the hypervisors
- each hypervisor and each virtual server
- each hypervisor and the central VIM
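To make the heartbeat exchange more concrete, the following is a minimal Python sketch of what a heartbeat monitor agent could track. It assumes that each heartbeat is identified only by its sender, receiver, and arrival time, and that a link is treated as failed after a fixed period of silence. The class name `HeartbeatMonitor`, the 3-second timeout, and the server names are hypothetical illustrations, not part of the pattern.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat monitor agent: tracks the last heartbeat seen on each link."""

    timeout: float  # seconds of silence before a link is treated as failed (assumed value)
    last_seen: dict[tuple[str, str], float] = field(default_factory=dict)

    def record(self, sender: str, receiver: str, now: float) -> None:
        """Register a heartbeat sent by `sender` to `receiver` at time `now`."""
        self.last_seen[(sender, receiver)] = now

    def failed_links(self, now: float) -> list[tuple[str, str]]:
        """Return the (sender, receiver) links whose latest heartbeat is older than the timeout."""
        return sorted(
            link for link, seen in self.last_seen.items() if now - seen > self.timeout
        )


# Heartbeats between a virtual server, its hypervisor, and the central VIM (hypothetical names).
monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("vm-primary", "hypervisor-1", now=0.0)
monitor.record("hypervisor-1", "vim", now=0.0)
monitor.record("hypervisor-1", "vim", now=4.0)  # the hypervisor keeps reporting to the VIM
print(monitor.failed_links(now=5.0))  # [('vm-primary', 'hypervisor-1')]
```

Passing the current time in explicitly, rather than reading a clock inside the agent, keeps the sketch deterministic and easy to test.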
If an operating system is installed directly on a physical server, that server must be converted into a virtual server before heartbeat messages can be issued.
Figure 2 - The cloud architecture resulting from the application of this pattern.
- A virtual server is created from the physical server.
- The hypervisor proceeds to host the virtual server.
- The primary virtual server is equipped with fault tolerance and maintains a synchronized state through heartbeat messages.
- The secondary virtual server, which shares the synchronized state, is available in case the primary virtual server fails.
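The pattern does not specify how the synchronized state travels with the heartbeats. The following Python sketch makes one simplifying assumption for illustration: each heartbeat from the primary carries a sequence number and a full snapshot of its state, and the secondary adopts only snapshots newer than the last one it applied. The `Heartbeat` and `VirtualServer` types and their fields are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat message that piggybacks a state snapshot."""

    sender: str
    seq: int     # increases with every heartbeat the primary sends
    state: dict  # snapshot of the primary's operating state (assumed to be small)


@dataclass
class VirtualServer:
    name: str
    state: dict = field(default_factory=dict)
    seq: int = 0  # last sequence number sent (primary) or applied (secondary)

    def make_heartbeat(self) -> Heartbeat:
        """Primary side: emit a heartbeat carrying a copy of the current state."""
        self.seq += 1
        return Heartbeat(self.name, self.seq, copy.deepcopy(self.state))

    def apply_heartbeat(self, hb: Heartbeat) -> bool:
        """Secondary side: adopt the snapshot unless it is stale or a duplicate."""
        if hb.seq <= self.seq:
            return False
        self.state = copy.deepcopy(hb.state)
        self.seq = hb.seq
        return True


primary = VirtualServer("vm-primary", {"open_sessions": 2})
secondary = VirtualServer("vm-secondary")
hb = primary.make_heartbeat()
print(secondary.apply_heartbeat(hb), secondary.state)  # True {'open_sessions': 2}
print(secondary.apply_heartbeat(hb))                    # False (duplicate)
```

Sending whole snapshots is only practical for small state; a real failover system would more likely replicate changes incrementally, which this sketch does not attempt.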
The application/service monitoring station monitors the servers and cloud services. When a failure occurs, the station attempts recovery by applying a sequence of predefined policies. If the primary server's operating system fails, the synchronized secondary virtual server takes over so that downtime is avoided.
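The sequential recovery described above can be sketched in Python as an ordered list of policies that the monitoring station tries one at a time, stopping at the first one that succeeds. The function `attempt_recovery` and the two example policies are hypothetical stand-ins; they only simulate outcomes and do not call any real monitoring or virtualization API.

```python
from typing import Callable, List, Optional, Tuple

# A recovery policy takes a server name and reports whether recovery succeeded.
RecoveryPolicy = Callable[[str], bool]


def attempt_recovery(
    server: str, policies: List[Tuple[str, RecoveryPolicy]]
) -> Optional[str]:
    """Try each predefined policy in order; return the name of the first that succeeds."""
    for name, policy in policies:
        if policy(server):
            return name
    return None  # every policy failed; the caller decides what happens next


# Hypothetical policies that only record that they ran and simulate a result.
attempts: List[str] = []


def restart_service(server: str) -> bool:
    attempts.append("restart-service")
    return False  # simulate a service that does not come back


def reboot_virtual_server(server: str) -> bool:
    attempts.append("reboot-virtual-server")
    return True


result = attempt_recovery(
    "vm-primary",
    [("restart-service", restart_service), ("reboot-virtual-server", reboot_virtual_server)],
)
print(result, attempts)  # reboot-virtual-server ['restart-service', 'reboot-virtual-server']
```

Keeping the policies as data makes their order explicit and lets the station's configuration change without touching the recovery loop.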
Figure 3 - When the primary virtual server fails along with its hosted cloud service, it stops transmitting heartbeat messages. The hypervisor recognizes the failure and switches activity to the secondary virtual server, which maintains the synchronized state. After the failed primary virtual server is back online, the hypervisor creates a new secondary for the new primary and saves it in a synchronized, non-active state.
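The failover sequence in Figure 3 can be summarized as two events handled by the hypervisor: a heartbeat timeout on the primary, and the failed primary coming back online. The following Python sketch models only that bookkeeping. The class `FailoverHypervisor`, its method names, and the naming scheme for new secondaries are hypothetical, and copying the state dictionary stands in for whatever replication mechanism a real hypervisor would use.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VM:
    name: str
    state: dict = field(default_factory=dict)
    active: bool = False


class FailoverHypervisor:
    """Hypothetical hypervisor-side failover logic for one primary/secondary pair."""

    def __init__(self, primary: VM, secondary: VM) -> None:
        primary.active = True
        secondary.active = False
        self.primary = primary
        self.secondary: Optional[VM] = secondary
        self._replacements = 0

    def on_heartbeat_timeout(self) -> None:
        """Primary stopped sending heartbeats: switch activity to the secondary."""
        if self.secondary is None:
            raise RuntimeError("no synchronized secondary available")
        self.primary.active = False
        self.primary, self.secondary = self.secondary, None
        self.primary.active = True

    def on_failed_primary_back_online(self) -> VM:
        """Create a new non-active secondary holding a copy of the new primary's state."""
        self._replacements += 1
        self.secondary = VM(
            name=f"{self.primary.name}-secondary-{self._replacements}",
            state=copy.deepcopy(self.primary.state),
            active=False,
        )
        return self.secondary


hv = FailoverHypervisor(VM("vm-a", {"orders": 7}), VM("vm-b", {"orders": 7}))
hv.on_heartbeat_timeout()           # vm-a went silent
print(hv.primary.name)              # vm-b
spare = hv.on_failed_primary_back_online()
print(spare.name, spare.active)     # vm-b-secondary-1 False
```

Between the two events the sketch has no secondary at all, which mirrors the window in the figure where the new primary runs unprotected until a replacement secondary is created.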
NIST Reference Architecture Mapping
This pattern relates to the highlighted parts of the NIST reference architecture, as follows: