
Dynamic Failure Detection and Recovery (Erl, Naserpour)

How can the notification and recovery of IT resource failure be automated?


Problem

When cloud-based IT resources fail, manual intervention may be unacceptably inefficient.

Solution

A watchdog system is established to monitor IT resource status and perform notifications and/or recovery attempts during failure conditions.

Application

Different intelligent monitoring and recovery technologies can be used to automate failure detection and recovery tasks, with a focus on watching for, deciding upon, acting upon, reporting, and escalating IT resource failure conditions.

The intelligent watchdog monitor keeps track of cloud consumer requests (1) and detects that a cloud service has failed (2).
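The detection step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the monitor sees whether each consumer request succeeded and declares the service failed when at least `threshold` of the last `window` requests have failed. The `WatchdogMonitor` class and its parameters are hypothetical and not part of any particular product.

```python
from collections import deque


class WatchdogMonitor:
    """Hypothetical watchdog that infers service health from consumer requests.

    Assumption: the service is considered failed when the most recent
    `window` requests contain at least `threshold` failures.
    """

    def __init__(self, window: int = 5, threshold: int = 3) -> None:
        self.window = window
        self.threshold = threshold
        # Sliding window of request outcomes; True means the request succeeded.
        self.outcomes = deque(maxlen=window)

    def record(self, succeeded: bool) -> None:
        """Track the outcome of one cloud consumer request (step 1)."""
        self.outcomes.append(succeeded)

    def has_failed(self) -> bool:
        """Detect a failure condition from the recent outcomes (step 2)."""
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures >= self.threshold


if __name__ == "__main__":
    monitor = WatchdogMonitor(window=5, threshold=3)
    for ok in [True, False, True, False, False]:
        monitor.record(ok)
    print(monitor.has_failed())  # True: 3 of the last 5 requests failed
```

A sliding window avoids declaring a failure on a single dropped request while still reacting quickly to a sustained problem; a real monitor might combine this with heartbeats or health-check probes.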


The intelligent watchdog monitor notifies the resilient watchdog system (3), which restores the cloud service based on predefined policies (4).
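The notify-and-restore steps might be sketched as a lookup from failure type to a predefined recovery action. `ResilientWatchdog`, the failure-type strings, and the `restart_service` and `redeploy_service` functions are hypothetical stand-ins; a real system would invoke its cloud platform's management interface rather than return strings.

```python
from typing import Callable, Dict, List


# Hypothetical recovery actions. In practice these would call the
# cloud platform's management API; here they only describe the action.
def restart_service(service: str) -> str:
    return f"restarted {service}"


def redeploy_service(service: str) -> str:
    return f"redeployed {service}"


class ResilientWatchdog:
    """Hypothetical resilient watchdog system that restores services
    according to predefined policies (step 4)."""

    def __init__(self, policies: Dict[str, Callable[[str], str]]) -> None:
        self.policies = policies  # failure type -> recovery action
        self.log: List[str] = []  # record of actions taken, for reporting

    def notify(self, service: str, failure_type: str) -> str:
        """Called by the watchdog monitor when it detects a failure (step 3)."""
        action = self.policies.get(failure_type)
        if action is None:
            # No predefined policy applies, so hand the failure to a person.
            result = f"no policy for {failure_type}; alerting administrator"
        else:
            result = action(service)
        self.log.append(result)
        return result


if __name__ == "__main__":
    watchdog = ResilientWatchdog(
        {"unresponsive": restart_service, "corrupted": redeploy_service}
    )
    print(watchdog.notify("orders-api", "unresponsive"))  # restarted orders-api
```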


When a failure occurs, the active monitor consults its predefined policies to recover the service step by step, escalating the recovery process when the problem proves to be deeper than expected.
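Step-by-step recovery with escalation can be sketched as an ordered list of increasingly drastic actions, stopping at the first one that restores the service. The step names and the `recover_with_escalation` function are illustrative assumptions; each action is simulated by a lambda that reports whether the service is healthy afterwards.

```python
from typing import Callable, List, Tuple

# A recovery step is (name, action); the action returns True if the
# service is healthy again after the step. Names are illustrative only.
Step = Tuple[str, Callable[[], bool]]


def recover_with_escalation(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Try recovery steps in order of increasing severity.

    Returns (recovered, names_of_attempted_steps). If every step fails,
    the caller is expected to escalate further, e.g. to an operator.
    """
    attempted: List[str] = []
    for name, action in steps:
        attempted.append(name)
        if action():
            return True, attempted  # stop escalating once the service recovers
    return False, attempted


if __name__ == "__main__":
    steps = [
        ("restart process", lambda: False),
        ("restart virtual server", lambda: False),
        ("fail over to standby", lambda: True),
        ("page on-call administrator", lambda: True),
    ]
    recovered, attempted = recover_with_escalation(steps)
    print(recovered, attempted)
    # True ['restart process', 'restart virtual server', 'fail over to standby']
```

Ordering the steps from least to most disruptive keeps cheap fixes, such as a process restart, from being skipped, while the final step can hand an unresolved failure to a human operator.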

NIST Reference Architecture Mapping

This pattern relates to the highlighted parts of the NIST reference architecture, as follows:

[Figure: Dynamic Failure Detection and Recovery, NIST Reference Architecture Mapping]