Dynamic Failure Detection and Recovery (Erl, Naserpour)
How can the notification and recovery of IT resource failure be automated?
Problem: When cloud-based IT resources fail, manual intervention may be unacceptably inefficient.
Solution: A watchdog system is established to monitor IT resource status and perform notifications and/or recovery attempts during failure conditions.
Application: Different intelligent monitoring and recovery technologies can be used to establish the automation of failure detection and recovery tasks with a focus on watching, deciding upon, acting upon, reporting and escalating IT resource failure conditions.
Compound Patterns: Burst In, Burst Out to Private Cloud, Burst Out to Public Cloud, Elastic Environment, Infrastructure-as-a-Service (IaaS), Multitenant Environment, Platform-as-a-Service (PaaS), Private Cloud, Public Cloud, Resilient Environment, Software-as-a-Service (SaaS)
The SLA monitor keeps track of cloud consumer requests (1) and detects that a cloud service has failed (2).
The SLA monitor notifies the watchdog system (3), which restores the cloud service based on predefined policies (4).
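The four steps above can be illustrated with a minimal Python sketch. It is not taken from the pattern itself: the class names (SlaMonitor, Watchdog, RecoveryPolicy), the failure threshold, the retry limit, and the restart callback are all hypothetical stand-ins for whatever monitoring and recovery technology a given cloud platform provides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RecoveryPolicy:
    """Hypothetical predefined policy: restart attempts allowed before escalating."""
    max_attempts: int = 3


@dataclass
class Watchdog:
    """Assumed watchdog: acts on failure notifications, reports, and escalates."""
    policies: Dict[str, RecoveryPolicy]
    restart: Callable[[str], bool]  # assumed hook; returns True if the service came back
    log: List[str] = field(default_factory=list)

    def on_failure(self, service: str) -> str:
        # Steps 3-4: receive the notification and restore according to policy.
        policy = self.policies.get(service, RecoveryPolicy())
        self.log.append(f"failure reported: {service}")
        for attempt in range(1, policy.max_attempts + 1):
            if self.restart(service):
                self.log.append(f"recovered {service} on attempt {attempt}")
                return "recovered"
        # Recovery attempts exhausted: hand the problem to a human operator.
        self.log.append(f"escalated {service} after {policy.max_attempts} attempts")
        return "escalated"


@dataclass
class SlaMonitor:
    """Assumed SLA monitor: declares failure after consecutive failed requests."""
    watchdog: Watchdog
    failure_threshold: int = 2
    _failures: Dict[str, int] = field(default_factory=dict)

    def record_request(self, service: str, ok: bool) -> str:
        # Steps 1-2: track consumer requests and detect a failed cloud service.
        if ok:
            self._failures[service] = 0
            return "healthy"
        count = self._failures.get(service, 0) + 1
        self._failures[service] = count
        if count < self.failure_threshold:
            return "degraded"
        self._failures[service] = 0
        return self.watchdog.on_failure(service)


# Demo: the first restart attempt fails, the second succeeds.
restart_results = iter([False, True])
watchdog = Watchdog(
    policies={"billing": RecoveryPolicy(max_attempts=3)},
    restart=lambda service: next(restart_results),
)
monitor = SlaMonitor(watchdog)
outcomes = [
    monitor.record_request("billing", ok=False),
    monitor.record_request("billing", ok=False),
]
```

In this sketch the monitor only detects and notifies, while the watchdog owns the recovery decision. Keeping the two roles separate means policies can change without touching detection logic, which mirrors the split between the SLA monitor and the watchdog system described above.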
NIST Reference Architecture Mapping
This pattern relates to the highlighted parts of the NIST reference architecture, as follows: