The SLA monitor mechanism observes the runtime performance of cloud services to ensure that they fulfill the contractual quality-of-service requirements published in service level agreements (Figure 1). The data collected by the SLA monitor is processed by an SLA management system, which aggregates it into SLA reporting metrics. The system can also proactively repair or fail over cloud services when exception conditions occur, such as when the SLA monitor reports a cloud service as "down".
Figure 1 - The SLA monitor polls the cloud service by sending polling request messages (MREQ1 to MREQN). The monitor receives polling response messages (MREP1 to MREPN) reporting that the service was "up" at each polling cycle (1a). The SLA monitor stores the "up" time, the time period covering polling cycles 1 to N, in the log database (1b). The SLA monitor continues polling the cloud service by sending polling request messages (MREQN+1 to MREQN+M), but no polling response messages are received (2a). The response messages continue to time out, so the SLA monitor stores the "down" time, the time period covering polling cycles N+1 to N+M, in the log database (2b). The SLA monitor sends a polling request message (MREQN+M+1) and receives the polling response message (MREPN+M+1) (3a). The SLA monitor stores the "up" time in the log database (3b).
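The polling and logging behavior described in Figure 1 can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names `Period`, `record_periods`, and `availability` are hypothetical, the clock is simulated instead of real, a timed-out poll is modeled as a `False` return value, and an in-memory list stands in for the log database. The availability ratio is just one example of a metric that an SLA management system might derive from the logged periods.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Period:
    """One contiguous "up" or "down" period (hypothetical log record)."""
    status: str   # "up" or "down"
    start: float  # start of the first polling cycle in the period
    end: float    # end of the last polling cycle in the period


def record_periods(poll: Callable[[], bool], cycles: int,
                   interval: float) -> List[Period]:
    """Run `cycles` polling cycles and merge consecutive cycles that share
    a status into a single logged period.

    `poll` is an assumed stand-in for sending a polling request: it returns
    True when a response arrives and False when the request times out.
    """
    log: List[Period] = []
    for i in range(cycles):
        now = i * interval  # simulated clock; a real monitor would wait
        status = "up" if poll() else "down"
        if log and log[-1].status == status:
            log[-1].end = now + interval  # extend the current period
        else:
            log.append(Period(status, now, now + interval))
    return log


def availability(log: List[Period]) -> float:
    """Fraction of the monitored time that was logged as "up"."""
    total = sum(p.end - p.start for p in log)
    up = sum(p.end - p.start for p in log if p.status == "up")
    return up / total if total else 0.0


# Example: three successful polls, two timeouts, then one successful poll.
responses = iter([True, True, True, False, False, True])
log = record_periods(lambda: next(responses), cycles=6, interval=10.0)
for period in log:
    print(period.status, period.start, period.end)
```

With a 10-second polling interval, this example produces three logged periods ("up" for the first three cycles, "down" for the next two, "up" for the last one), which mirrors steps 1b, 2b, and 3b in the figure. Calling `availability(log)` on that log gives 40 of 60 seconds, or roughly 0.67.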