Describe, in your own words, what the SLIs are, based on an SLO of monthly uptime and request response time.
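The SLI for monthly uptime is the measured percentage of the month during which the service was available, compared against a target such as 99.5% monthly uptime. The SLI for request response time is the measured response time per request, compared against a target such as a response time below 5 seconds per request.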
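As a rough illustration of how these two indicators could be computed from raw measurements, here is a minimal sketch. The function names, the health-check list, and the latency samples are hypothetical and not part of the project; only the 5-second threshold comes from the answer above.

```python
def uptime_sli(health_checks):
    """Percentage of health checks that succeeded (True = service was up)."""
    if not health_checks:
        raise ValueError("no health checks recorded")
    return 100.0 * sum(health_checks) / len(health_checks)


def latency_sli(latencies_s, threshold_s=5.0):
    """Percentage of requests answered in under threshold_s seconds."""
    if not latencies_s:
        raise ValueError("no requests recorded")
    fast = sum(1 for t in latencies_s if t < threshold_s)
    return 100.0 * fast / len(latencies_s)


# Made-up sample data, for illustration only.
checks = [True] * 199 + [False]          # 200 probes, 1 of them failed
latencies = [0.2, 0.4, 1.1, 6.3, 0.9]    # seconds per request

print(f"uptime SLI:  {uptime_sli(checks):.1f}%")      # 99.5%
print(f"latency SLI: {latency_sli(latencies):.1f}%")  # 80.0%
```

The measured values are then compared against the SLO targets to decide whether the objective was met for the month.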
It is important to know why we want to measure certain metrics for our customers. Describe in detail 5 metrics to measure these SLIs.
- Time taken for a request to receive a response.
- The number of failed requests.
- The number of requests over a period of time.
- CPU and memory utilization by the service.
- Frontend and backend service uptime.
Name: POST Request to `localhost:8081/star` gives 500 error
Date: 03-12-2021
Subject: A POST request with a payload to the backend API star endpoint, i.e. `localhost:8081/star`, fails with a 500 error
Affected Area: Backend API /star endpoint
Severity: 1
Description: The MongoDB instance is not set up properly, which is causing the 500 error.
We want to create an SLO guaranteeing that our application has a 99.95% uptime per month. Name four SLIs that you would use to measure the success of this SLO.
- Latency
- Error
- Traffic
- Saturation
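To make the 99.95% target concrete, it can be translated into an allowed amount of downtime per month (the error budget). This is a minimal sketch assuming a 30-day month; the function name is hypothetical.

```python
def error_budget_minutes(slo_percent, days_in_month=30):
    """Minutes of downtime allowed per month under an uptime SLO."""
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in 30 days
    return total_minutes * (100.0 - slo_percent) / 100.0


budget = error_budget_minutes(99.95)
print(f"{budget:.1f} minutes")  # 21.6 minutes
```

With roughly 21.6 minutes of tolerated downtime per month, the four signals above indicate how quickly that budget is being consumed.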
Now that we have our SLIs and SLOs, create a list of 2-3 KPIs to accurately measure these metrics as well as a description of why those KPIs were chosen. We will make a dashboard for this, but first, write them down here.
- Latency:
  - Frontend and backend service uptime
  - Average time taken to send a response
- Error:
  - Total number of 40x & 50x errors in the frontend
  - Total number of 40x & 50x errors in the backend
- Traffic:
  - Number of requests over 30 seconds
  - Number of requests with 200 responses over 30 seconds
- Saturation:
  - CPU usage by the frontend and backend
  - Memory usage by the frontend and backend
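The traffic and error KPIs above can be sketched as a simple aggregation over a request log. This is an illustrative example only: the log format `(timestamp_seconds, status_code)`, the function name, and the sample data are assumptions, and errors are grouped by status class (4xx and 5xx).

```python
from collections import Counter


def window_kpis(requests, window_start, window_s=30):
    """Summarize requests with a timestamp in [window_start, window_start + window_s)."""
    in_window = [status for ts, status in requests
                 if window_start <= ts < window_start + window_s]
    # Group status codes by class, e.g. 404 -> "4xx", 500 -> "5xx".
    classes = Counter(f"{status // 100}xx" for status in in_window)
    return {
        "total": len(in_window),
        "ok_200": sum(1 for s in in_window if s == 200),
        "errors_4xx": classes["4xx"],
        "errors_5xx": classes["5xx"],
    }


# Made-up request log: (timestamp in seconds, HTTP status code).
log = [(0, 200), (4, 200), (12, 500), (18, 404), (29, 200), (31, 200)]
print(window_kpis(log, window_start=0))
# {'total': 5, 'ok_200': 3, 'errors_4xx': 1, 'errors_5xx': 1}
```

In a real deployment these numbers would normally come from the monitoring system's queries rather than a hand-rolled log scan; the sketch only shows what each KPI counts.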
- Average time taken by a request to receive a response.
- Frontend service uptime status.
- Backend service uptime status.
- Frontend service 40x & 50x errors.
- Backend service 40x & 50x errors.
- Backend Memory usage.
- Frontend Memory usage.
- Number of requests received over 30 seconds.
- Number of requests received with 200 responses over 30 seconds.
- Frontend CPU usage.
- Backend CPU usage.