- Prerequisites
- Alerts and Notifications Overview
- List of Included Alerts
- Creating a New Alert
- Configuring Alertmanager to Send Notifications to Slack
- Debugging a Firing Alert
To complete this tutorial, you will need:
- Prometheus monitoring stack installed in your cluster as explained in Prometheus Stack.
- Loki stack installed in your cluster as explained in Loki Stack.
- Emojivoto Sample App deployed in the cluster. Please follow the steps from the main repository. You will be creating alerts for this application.
- Administrative rights over a Slack workspace. Later on, you will be creating a Slack application with an incoming webhook, which will be used to send notifications from Alertmanager.
Oftentimes you need to be notified immediately about any critical issue in your cluster. That is where Alertmanager comes into the picture. Alertmanager helps in aggregating the alerts and sending notifications, as shown in the diagram below.
Usually, Alertmanager is deployed alongside Prometheus and forms the alerting layer of the kube-prom-stack. It handles alerts generated by Prometheus by deduplicating, grouping, and routing them to various integrations such as email, Slack, or PagerDuty.
Alerts and notifications are a critical part of your workflow. When things go wrong (e.g. a service is down or a pod keeps crashing), you will want to be notified in real time so you can handle critical situations as soon as possible.
Alertmanager is part of the kube-prom-stack installed in your cluster in Prometheus Stack. For this tutorial you will be using the same manifest file used for configuring Prometheus. Alertmanager allows you to receive alerts from various clients (sources), such as Prometheus. Rules are created on the Prometheus side, which in turn can fire alerts. It is then the responsibility of Alertmanager to intercept those alerts, group them (aggregation), apply other transformations, and finally route them to the configured receivers. Notification messages can be further formatted to include additional details if desired. You can use Slack, Gmail, etc. to send real-time notifications.
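If you want to see this pipeline in action at any point, you can inspect the alerts Alertmanager has received through its API. Below is a minimal sketch, assuming the Alertmanager service name follows the same naming pattern as the Prometheus service used later in this guide, and that you have curl and jq available:

# Port-forward the Alertmanager service (name assumed by analogy with the Prometheus service):
kubectl --namespace monitoring port-forward svc/kube-prom-stack-kube-prome-alertmanager 9093:9093

# In another terminal, list the alert names Alertmanager currently holds via its v2 API:
curl -s http://localhost:9093/api/v2/alerts | jq '.[].labels.alertname'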
In this section, you will learn how to inspect the existing alerts, create new ones, and then configure Alertmanager to send notifications via Slack.
Kube-prom-stack has over a hundred rules already activated. To access the Prometheus console, first create a port-forward to your local machine:
kubectl --namespace monitoring port-forward svc/kube-prom-stack-kube-prome-prometheus 9091:9090
Open a web browser on localhost:9091 and access the Alerts menu item. You should see some predefined alerts, and the page should look like the following:
Click on any of the alerts to expand it. You can see information about the expression it queries, the labels it has set up, and its annotations, which are very important from a templating perspective. Prometheus supports templating in the annotations and labels of alerts. For more information, check out the official documentation.
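As a quick illustration of that templating (a hypothetical rule shown only as a sketch, not one shipped with the stack), annotations can reference both the labels of the firing series and the measured value:

- alert: ExampleTemplatedAlert
  expr: sum(kube_pod_owner{namespace="emojivoto"}) by (namespace) < 4
  for: 1m
  annotations:
    summary: 'Namespace {{ $labels.namespace }} is running only {{ $value }} pod(s)'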
To create a new alert, you need to add a new definition to the additionalPrometheusRulesMap section of the kube-prom-stack Helm values file.
You will be creating a sample alert that will trigger if the emojivoto namespace does not have the expected number of instances. The expected number of pods for the emojivoto application is 4.
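Before wiring the alert into the Helm values, you can optionally sanity-check the expression used below against the Prometheus HTTP API (a sketch, assuming the port-forward on port 9091 from the previous section is still running):

curl -sG http://localhost:9091/api/v1/query \
  --data-urlencode 'query=sum(kube_pod_owner{namespace="emojivoto"}) by (namespace)'

While the Emojivoto application is healthy, the result should report the expected value of 4 for the emojivoto namespace.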
First, open the 04-setup-observability/assets/manifests/prom-stack-values-v35.5.1.yaml file provided in the Starter Kit repository, using a text editor of your choice (preferably with YAML lint support). Then, uncomment the additionalPrometheusRulesMap block.
additionalPrometheusRulesMap:
  rule-name:
    groups:
      - name: emojivoto-instance-down
        rules:
          - alert: EmojivotoInstanceDown
            expr: sum(kube_pod_owner{namespace="emojivoto"}) by (namespace) < 4
            for: 1m
            labels:
              severity: 'critical'
            annotations:
              description: 'The number of pods from the namespace {{ $labels.namespace }} is lower than the expected 4.'
              summary: 'Emojivoto instances down in namespace {{ $labels.namespace }}'
Finally, apply the settings using Helm:
HELM_CHART_VERSION="35.5.1"
helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack --version "${HELM_CHART_VERSION}" \
--namespace monitoring \
-f "04-setup-observability/assets/manifests/prom-stack-values-v${HELM_CHART_VERSION}.yaml"
To check that the alert has been created successfully, navigate to the Prometheus console at localhost:9091, click on the Alerts menu item, and identify the EmojivotoInstanceDown alert. It should be visible at the bottom of the list.
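You can also confirm from the command line that the Helm chart rendered your definition into a PrometheusRule custom resource (a quick sketch; the exact resource name is generated by the chart):

kubectl get prometheusrules -n monitoring
kubectl get prometheusrules -n monitoring -o yaml | grep EmojivotoInstanceDown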
To complete this section, you need to have administrative rights over a Slack workspace. This will enable you to create the incoming webhook you will need in the next steps. You will also need to create a channel where you would like to receive notifications from Alertmanager.
You will be configuring Alertmanager to range over all of the alerts received, printing their respective summaries and descriptions on new lines.
Steps to follow:
- Open a web browser, navigate to https://api.slack.com/apps and click on the Create New App button.
- In the Create an app window, select the From scratch option. Then, give your application a name and select the appropriate workspace.
- From the Basic Information page, click on the Incoming Webhooks option, turn it on, and click on the Add New Webhook to Workspace button at the bottom.
- On the next page, use the Search for a channel... drop-down list to select the desired channel where you want to send notifications. When ready, click on the Allow button.
- Take note of the Webhook URL value displayed on the page. You will be using it in the next section (you can also test it right away, as shown below).
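Before moving on, you can optionally verify that the webhook works by posting a test message to it with curl (replace the placeholder with the Webhook URL value you noted above):

curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Test notification from the Alertmanager setup"}' \
  "<YOUR_SLACK_APP_INCOMING_WEBHOOK_URL_HERE>"

A test message should appear in the Slack channel you selected.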
Next, you will tell Alertmanager how to send Slack notifications. Open the 04-setup-observability/assets/manifests/prom-stack-values-v35.5.1.yaml file provided in the Starter Kit repository, using a text editor of your choice (preferably with YAML lint support). Then, uncomment the entire alertmanager.config block. Make sure to update the slack_api_url and channel values by replacing the <> placeholders accordingly:
alertmanager:
  enabled: true
  config:
    global:
      resolve_timeout: 5m
      slack_api_url: "<YOUR_SLACK_APP_INCOMING_WEBHOOK_URL_HERE>"
    route:
      receiver: "slack-notifications"
      repeat_interval: 12h
      routes:
        - receiver: "slack-notifications"
          # matchers:
          #   - alertname="EmojivotoInstanceDown"
          # continue: false
    receivers:
      - name: "slack-notifications"
        slack_configs:
          - channel: "#<YOUR_SLACK_CHANNEL_NAME_HERE>"
            send_resolved: true
            title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
            text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
Explanations for the above configuration:
- slack_api_url - the incoming Slack webhook URL created in the previous section.
- receivers.[].slack_configs - defines the Slack channel used to send notifications, the notification title, and the actual message. It is also possible to format the notification message (or body) based on your requirements.
- title and text - iterate over the firing alerts and print out the summary and description using the Prometheus templating system.
- send_resolved - a boolean indicating whether Alertmanager should send a notification when an alert is no longer firing.
Note:
The matchers and continue parameters are still commented out, as you will be uncommenting them later on in the guide. For now, they should stay commented.
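Although not used in this guide's configuration, the route block also accepts grouping controls that influence how Alertmanager batches notifications. A hedged sketch of what that could look like:

route:
  receiver: "slack-notifications"
  repeat_interval: 12h
  # optional grouping controls (illustration only, not part of this guide's values file):
  # group_by: ['alertname', 'namespace']   # batch alerts sharing these labels into one notification
  # group_wait: 30s                        # wait before sending the first notification for a new group
  # group_interval: 5m                     # wait before sending updates for an existing group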
Finally, upgrade the kube-prometheus-stack, using Helm:
HELM_CHART_VERSION="35.5.1"
helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack --version "${HELM_CHART_VERSION}" \
--namespace monitoring \
-f "04-setup-observability/assets/manifests/prom-stack-values-v${HELM_CHART_VERSION}.yaml"
At this point, you should receive Slack notifications for all of the firing alerts.
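If notifications do not show up, a useful check is to look at what Alertmanager itself loaded. You can port-forward its service (the name below is assumed by analogy with the Prometheus service used earlier) and inspect the UI:

kubectl --namespace monitoring port-forward svc/kube-prom-stack-kube-prome-alertmanager 9093:9093

Then open localhost:9093 in a web browser. The Status page displays the configuration Alertmanager is running with, which should include the slack-notifications receiver, and the main page lists the currently firing alerts.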
Next, you're going to test whether the EmojivotoInstanceDown alert added previously works and sends a notification to Slack, by downscaling the number of replicas for the emoji deployment from the emojivoto namespace.
Steps to follow:
- From your terminal, run the following command to bring the number of replicas for the emoji deployment down to 0 (you will scale it back up after the test, as shown after this list):
  kubectl scale --replicas=0 deployment/emoji -n emojivoto
- Open a web browser on localhost:9091 and access the Alerts menu item. Search for the EmojivotoInstanceDown alert created earlier. The status of the alert should change to Firing about one minute after scaling down the deployment.
- If everything went well, a message notification will be sent to the Slack channel you configured earlier. You should see the "The number of pods from the namespace emojivoto is lower than the expected 4." text in the Slack message, as configured in the annotations.description field of the additionalPrometheusRulesMap block.
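Once you have confirmed the alert and the Slack notification, scale the emoji deployment back up (it runs a single replica in the default Emojivoto manifests, so adjust the value if your setup differs):

kubectl scale --replicas=1 deployment/emoji -n emojivoto

Because send_resolved is set to true in the Alertmanager configuration, you should also receive a notification in Slack once the alert stops firing.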
Currently, all of the firing alerts will be sent to the Slack channel. This can be a cause of notification fatigue. To drill down on what is sent, you can restrict Alertmanager to only send notifications for alerts which match a certain pattern. This is done using the matchers parameter. Open the 04-setup-observability/assets/manifests/prom-stack-values-v35.5.1.yaml file provided in the Starter Kit repository, using a text editor of your choice (preferably with YAML lint support). Then, from the alertmanager.config block, uncomment the matchers and continue parameters:
config:
  global:
    resolve_timeout: 5m
    slack_api_url: "<YOUR_SLACK_APP_INCOMING_WEBHOOK_URL_HERE>"
  route:
    receiver: "slack-notifications"
    repeat_interval: 12h
    routes:
      - receiver: "slack-notifications"
        matchers:
          - alertname="EmojivotoInstanceDown"
        continue: false
  receivers:
    - name: "slack-notifications"
      slack_configs:
        - channel: "#<YOUR_SLACK_CHANNEL_NAME_HERE>"
          send_resolved: true
          title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
          text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
Finally, upgrade the kube-prometheus-stack, using Helm:
HELM_CHART_VERSION="35.5.1"
helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack --version "${HELM_CHART_VERSION}" \
--namespace monitoring \
-f "04-setup-observability/assets/manifests/prom-stack-values-v${HELM_CHART_VERSION}.yaml"
At this point, you should only receive notifications for alerts matching the EmojivotoInstanceDown alertname. Since continue is set to false, Alertmanager will only send notifications for this alert and stop sending notifications for the others.
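If you later want to widen the filter without listing every alert name, matchers can also target labels such as severity. A sketch of such a route entry (not part of this guide's configuration) could look like this:

    routes:
      - receiver: "slack-notifications"
        matchers:
          - severity="critical"
        continue: false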
Notes:
- Clicking on the notification name in Slack will open a web browser to an unreachable web page showing the internal Kubernetes DNS name of the Alertmanager pod. This is expected. For more information, you can check out this article.
- For additional information about the configuration parameters for Alertmanager, you can check out this doc.
- You can also look at some notification examples in this article.
When an alert fires and sends a notification to Slack, it's important that you can debug the problem easily and find the root cause in a timely manner.
To do this, you can make use of Grafana, which was already installed as part of the Prometheus Stack, and of the Loki Stack.
Steps to follow:
- Create a port forward for Grafana on port 3000:
  kubectl --namespace monitoring port-forward svc/kube-prom-stack-grafana 3000:80
- Open a web browser on localhost:3000 and log in using the default credentials (admin/prom-operator).
- Navigate to the Alerting section.
- From the State filter, click on the Firing option, identify the emojivoto-instance-down alert defined in the Creating a New Alert section and expand it. You should see the following:
- Click on the See graph button. From the next page, you can observe the count for the number of pods in the emojivoto namespace displayed as a metric. Take note that Grafana filters results using a time range of Last 1 hour by default. Adjust this to the time interval when the alert fired. You can adjust the time range using an absolute time range with the From/To option for a more granular result, or using a Quick range such as Last 30 minutes.
- From the Explore tab, select the Loki data source and enter the following in the Log browser: {namespace="emojivoto"}, then click on the Run query button from the top right side of the page. You should see the following. Make sure you adjust the time interval accordingly.
- From this page you can filter the log results further. For example, to filter the logs for the web-svc container of the emojivoto namespace, you can enter the following query: {namespace="emojivoto", container="web-svc"}. More explanations about using LogQL can be found in Step 3 - Using LogQL from Loki Stack. An additional filtering example is shown after this list.
- You can also make use of the Exported Kubernetes Events installed previously and filter for events related to the emojivoto namespace. Enter the following query in the log browser: {app="event-exporter"} |= "emojivoto". This will return the Kubernetes events related to the emojivoto namespace.
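As the additional example referenced above, LogQL line filters can be combined with label selectors to narrow things down further, for instance to error messages only:

{namespace="emojivoto", container="web-svc"} |= "error"

This keeps only the log lines from the web-svc container that contain the string "error", which is usually a good starting point when investigating a firing alert.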