Add Alertmanager controller #201

Merged on Dec 16, 2024 (74 commits; the file changes shown are from 63 of them).

Commits
98d6695
Add alertmanager config handling
TheoBrigitte Dec 5, 2024
b575507
make the alertmanagerURL simpler
TheoBrigitte Dec 5, 2024
f281c75
make the build green
TheoBrigitte Dec 5, 2024
b58a751
I said green
TheoBrigitte Dec 5, 2024
bc2aef8
go mod tidy
TheoBrigitte Dec 5, 2024
8beb36c
Merge remote-tracking branch 'origin/main' into alertmanager-config
TheoBrigitte Dec 9, 2024
a6b2ccc
Add some stuff
TheoBrigitte Dec 9, 2024
5b47de7
Merge remote-tracking branch 'origin/main' into alertmanager-config
TheoBrigitte Dec 9, 2024
5e41fc5
Add alertmanager.Configure
TheoBrigitte Dec 9, 2024
707b23a
setup logger before loading env vars
TheoBrigitte Dec 10, 2024
3776e72
Retrive all templates by suffix
TheoBrigitte Dec 10, 2024
3e6e2f9
Fix maps > slices.Collect
TheoBrigitte Dec 10, 2024
d8a3026
Fix test
TheoBrigitte Dec 10, 2024
b160659
Remove test
TheoBrigitte Dec 10, 2024
b32a6de
go mod tidy
TheoBrigitte Dec 10, 2024
2023dc7
fix linting errors
TheoBrigitte Dec 10, 2024
dfec94d
Add alertmanager helm helpers
TheoBrigitte Dec 10, 2024
4e89889
Wire Alertmanager into GrafanaOrganization controller
TheoBrigitte Dec 10, 2024
dd4872a
Merge remote-tracking branch 'origin/main' into alertmanager-config
TheoBrigitte Dec 10, 2024
cebac34
Set the client
TheoBrigitte Dec 10, 2024
171b62a
Add Alertmanager config and templates in Helm chart
TheoBrigitte Dec 10, 2024
9eea3f8
use .helm-template instead of .tpl
TheoBrigitte Dec 10, 2024
27b62fa
re-use existing values
TheoBrigitte Dec 10, 2024
49b5a1a
normalize values names
TheoBrigitte Dec 10, 2024
2ea1101
try to make helm happy: error calling tpl: cannot retrieve Template.B…
TheoBrigitte Dec 10, 2024
801cadd
helm 3.10.3 I hate you deeply
TheoBrigitte Dec 10, 2024
8cbdbfb
Release v0.10.0 (#191)
taylorbot Dec 10, 2024
5d320b9
quote the tmp_string
TheoBrigitte Dec 10, 2024
8a79cf5
Add values and alertmanagerURL
TheoBrigitte Dec 10, 2024
4cfbb20
Add alertmanagerURL
TheoBrigitte Dec 10, 2024
3c450c7
debug log
TheoBrigitte Dec 10, 2024
c522b81
fix debug log
TheoBrigitte Dec 10, 2024
388a639
Fix invalid request body
TheoBrigitte Dec 10, 2024
a366895
Handle non 201 errors
TheoBrigitte Dec 10, 2024
377edef
Merge remote-tracking branch 'origin/main' into alertmanager-config
TheoBrigitte Dec 10, 2024
492a1d7
Merge remote-tracking branch 'origin/main' into alertmanager-config-helm
TheoBrigitte Dec 13, 2024
9591a92
rename alertmanager to alerting in values.yaml
TheoBrigitte Dec 13, 2024
c715315
Merge remote-tracking branch 'origin/main' into alertmanager-config
TheoBrigitte Dec 13, 2024
e58c682
remove alertmanager job from grafana organization controller
TheoBrigitte Dec 15, 2024
740d3b9
Add Alertmanager controller
TheoBrigitte Dec 15, 2024
cc9987d
revert grafana organization controller changes
TheoBrigitte Dec 15, 2024
b3f9a8e
Merge remote-tracking branch 'origin/alertmanager-config-helm' into a…
TheoBrigitte Dec 15, 2024
18935b1
update alertmanager > alerting in deployment.yaml
TheoBrigitte Dec 15, 2024
7d75b09
Merge branch 'alertmanager-config' into alertmanager-controller
TheoBrigitte Dec 15, 2024
fbda645
revert grafana organization controller changes
TheoBrigitte Dec 15, 2024
b0eb4dc
pass secret to Configure, add comment
TheoBrigitte Dec 15, 2024
ee02be2
Merge branch 'alertmanager-config' into alertmanager-controller
TheoBrigitte Dec 15, 2024
6fbcf61
update alertmanager job
TheoBrigitte Dec 15, 2024
8cda08b
get base template name
TheoBrigitte Dec 15, 2024
01ce0f3
remove uneeded flags
TheoBrigitte Dec 15, 2024
a80d94d
Merge branch 'alertmanager-config' into alertmanager-controller
TheoBrigitte Dec 15, 2024
a666060
restore flags
TheoBrigitte Dec 15, 2024
5b639a1
Revert "Merge remote-tracking branch 'origin/alertmanager-config-helm…
TheoBrigitte Dec 15, 2024
a91e537
rename alertmanager to alerting in values.yaml
TheoBrigitte Dec 15, 2024
a9a6092
Merge branch 'alertmanager-config' into alertmanager-controller
TheoBrigitte Dec 15, 2024
039b403
fix reconciled object: Secret
TheoBrigitte Dec 15, 2024
2f50a34
predicates: do not process delete events and not ready pods
TheoBrigitte Dec 15, 2024
fe60e95
Move X-Scope-OrgID to common/monitoring
TheoBrigitte Dec 16, 2024
ea52892
rename Job to Service
TheoBrigitte Dec 16, 2024
871065d
Merge remote-tracking branch 'origin/main' into alertmanager-config
TheoBrigitte Dec 16, 2024
11dfba8
Merge branch 'alertmanager-config' into alertmanager-controller
TheoBrigitte Dec 16, 2024
bbbe63c
Show status code in error
TheoBrigitte Dec 16, 2024
151c921
Merge branch 'alertmanager-config' into alertmanager-controller
TheoBrigitte Dec 16, 2024
ae01927
rename Job to Service
TheoBrigitte Dec 16, 2024
ea9e054
pass conf rather than args
TheoBrigitte Dec 16, 2024
a0ae593
remove controller value from logs
TheoBrigitte Dec 16, 2024
8ded036
log finished at the end
TheoBrigitte Dec 16, 2024
4fb556f
rename alertmanager_secret_predicate.go > alertmanager_predicates.go
TheoBrigitte Dec 16, 2024
dfd4844
Add a --alertmanager-enable feature flag to enabled the Alertmanager …
TheoBrigitte Dec 16, 2024
cc87d93
do not ignore not found error
TheoBrigitte Dec 16, 2024
f1259ee
simplify, remove reconcileCreate
TheoBrigitte Dec 16, 2024
8df047c
Merge remote-tracking branch 'origin/main' into alertmanager-controller
TheoBrigitte Dec 16, 2024
1d520dc
rename namespace to operator-namespace
TheoBrigitte Dec 16, 2024
9a06d4a
pass conf to NewAlertmanagerSecretPredicate
TheoBrigitte Dec 16, 2024
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add Alertmanager controller

## [0.10.1] - 2024-12-12

### Fixed
9 changes: 9 additions & 0 deletions go.mod
@@ -16,6 +16,7 @@ require (
github.com/opsgenie/opsgenie-go-sdk-v2 v1.2.23
github.com/pkg/errors v0.9.1
github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.79.0
github.com/prometheus/alertmanager v0.27.0
github.com/prometheus/client_golang v1.20.5
github.com/prometheus/common v0.61.0
github.com/sirupsen/logrus v1.9.3
@@ -87,7 +88,10 @@ require (
github.com/Masterminds/goutils v1.1.1 // indirect
github.com/Masterminds/semver/v3 v3.3.0 // indirect
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 // indirect
github.com/aws/aws-sdk-go v1.50.8 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/go-kit/log v0.2.1 // indirect
github.com/go-logfmt/logfmt v0.5.1 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-openapi/analysis v0.23.0 // indirect
github.com/go-openapi/errors v0.22.0 // indirect
@@ -97,12 +101,16 @@ require (
github.com/go-openapi/strfmt v0.23.0 // indirect
github.com/go-openapi/validate v0.24.0 // indirect
github.com/huandu/xstrings v1.5.0 // indirect
github.com/jmespath/go-jmespath v0.4.0 // indirect
github.com/jpillora/backoff v1.0.0 // indirect
github.com/klauspost/compress v1.17.9 // indirect
github.com/mitchellh/copystructure v1.2.0 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f // indirect
github.com/oklog/ulid v1.3.1 // indirect
github.com/opentracing/opentracing-go v1.2.0 // indirect
github.com/prometheus/common/sigv4 v0.1.0 // indirect
github.com/shopspring/decimal v1.4.0 // indirect
github.com/spf13/cast v1.7.0 // indirect
github.com/x448/float16 v0.8.4 // indirect
@@ -112,6 +120,7 @@ require (
go.opentelemetry.io/otel/trace v1.28.0 // indirect
golang.org/x/sync v0.10.0 // indirect
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
)

replace (
414 changes: 414 additions & 0 deletions go.sum

Large diffs are not rendered by default.

@@ -0,0 +1,5 @@
{{/* vim: set filetype=mustache: */}}

{{- define "alertmanager-secret.name" -}}
{{- include "resource.default.name" . -}}-alertmanager
{{- end }}
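The helper above appends `-alertmanager` to whatever `resource.default.name` renders, and the Deployment template passes the result to `--alertmanager-secret-name`. The sketch below reproduces that naming with Go's `text/template`. Two assumptions: Helm's `include` is not available in plain `text/template`, so the built-in `template` action is swapped in, and `resource.default.name` is a hypothetical stand-in that simply renders the release name, because its real definition is not part of this diff.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// helpers mirrors the chart helper. resource.default.name is a hypothetical
// stand-in: the real chart definition is not shown in this diff.
const helpers = `
{{- define "resource.default.name" -}}{{ .ReleaseName }}{{- end }}
{{- define "alertmanager-secret.name" -}}
{{- template "resource.default.name" . -}}-alertmanager
{{- end }}`

// secretName renders the alertmanager-secret.name helper for a release.
func secretName(releaseName string) (string, error) {
	t, err := template.New("helpers").Parse(helpers)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	data := struct{ ReleaseName string }{ReleaseName: releaseName}
	if err := t.ExecuteTemplate(&b, "alertmanager-secret.name", data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	name, err := secretName("observability-operator")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // observability-operator-alertmanager
}
```

The `-}}` trim markers remove only whitespace, so the literal `-alertmanager` suffix is kept and joined directly to the rendered base name.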
3 changes: 3 additions & 0 deletions helm/observability-operator/templates/deployment.yaml
@@ -31,11 +31,14 @@ spec:
- --management-cluster-pipeline={{ $.Values.managementCluster.pipeline }}
- --management-cluster-region={{ $.Values.managementCluster.region }}
# Monitoring configuration
- --alertmanager-secret-name={{ include "alertmanager-secret.name" . }}
- --alertmanager-url={{ $.Values.alerting.alertmanagerURL }}
- --monitoring-enabled={{ $.Values.monitoring.enabled }}
- --monitoring-agent={{ $.Values.monitoring.agent }}
- --monitoring-sharding-scale-up-series-count={{ $.Values.monitoring.sharding.scaleUpSeriesCount }}
- --monitoring-sharding-scale-down-percentage={{ $.Values.monitoring.sharding.scaleDownPercentage }}
- --monitoring-wal-truncate-frequency={{ $.Values.monitoring.wal.truncateFrequency }}
- --namespace={{ include "resource.default.namespace" . }}
{{- if .Values.monitoring.prometheusVersion }}
- --prometheus-version={{ $.Values.monitoring.prometheusVersion }}
{{- end }}
7 changes: 7 additions & 0 deletions helm/observability-operator/values.yaml
@@ -15,6 +15,13 @@ managementCluster:
  pipeline: pipeline
  region: region

alerting:
  alertmanagerURL: ""
  grafanaAddress: ""
  proxyURL: ""
  slackAPIToken: ""
  slackAPIURL: ""

monitoring:
  agent: alloy
  enabled: false
102 changes: 102 additions & 0 deletions internal/controller/alertmanager_controller.go
@@ -0,0 +1,102 @@
package controller

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	"github.com/pkg/errors"

	"github.com/giantswarm/observability-operator/internal/controller/predicates"
	"github.com/giantswarm/observability-operator/pkg/alertmanager"
	"github.com/giantswarm/observability-operator/pkg/config"
)

// AlertmanagerReconciler reconciles the Alertmanager secret created by the observability-operator Helm chart
// and configures the Alertmanager instance with the configuration stored in the secret.
// This controller does not use finalizers because the configuration is not removed from Alertmanager when the secret is deleted.
type AlertmanagerReconciler struct {
	client client.Client

	alertmanagerJob alertmanager.Job
}

// SetupAlertmanagerReconciler adds a controller into mgr that reconciles the Alertmanager secret.
func SetupAlertmanagerReconciler(mgr ctrl.Manager, conf config.Config) error {
	r := &AlertmanagerReconciler{
		client:          mgr.GetClient(),
		alertmanagerJob: alertmanager.New(conf),
	}

	// Filter only the Alertmanager secret created by the observability-operator Helm chart
	secretPredicate := predicates.NewAlertmanagerSecretPredicate(conf.Monitoring.AlertmanagerSecretName, conf.Namespace)

	// Filter only the Mimir Alertmanager pod
	podPredicate := predicates.NewAlertmanagerPodPredicate()

	// Requeue the Alertmanager secret when the Mimir Alertmanager pod changes
	p := podEventHandler(conf.Monitoring.AlertmanagerSecretName, conf.Namespace)

	// Setup the controller
	return ctrl.NewControllerManagedBy(mgr).
		For(&v1.Secret{}, builder.WithPredicates(secretPredicate)).
		Watches(&v1.Pod{}, p, builder.WithPredicates(podPredicate)).
		Complete(r)
}

// podEventHandler returns an event handler that enqueues requests for the Alertmanager secret only.
// For now there is only one Alertmanager secret to be reconciled.
func podEventHandler(secretName, namespace string) handler.EventHandler {
	return handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []ctrl.Request {
		return []reconcile.Request{
			{
				NamespacedName: types.NamespacedName{
					Name:      secretName,
					Namespace: namespace,
				},
			},
		}
	})
}

// Reconcile main logic
func (r AlertmanagerReconciler) Reconcile(ctx context.Context, req reconcile.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	logger = logger.WithValues("controller", "alertmanager")
	// IntoContext returns a new context; reassign it so callees see the enriched logger.
	ctx = log.IntoContext(ctx, logger)

	logger.Info("Started reconciling")
	defer logger.Info("Finished reconciling")

	// Retrieve the secret being reconciled
	secret := &v1.Secret{}
	if err := r.client.Get(ctx, req.NamespacedName, secret); err != nil {
		return ctrl.Result{}, errors.WithStack(client.IgnoreNotFound(err))
	}

	if !secret.DeletionTimestamp.IsZero() {
		// Nothing to do if the secret is being deleted:
		// the configuration is not removed from Alertmanager when the secret is deleted.
		return ctrl.Result{}, nil
	}

	return r.reconcileCreate(ctx, secret)
}

// Handle create and update events
func (r AlertmanagerReconciler) reconcileCreate(ctx context.Context, secret *v1.Secret) (ctrl.Result, error) { // nolint: unparam
	// Ensure the configuration is set and up to date in Alertmanager
	err := r.alertmanagerJob.Configure(ctx, secret)
	if err != nil {
		return ctrl.Result{}, errors.WithStack(err)
	}

	return ctrl.Result{}, nil
}
76 changes: 76 additions & 0 deletions internal/controller/predicates/alertmanager_secret_predicate.go
@@ -0,0 +1,76 @@
package predicates

import (
	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// NewAlertmanagerSecretPredicate returns a predicate that filters only the Alertmanager secret created by the observability-operator Helm chart.
func NewAlertmanagerSecretPredicate(secretName, namespace string) predicate.Predicate {
	filter := func(object client.Object) bool {
		if object == nil {
			return false
		}

		secret, ok := object.(*v1.Secret)
		if !ok {
			return false
		}

		if !secret.DeletionTimestamp.IsZero() {
			return false
		}

		labels := secret.GetLabels()

		ok = secret.GetName() == secretName &&
			secret.GetNamespace() == namespace &&
			labels != nil &&
			labels["app.kubernetes.io/name"] == "observability-operator"

		return ok
	}

	p := predicate.NewPredicateFuncs(filter)

	return p
}

const (
	mimirNamespace             = "mimir"
	mimirInstance              = "mimir"
	mimirAlertmanagerComponent = "alertmanager"
)

// NewAlertmanagerPodPredicate returns a predicate that filters only the Mimir Alertmanager pod.
func NewAlertmanagerPodPredicate() predicate.Predicate {
	filter := func(object client.Object) bool {
		if object == nil {
			return false
		}

		pod, ok := object.(*v1.Pod)
		if !ok {
			return false
		}

		if !pod.DeletionTimestamp.IsZero() {
			return false
		}

		labels := pod.GetLabels()

		ok = pod.GetNamespace() == mimirNamespace &&
			labels != nil &&
			labels["app.kubernetes.io/component"] == mimirAlertmanagerComponent &&
			labels["app.kubernetes.io/instance"] == mimirInstance &&
			isPodReady(pod)

		return ok
	}

	p := predicate.NewPredicateFuncs(filter)

	return p
}
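Both predicates are plain name, namespace, and label filters evaluated on every event. `isPodReady` is referenced but not defined in this file; the sketch below assumes the usual meaning (the pod's `Ready` condition has status `True`). To run with the standard library only, it models objects with a hypothetical, simplified `object` struct instead of the Kubernetes types.

```go
package main

import "fmt"

// condition and object are simplified, hypothetical stand-ins for the Kubernetes types.
type condition struct{ Type, Status string }

type object struct {
	Name, Namespace string
	Labels          map[string]string
	Deleting        bool
	Conditions      []condition
}

// isSecretMatch mirrors the filter in NewAlertmanagerSecretPredicate.
func isSecretMatch(o object, secretName, namespace string) bool {
	return !o.Deleting &&
		o.Name == secretName &&
		o.Namespace == namespace &&
		o.Labels["app.kubernetes.io/name"] == "observability-operator"
}

// isReady assumes readiness means a Ready condition with status True.
func isReady(o object) bool {
	for _, c := range o.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// isAlertmanagerPod mirrors the filter in NewAlertmanagerPodPredicate.
func isAlertmanagerPod(o object) bool {
	return !o.Deleting &&
		o.Namespace == "mimir" &&
		o.Labels["app.kubernetes.io/component"] == "alertmanager" &&
		o.Labels["app.kubernetes.io/instance"] == "mimir" &&
		isReady(o)
}

func main() {
	pod := object{
		Name:       "mimir-alertmanager-0",
		Namespace:  "mimir",
		Labels:     map[string]string{"app.kubernetes.io/component": "alertmanager", "app.kubernetes.io/instance": "mimir"},
		Conditions: []condition{{Type: "Ready", Status: "True"}},
	}
	fmt.Println(isAlertmanagerPod(pod)) // true
	pod.Conditions[0].Status = "False"
	fmt.Println(isAlertmanagerPod(pod)) // false
}
```

Indexing a nil map in Go returns the zero value, so the explicit `labels != nil` checks in the predicates are harmless but not strictly required.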
17 changes: 15 additions & 2 deletions main.go
@@ -74,6 +74,8 @@ func main() {
		"If set the metrics endpoint is served securely")
	flag.BoolVar(&conf.EnableHTTP2, "enable-http2", false,
		"If set, HTTP/2 will be enabled for the metrics and webhook servers")
	flag.StringVar(&conf.Namespace, "namespace", "",
		"The namespace where the observability-operator is running.")

	// Management cluster configuration flags.
	flag.StringVar(&conf.ManagementCluster.BaseDomain, "management-cluster-base-domain", "",
@@ -90,6 +92,10 @@ func main() {
		"The region of the management cluster.")

	// Monitoring configuration flags.
	flag.StringVar(&conf.Monitoring.AlertmanagerSecretName, "alertmanager-secret-name", "",
		"The name of the secret containing the Alertmanager configuration.")
	flag.StringVar(&conf.Monitoring.AlertmanagerURL, "alertmanager-url", "",
		"The URL of the Alertmanager API.")
	flag.StringVar(&conf.Monitoring.MonitoringAgent, "monitoring-agent", commonmonitoring.MonitoringAgentAlloy,
		fmt.Sprintf("select monitoring agent to use (%s or %s)", commonmonitoring.MonitoringAgentPrometheus, commonmonitoring.MonitoringAgentAlloy))
	flag.BoolVar(&conf.Monitoring.Enabled, "monitoring-enabled", false,
@@ -109,15 +115,15 @@ func main() {
	opts.BindFlags(flag.CommandLine)
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	// Load environment variables.
	_, err := env.UnmarshalFromEnviron(&conf.Environment)
	if err != nil {
		setupLog.Error(err, "failed to unmarshal environment variables")
		os.Exit(1)
	}

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	// if the enable-http2 flag is false (the default), http/2 should be disabled
	// due to its vulnerabilities. More specifically, disabling http/2 will
	// prevent from being vulnerable to the HTTP/2 Stream Cancelation and
@@ -182,6 +188,13 @@ func main() {
		setupLog.Error(err, "unable to setup controller", "controller", "GrafanaOrganizationReconciler")
		os.Exit(1)
	}

	// Setup controller for Alertmanager
	err = controller.SetupAlertmanagerReconciler(mgr, conf)
	if err != nil {
		setupLog.Error(err, "unable to setup controller", "controller", "AlertmanagerReconciler")
		os.Exit(1)
	}
	//+kubebuilder:scaffold:builder

	if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
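The new flags follow the existing `flag.StringVar` pattern and are filled from the Deployment arguments rendered by the chart. The self-contained sketch below shows that wiring with a `flag.FlagSet` and a hypothetical, trimmed-down `conf` struct rather than the operator's `config.Config`. It also adds an assumed validation step that is not in the PR: because a Secret always has a name and a namespace, an empty `--namespace` or `--alertmanager-secret-name` would make the secret predicate match nothing, so the sketch fails early instead. The sample argument values are placeholders.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// conf is a hypothetical subset of the operator configuration.
type conf struct {
	Namespace              string
	AlertmanagerSecretName string
	AlertmanagerURL        string
}

// parse registers the three flags added by this PR and parses args.
func parse(args []string) (conf, error) {
	var c conf
	fs := flag.NewFlagSet("observability-operator", flag.ContinueOnError)
	fs.StringVar(&c.Namespace, "namespace", "",
		"The namespace where the observability-operator is running.")
	fs.StringVar(&c.AlertmanagerSecretName, "alertmanager-secret-name", "",
		"The name of the secret containing the Alertmanager configuration.")
	fs.StringVar(&c.AlertmanagerURL, "alertmanager-url", "",
		"The URL of the Alertmanager API.")
	if err := fs.Parse(args); err != nil {
		return conf{}, err
	}
	// Hypothetical extra check, not in the PR: an empty name or namespace means
	// the secret predicate can never match, so fail early instead.
	if c.Namespace == "" || c.AlertmanagerSecretName == "" {
		return conf{}, errors.New("--namespace and --alertmanager-secret-name are required")
	}
	return c, nil
}

func main() {
	c, err := parse([]string{
		"--namespace=monitoring",
		"--alertmanager-secret-name=observability-operator-alertmanager",
		"--alertmanager-url=http://mimir-alertmanager.mimir.svc:8080",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```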