Merge branch 'main' into add-mimir-continoustest-alerts
QuentinBisson authored Nov 5, 2024
2 parents c375c42 + 44be18c commit d6b4ace
Showing 79 changed files with 1,070 additions and 785 deletions.
156 changes: 80 additions & 76 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
@@ -1,88 +1,92 @@
version: 2.1
orbs:
architect: giantswarm/architect@5.8.0
architect: giantswarm/architect@5.11.1

workflows:
package-and-push-chart-on-tag:
jobs:
- architect/push-to-app-catalog:
context: "architect"
executor: app-build-suite
name: app-catalog
app_catalog: "control-plane-catalog"
app_catalog_test: "control-plane-test-catalog"
chart: "prometheus-rules"
- architect/push-to-app-catalog:
context: architect
executor: app-build-suite
name: app-catalog
app_catalog: control-plane-catalog
app_catalog_test: control-plane-test-catalog
chart: prometheus-rules
# Trigger job on git tag.
filters:
tags:
only: /^v.*/
filters:
tags:
only: /^v.*/

- architect/push-to-app-collection:
context: "architect"
name: aws-app-collection
app_name: "prometheus-rules"
app_namespace: "monitoring"
app_collection_repo: "aws-app-collection"
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/
branches:
ignore:
- main
- master
- architect/push-to-app-collection:
context: architect
name: aws-app-collection
app_name: prometheus-rules
app_namespace: monitoring
app_collection_repo: aws-app-collection
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/

- architect/push-to-app-collection:
context: architect
name: push-to-capa-app-collection
app_name: "prometheus-rules"
app_namespace: "monitoring"
app_collection_repo: "capa-app-collection"
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/
- architect/push-to-app-collection:
context: architect
name: push-to-capa-app-collection
app_name: prometheus-rules
app_namespace: monitoring
app_collection_repo: capa-app-collection
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/

- architect/push-to-app-collection:
context: architect
name: push-to-capz-app-collection
app_name: "prometheus-rules"
app_namespace: "monitoring"
app_collection_repo: "capz-app-collection"
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/
- architect/push-to-app-collection:
context: architect
name: push-to-capz-app-collection
app_name: prometheus-rules
app_namespace: monitoring
app_collection_repo: capz-app-collection
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/

- architect/push-to-app-collection:
context: architect
name: push-to-cloud-director-app-collection
app_name: "prometheus-rules"
app_namespace: "monitoring"
app_collection_repo: "cloud-director-app-collection"
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/
- architect/push-to-app-collection:
context: architect
name: push-to-cloud-director-app-collection
app_name: prometheus-rules
app_namespace: monitoring
app_collection_repo: cloud-director-app-collection
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/

- architect/push-to-app-collection:
context: "architect"
name: vsphere-app-collection
app_name: "prometheus-rules"
app_namespace: "monitoring"
app_collection_repo: "vsphere-app-collection"
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/
- architect/push-to-app-collection:
context: architect
name: vsphere-app-collection
app_name: prometheus-rules
app_namespace: monitoring
app_collection_repo: vsphere-app-collection
requires:
- app-catalog
filters:
branches:
ignore: /.*/
tags:
only: /^v.*/
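
The `filters` stanzas in this config pair a tag pattern (`only: /^v.*/`) with a branch pattern (`ignore: /.*/`). As a rough illustration of how such a filter decides whether a job runs, the sketch below models the behaviour as I understand it from the CircleCI documentation: jobs run on every branch and on no tag unless filtered, and a pattern has to match the whole ref name. The function `job_runs`, its helpers, and the dictionary layout are hypothetical names made up for this note; they are not part of CircleCI or the architect orb.

```python
import re


def _matches(pattern, name):
    """Full-match a CircleCI-style pattern: '/regex/' or a literal string."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], name) is not None
    return pattern == name


def _as_list(value):
    """Filters accept either a single pattern or a list of patterns."""
    return value if isinstance(value, list) else [value]


def job_runs(ref_type, name, filters=None):
    """ref_type is 'branch' or 'tag'; filters mirrors the YAML `filters` map."""
    filters = filters or {}
    key = "branches" if ref_type == "branch" else "tags"
    rule = filters.get(key)
    if rule is None:
        # Assumed defaults: every branch builds, no tag builds.
        return ref_type == "branch"
    if "only" in rule and not any(_matches(p, name) for p in _as_list(rule["only"])):
        return False
    if "ignore" in rule and any(_matches(p, name) for p in _as_list(rule["ignore"])):
        return False
    return True


# Same shape as the filters on the push-to-app-collection jobs.
collection_filter = {"branches": {"ignore": "/.*/"}, "tags": {"only": "/^v.*/"}}

print(job_runs("tag", "v4.23.0", collection_filter))   # release tag
print(job_runs("branch", "main", collection_filter))   # any branch push
print(job_runs("tag", "test-1", collection_filter))    # tag without the v prefix
```

Under these assumptions only the `v`-prefixed tag passes, which is consistent with the `package-and-push-chart-on-tag` workflow name.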
132 changes: 130 additions & 2 deletions CHANGELOG.md
@@ -7,14 +7,129 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [4.23.0] - 2024-10-30

### Changed

- Rename all `prometheus-agent` related inhibitions to `monitoring-agent` inhibitions.
- Move `Inhibition` from a suffix to a prefix for the prometheus-agent inhibitions to match with the other inhibition alerts:
- `PrometheusAgentFailingInhibition` => `InhibitionPrometheusAgentFailing`
- `PrometheusAgentShardsMissingInhibition` => `InhibitionPrometheusAgentShardsMissing`

### Fixed

- Fixes the statefulset.rules name as it is currently replacing the deployment.rules alerts.

## [4.22.0] - 2024-10-29

### Changed

- Change `KubeletVolumeSpaceTooLow` to only page when there are 500MB or less of space left, letting the node-problem-detector handle the rest.

## [4.21.1] - 2024-10-25

### Fixed

- Updated `aggregation:giantswarm:cluster_release_version` expression to support CAPI clusters

## [4.21.0] - 2024-10-25

### Changed

- Set the `InhibitionControlPlaneUnhealthy` to be valid for all CAPI clusters, not just MCs.

## [4.20.0] - 2024-10-22

### Added

- Added InhibitionClusterWithoutWorkerNodes for CAPA

### Changed

- Modify `KyvernoWebhookHasNoAvailableReplicas` to check specifically for Kyverno resource webhook.
- Inhibit prometheus-agent alerts when a cluster has no worker nodes (AWS vintage only for now)

## [4.19.0] - 2024-10-15

### Added

- Alert `StatefulsetNotSatisfiedAtlas`

### Changed

- Update alloy-app to 0.6.1. This includes:
- an upgrade to upstream version 1.4.2
- a ciliumnetworkpolicy fix for clustering.

## [4.18.0] - 2024-10-08

### Added

- Alerting rule for Loki missing logs at ingestion

## [4.17.0] - 2024-10-03

### Removed

- Remove legacy in-house slo framework.

## [4.16.1] - 2024-09-26

### Fixed

- Fix `LokiFailedCompaction` to take the latest successful compaction across multiple compactor/backend pods

## [4.16.0] - 2024-09-26

### Added

- Add `LokiFailedCompaction` alert to know when Loki did not manage to run a successful compaction in the last 2 hours.

### Removed

- Remove CRsync alerting rules.

### Changed

- Migrate BigMac alerts to Shield
- Upgrade Alloy to 0.5.2 which brings no value to this repo.

### Fixed

- Dashboard links in alertmanager and mimir rules
- Fix cert-manager down alert.
- Remove deprecated app labels for `external-dns` and `ingress-nginx` alerts.
- Remove deprecated app labels for `kube-state-metrics` alerts.
- Fix falco events alerts node label to hostname as node does not exist.
- Fix `MimirHPAReachedMaxReplicas` description to render the horizontalpodautoscaler label.

## [4.15.2] - 2024-09-17

### Fixed

- Update `MimirHPAReachedMaxReplicas` opsrecipe link
- Fix aggregation rule of the `slo:current_burn_rate:ratio` slo.

## [4.15.1] - 2024-09-16

### Removed

- Remove aggregation of `slo:period_error_budget_remaining:ratio` as this value can be easily computed and creates a lot of time series in Grafana Cloud

## [4.15.0] - 2024-09-16

### Added

- Add aggregations for slo metrics to export them to grafana cloud
- Add `MimirHPAReachedMaxReplicas` alert, to detect when Mimir's HPAs have reached maximum capacity.
- Add `MimirContinuousTestFailingOnWrites` and `MimirContinuousTestFailingOnReads` alerts.

### Changed

- Added dashboards to several mimir alerts
- Change `IRSAACMCertificateExpiringInLessThan60Days` to
`IRSAACMCertificateExpiringInLessThan45Days`. The ACM certificate is renewed
60 days before expiration and the alert can fire prematurely.

## [4.14.0] - 2024-09-05

@@ -1364,7 +1479,7 @@ Fix `PromtailRequestsErrors` alerts as promtail retries after some backoff so ac

- Deprecate `role=master` in favor of `role=control-plane`.
- Rename alerts containing `Master` with `ControlPlane`
- Added "cancel_if_prometheus_agent_down" for phoenix alerts ManagementClusterCriticalPodMetricMissing, ManagementClusterDeploymentMissingAWS, WorkloadClusterNonCriticalDeploymentNotSatisfiedKaas
- Added `cancel_if_prometheus_agent_down` for phoenix alerts ManagementClusterCriticalPodMetricMissing, ManagementClusterDeploymentMissingAWS, WorkloadClusterNonCriticalDeploymentNotSatisfiedKaas

## [2.94.0] - 2023-04-26

@@ -3075,7 +3190,20 @@ Fix `PromtailRequestsErrors` alerts as promtail retries after some backoff so ac

- Add existing rules from https://github.com/giantswarm/prometheus-meta-operator/pull/637/commits/bc6a26759eb955de92b41ed5eb33fa37980660f2

[Unreleased]: https://github.com/giantswarm/prometheus-rules/compare/v4.14.0...HEAD
[Unreleased]: https://github.com/giantswarm/prometheus-rules/compare/v4.23.0...HEAD
[4.23.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.22.0...v4.23.0
[4.22.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.21.1...v4.22.0
[4.21.1]: https://github.com/giantswarm/prometheus-rules/compare/v4.21.0...v4.21.1
[4.21.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.20.0...v4.21.0
[4.20.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.19.0...v4.20.0
[4.19.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.18.0...v4.19.0
[4.18.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.17.0...v4.18.0
[4.17.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.16.1...v4.17.0
[4.16.1]: https://github.com/giantswarm/prometheus-rules/compare/v4.16.0...v4.16.1
[4.16.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.15.2...v4.16.0
[4.15.2]: https://github.com/giantswarm/prometheus-rules/compare/v4.15.1...v4.15.2
[4.15.1]: https://github.com/giantswarm/prometheus-rules/compare/v4.15.0...v4.15.1
[4.15.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.14.0...v4.15.0
[4.14.0]: https://github.com/giantswarm/prometheus-rules/compare/v4.13.3...v4.14.0
[4.13.3]: https://github.com/giantswarm/prometheus-rules/compare/v4.13.2...v4.13.3
[4.13.2]: https://github.com/giantswarm/prometheus-rules/compare/v4.13.1...v4.13.2
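
The link references in this hunk follow a regular pattern: `[Unreleased]` compares the newest tag with `HEAD`, and each release compares its predecessor with itself. A minimal sketch of that pattern is below. The helper `compare_links` is a hypothetical name written for this note and is not tooling from this repository; it assumes the versions are passed newest first and that tags carry a `v` prefix, as in the URLs in this hunk.

```python
def compare_links(versions, repo_url):
    """Build changelog link references; `versions` is ordered newest first."""
    lines = [f"[Unreleased]: {repo_url}/compare/v{versions[0]}...HEAD"]
    # Pair each release with the one immediately before it.
    for newer, older in zip(versions, versions[1:]):
        lines.append(f"[{newer}]: {repo_url}/compare/v{older}...v{newer}")
    return lines


repo = "https://github.com/giantswarm/prometheus-rules"
for line in compare_links(["4.23.0", "4.22.0", "4.21.1"], repo):
    print(line)
```

The oldest version in the input gets no entry of its own because its predecessor is unknown to the function.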
4 changes: 2 additions & 2 deletions CODEOWNERS
@@ -1,8 +1,8 @@
* @giantswarm/team-atlas
/helm/prometheus-rules/templates/kaas/bigmac/ @giantswarm/team-bigmac
/helm/prometheus-rules/templates/kaas/phoenix/ @giantswarm/team-phoenix
/helm/prometheus-rules/templates/kaas/rocket/ @giantswarm/team-rocket
/helm/prometheus-rules/templates/kaas/turtles/ @giantswarm/team-turtles
/helm/prometheus-rules/templates/kaas/turtles/ @giantswarm/team-tenet
/helm/prometheus-rules/templates/kaas/tenet/ @giantswarm/team-tenet
/helm/prometheus-rules/templates/platform/atlas/ @giantswarm/team-atlas
/helm/prometheus-rules/templates/platform/cabbage/ @giantswarm/team-cabbage
/helm/prometheus-rules/templates/platform/honeybadger/ @giantswarm/team-honeybadger
9 changes: 9 additions & 0 deletions Makefile.custom.mk
@@ -38,3 +38,12 @@ pint: install-tools template-chart ## Run pint
pint-all: install-tools template-chart ## Run pint with extra checks
GENERATE_ONLY=true bash test/hack/bin/verify-rules.sh
./test/hack/bin/run-pint.sh test/conf/pint/pint-all.hcl ${PINT_TEAM_FILTER}

##@ Mixins
update-mimir-mixin: install-tools ## Update Mimir mixins
./mimir/update.sh

update-loki-mixin: install-tools ## Update Loki mixins
./loki/update.sh

update-mixin: update-mimir-mixin update-loki-mixin ## Update all mixins
2 changes: 1 addition & 1 deletion helm/prometheus-rules/Chart.yaml
@@ -5,7 +5,7 @@ home: https://github.com/giantswarm/prometheus-rules
icon: https://s.giantswarm.io/app-icons/1/png/default-app-light.png
name: prometheus-rules
appVersion: '0.1.0'
version: '4.14.0'
version: '4.23.0'
annotations:
application.giantswarm.io/team: "atlas"
config.giantswarm.io/version: 1.x.x
2 changes: 1 addition & 1 deletion helm/prometheus-rules/templates/alloy-rules.yaml
@@ -27,5 +27,5 @@ spec:
namespace: monitoring
# used by renovate
# repo: giantswarm/alloy
version: 0.5.1
version: 0.6.1
{{- end -}}