[RHSTOR-5075] remove StorageClientIncompatibleOperatorVersion alert at Critical level #2318

Closed
1 change: 0 additions & 1 deletion metrics/deploy/prometheus-ocs-rules-external.yaml
@@ -130,7 +130,6 @@ spec:
message: Storage Cluster KMS Server is in un-connected state. Please check KMS config.
severity_level: error
storage_type: ceph
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/KMSServerConnectionAlert.md
expr: |
ocs_storagecluster_kms_connection_status{job="ocs-metrics-exporter"} == 1
for: 5s
19 changes: 4 additions & 15 deletions metrics/deploy/prometheus-ocs-rules.yaml
@@ -69,9 +69,9 @@ spec:
annotations:
description: Mirror daemon is in unhealthy status for more than 1m. Mirroring on this cluster is not working as expected.
message: Mirror daemon is unhealthy.
+runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/OdfMirrorDaemonStatus.md
severity_level: error
storage_type: ceph
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/OdfMirrorDaemonStatus.md
expr: |
((count by(namespace) (ocs_mirror_daemon_count{job="ocs-metrics-exporter"} == 0)) * on(namespace) group_left() (count by(namespace) (ocs_pool_mirroring_status{job="ocs-metrics-exporter"} == 1))) > 0
for: 1m
@@ -81,9 +81,9 @@ spec:
annotations:
description: Mirroring image(s) (PV) in the pool {{ $labels.name }} are in Unknown state for more than 1m. Mirroring might not work as expected.
message: Mirroring image(s) (PV) in the pool {{ $labels.name }} are in Unknown state.
+runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/OdfPoolMirroringImageHealth.md
severity_level: warning
storage_type: ceph
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/OdfPoolMirroringImageHealth.md
expr: |
(ocs_pool_mirroring_image_health{job="ocs-metrics-exporter"} * on (namespace) group_left() (max by(namespace) (ocs_pool_mirroring_status{job="ocs-metrics-exporter"}))) == 1
for: 1m
@@ -96,7 +96,6 @@ spec:
message: Mirroring image(s) (PV) in the pool {{ $labels.name }} are in Warning state.
severity_level: warning
storage_type: ceph
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/OdfPoolMirroringImageHealth.md
expr: |
(ocs_pool_mirroring_image_health{job="ocs-metrics-exporter"} * on (namespace) group_left() (max by(namespace) (ocs_pool_mirroring_status{job="ocs-metrics-exporter"}))) == 2
for: 1m
@@ -109,7 +108,6 @@ spec:
message: Mirroring image(s) (PV) in the pool {{ $labels.name }} are in Error state.
severity_level: error
storage_type: ceph
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/OdfPoolMirroringImageHealth.md
expr: |
(ocs_pool_mirroring_image_health{job="ocs-metrics-exporter"} * on (namespace) group_left() (max by(namespace) (ocs_pool_mirroring_status{job="ocs-metrics-exporter"}))) == 3
for: 10s
@@ -205,8 +203,8 @@ spec:
annotations:
description: An RBD client might be blocked by Ceph on node {{ $labels.node_name }}. This alert is triggered when the ocs_rbd_client_blocklisted metric reports a value of 1 for the node and there are pods in a CreateContainerError state on the node. This may cause the filesystem for the PVCs to be in a read-only state. Please check the pod description for more details.
message: An RBD client might be blocked by Ceph on node {{ $labels.node_name }}.
+runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/ODFRBDClientBlocked.md
severity_level: error
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/ODFRBDClientBlocked.md'
expr: |
(
ocs_rbd_client_blocklisted{node=~".+"} == 1
@@ -225,9 +223,9 @@ spec:
annotations:
description: Storage Cluster KMS Server is in un-connected state for more than 5s. Please check KMS config.
message: Storage Cluster KMS Server is in un-connected state. Please check KMS config.
+runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/KMSServerConnectionAlert.md
severity_level: error
storage_type: ceph
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/KMSServerConnectionAlert.md
expr: |
ocs_storagecluster_kms_connection_status{job="ocs-metrics-exporter"} == 1
for: 5s
@@ -262,15 +260,6 @@ spec:
floor((ocs_storage_provider_operator_version>0)/1000) - ignoring(storage_consumer_name) group_right() floor((ocs_storage_client_operator_version>0)/1000) == 1
labels:
severity: warning
-- alert: StorageClientIncompatibleOperatorVersion
-annotations:
-description: Storage Client Operator ({{ $labels.storage_consumer_name }}) differs by more than 1 minor version. Client configuration may be incompatible and unsupported
-message: Storage Client Operator ({{ $labels.storage_consumer_name }}) differs by more than 1 minor version
-severity_level: critical
-expr: |
-floor((ocs_storage_provider_operator_version>0)/1000) - ignoring(storage_consumer_name) group_right() floor((ocs_storage_client_operator_version>0)/1000) > 1 or floor((ocs_storage_client_operator_version>0)/1000) - ignoring(storage_consumer_name) group_left() floor((ocs_storage_provider_operator_version>0)/1000) >= 1
-labels:
-severity: critical
- name: ceph-daemon-performance-alerts.rules
rules:
- alert: MDSCacheUsageHigh
17 changes: 0 additions & 17 deletions metrics/mixin/alerts/storage-client.libsonnet
@@ -48,23 +48,6 @@
severity_level: 'warning',
},
},
-{
-# divide by 1000 here removes patch version
-# critical if client lags provider by more than one minor version or
-# client is ahead of provider
-alert: 'StorageClientIncompatibleOperatorVersion',
-expr: |||
-floor((ocs_storage_provider_operator_version>0)/1000) - ignoring(storage_consumer_name) group_right() floor((ocs_storage_client_operator_version>0)/1000) > %(clientOperatorMinorVerDiff)d or floor((ocs_storage_client_operator_version>0)/1000) - ignoring(storage_consumer_name) group_left() floor((ocs_storage_provider_operator_version>0)/1000) >= %(clientOperatorMinorVerDiff)d
-||| % $._config,
-labels: {
-severity: 'critical',
-},
-annotations: {
-message: 'Storage Client Operator ({{ $labels.storage_consumer_name }}) differs by more than %d minor version' % $._config.clientOperatorMinorVerDiff,
-description: 'Storage Client Operator ({{ $labels.storage_consumer_name }}) differs by more than %d minor version. Client configuration may be incompatible and unsupported' % $._config.clientOperatorMinorVerDiff,
-severity_level: 'critical',
-},
-},
],
},
],
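
Note on the StorageClientIncompatibleOperatorVersion rule in this diff: the comparison its expression performs can be sketched in Python. This is an illustrative sketch only, not code from this PR. It assumes the version metrics store the patch level in the last three decimal digits, which is what the "divide by 1000 here removes patch version" comment suggests; the `encode` helper and all function names are hypothetical, and the threshold of 1 is the rendered value of `clientOperatorMinorVerDiff` shown in the YAML.

```python
# Hypothetical encoding, assumed for illustration only: the last three
# decimal digits hold the patch version, so integer division by 1000 drops it.
def encode(major: int, minor: int, patch: int) -> int:
    return major * 1_000_000 + minor * 1_000 + patch


def minor_index(encoded_version: int) -> int:
    """Mirror floor(version / 1000) from the PromQL expression."""
    return encoded_version // 1000


def is_warning(provider: int, client: int, max_diff: int = 1) -> bool:
    """Warning rule: client lags provider by exactly max_diff minor versions."""
    if provider <= 0 or client <= 0:  # the `> 0` filter drops unset metrics
        return False
    return minor_index(provider) - minor_index(client) == max_diff


def is_critical(provider: int, client: int, max_diff: int = 1) -> bool:
    """Critical rule: client lags by more than max_diff, or client is ahead."""
    if provider <= 0 or client <= 0:
        return False
    p, c = minor_index(provider), minor_index(client)
    return (p - c > max_diff) or (c - p >= max_diff)


if __name__ == "__main__":
    provider = encode(4, 17, 0)
    for client in (encode(4, 17, 3), encode(4, 16, 1), encode(4, 15, 2), encode(4, 18, 0)):
        print(client, is_warning(provider, client), is_critical(provider, client))
```

Under these assumptions, a client one minor version behind the provider matches only the warning rule, while a client two or more minor versions behind, or any client ahead of the provider, matches the critical rule.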