Alert user if ceph metadata server is consuming cpu at threshold point.
Signed-off-by: Manish <[email protected]>
manishym committed Dec 11, 2023
1 parent 9466153 commit 43d1b07
Showing 3 changed files with 30 additions and 1 deletion.
3 changes: 2 additions & 1 deletion metrics/mixin/alerts/alerts.libsonnet
@@ -3,4 +3,5 @@
 (import 'services.libsonnet') +
 (import 'blocklist.libsonnet') +
 (import 'encryption.libsonnet') +
-(import 'storage-client.libsonnet')
+(import 'storage-client.libsonnet') +
+(import 'ceph-overload.libsonnet')
26 changes: 26 additions & 0 deletions metrics/mixin/alerts/perf.libsonnet
@@ -0,0 +1,26 @@
+{
+  prometheusAlerts+:: {
+    groups+: [
+      {
+        name: 'ODF-ceph-mds-high-cpu-warnings.rules',
+        rules: [
+          {
+            alert: 'MDS-high-cpu',
+            expr: |||
+              pod:container_cpu_usage:sum{%(mdsSelector)s}/ on(pod) kube_pod_resource_limit{resource='cpu',%(mdsSelector)s} > 0.67
+            ||| % $._config,
+            'for': $._config.mds_cpu_usage_high_threshold_duration,
+            labels: {
+              severity: 'warning',
+            },
+            annotations: {
+              message: 'Ceph metadata server pod ({{ $labels.pod }}) has high cpu usage',
+              description: 'Ceph metadata server pod ({{ $labels.pod }}) has high cpu usage.\nPlease consider increasing the number of active metadata servers,\nit can be done by increasing the number of activeMetadataServers parameter in the StorageCluster CR.',
+              severity_level: 'warning',
+            },
+          },
+        ],
+      },
+    ],
+  },
+}
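
The rule builds its PromQL expression by formatting a text block with `$._config`. The substitution can be rendered on its own, as in the minimal sketch below. The local `config` object is a stand-in for `$._config` (an assumption for illustration, not part of this commit) and carries only the `mdsSelector` value added to `config.libsonnet`.

```jsonnet
// Stand-in for $._config; only the field the expression needs.
local config = {
  mdsSelector: 'pod=~"rook-ceph-mds.*"',
};

{
  // Same template as the alert rule. Each %(mdsSelector)s placeholder
  // is filled from the matching field of the object on the right of %.
  expr: |||
    pod:container_cpu_usage:sum{%(mdsSelector)s}/ on(pod) kube_pod_resource_limit{resource='cpu',%(mdsSelector)s} > 0.67
  ||| % config,
}
```

After formatting, both placeholders become `pod=~"rook-ceph-mds.*"`, so the rule compares each matching pod's CPU usage with its CPU limit and fires once that ratio stays above 0.67 for the configured `for` duration.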
2 changes: 2 additions & 0 deletions metrics/mixin/config.libsonnet
@@ -13,6 +13,8 @@
     odfPoolMirroringImageHealthCriticalAlertTime: '10s',
     blockedRBDClientAlertTime: '10s',
     ocsStorageClusterKMSConnectionAlert: '5s',
+    mdsSelector: 'pod=~"rook-ceph-mds.*"',
+    mds_cpu_usage_high_threshold_duration: '6h',

     // Constants
     objectStorageType: 'RGW',
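
Because the selector and duration live in `_config`, a consumer of the mixin could presumably adjust them without editing the alert file. The sketch below assumes the usual mixin layout in which the config sits under a hidden `_config+::` field. The `defaults` and `overridden` objects and the `'1h'` value are hypothetical and only illustrate the override pattern.

```jsonnet
// Stand-in for the mixin's config; the real file defines many more fields.
local defaults = {
  _config+:: {
    mdsSelector: 'pod=~"rook-ceph-mds.*"',
    mds_cpu_usage_high_threshold_duration: '6h',
  },
};

// Hypothetical downstream override: shorten the wait before the alert fires.
local overridden = defaults {
  _config+:: {
    mds_cpu_usage_high_threshold_duration: '1h',
  },
};

{
  // The overridden field changes; untouched fields keep their defaults.
  'for': overridden._config.mds_cpu_usage_high_threshold_duration,
  selector: overridden._config.mdsSelector,
}
```

The `+::` form merges the override into the existing hidden object instead of replacing it, which is why `mdsSelector` survives unchanged.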