Skip to content

Commit

Permalink
export storageconsumer data as metrics and alerts
Browse files Browse the repository at this point in the history
- encode the versions from storageconsumer CR status to comparable numbers
- export these numbers are metrics
- create alerts for version incompatbility and client checkin via heartbeats

Signed-off-by: Leela Venkaiah G <[email protected]>
  • Loading branch information
leelavg committed Oct 31, 2023
1 parent 85c92a0 commit 4d7c089
Show file tree
Hide file tree
Showing 236 changed files with 22,282 additions and 137 deletions.
10 changes: 5 additions & 5 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,9 @@ require (
github.com/oklog/run v1.1.0
github.com/onsi/ginkgo/v2 v2.11.0
github.com/onsi/gomega v1.27.9
github.com/openshift/api v0.0.0-20230816181854-a7ca92db022a
github.com/openshift/api v0.0.0-20230929221546-73731d374fbf
github.com/openshift/build-machinery-go v0.0.0-20230306181456-d321ffa04533
github.com/openshift/client-go v0.0.0-20230718165156-6014fb98e86a
github.com/openshift/client-go v0.0.0-20231005121823-e81400b97c46
github.com/openshift/custom-resource-status v1.1.2
github.com/operator-framework/api v0.17.7-0.20230626210316-aa3e49803e7b
github.com/operator-framework/operator-lib v0.11.1-0.20230717184314-6efbe3a22f6f
Expand All @@ -38,10 +38,10 @@ require (
google.golang.org/protobuf v1.31.0
gopkg.in/ini.v1 v1.67.0
gotest.tools/v3 v3.5.0
k8s.io/api v0.28.0
k8s.io/api v0.28.2
k8s.io/apiextensions-apiserver v0.28.0
k8s.io/apimachinery v0.28.0
k8s.io/client-go v0.28.0
k8s.io/apimachinery v0.28.2
k8s.io/client-go v0.28.2
k8s.io/klog/v2 v2.100.1
k8s.io/utils v0.0.0-20230726121419-3b25d923346b
open-cluster-management.io/api v0.11.0
Expand Down
20 changes: 10 additions & 10 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -1587,14 +1587,14 @@ github.com/opencontainers/selinux v1.8.2/go.mod h1:MUIHuUEvKB1wtJjQdOyYRgOnLD2xA
github.com/opencontainers/selinux v1.10.0/go.mod h1:2i0OySw99QjzBBQByd1Gr9gSjvuho1lHsJxIJ3gGbJI=
github.com/openlyinc/pointy v1.1.2/go.mod h1:w2Sytx+0FVuMKn37xpXIAyBNhFNBIJGR/v2m7ik1WtM=
github.com/openshift/api v0.0.0-20210105115604-44119421ec6b/go.mod h1:aqU5Cq+kqKKPbDMqxo9FojgDeSpNJI7iuskjXjtojDg=
github.com/openshift/api v0.0.0-20230816181854-a7ca92db022a h1:1bylAza0mFIchRCRPVY9qy62CxJE18fpjEAUSpIA5O4=
github.com/openshift/api v0.0.0-20230816181854-a7ca92db022a/go.mod h1:yimSGmjsI+XF1mr+AKBs2//fSXIOhhetHGbMlBEfXbs=
github.com/openshift/api v0.0.0-20230929221546-73731d374fbf h1:V9YUWOXuMh8JXBbmaEJEGIBl21XLsvGSPBMgYb4rJzE=
github.com/openshift/api v0.0.0-20230929221546-73731d374fbf/go.mod h1:qNtV0315F+f8ld52TLtPvrfivZpdimOzTi3kn9IVbtU=
github.com/openshift/build-machinery-go v0.0.0-20200917070002-f171684f77ab/go.mod h1:b1BuldmJlbA/xYtdZvKi+7j5YGB44qJUJDZ9zwiNCfE=
github.com/openshift/build-machinery-go v0.0.0-20230306181456-d321ffa04533 h1:mh3ZYs7kPIIe3UUY6tJcTExmtjnXXUu0MrBuK2W/Qvw=
github.com/openshift/build-machinery-go v0.0.0-20230306181456-d321ffa04533/go.mod h1:b1BuldmJlbA/xYtdZvKi+7j5YGB44qJUJDZ9zwiNCfE=
github.com/openshift/client-go v0.0.0-20210112165513-ebc401615f47/go.mod h1:u7NRAjtYVAKokiI9LouzTv4mhds8P4S1TwdVAfbjKSk=
github.com/openshift/client-go v0.0.0-20230718165156-6014fb98e86a h1:ZKewwwEIURDnufm2oBd9rRvSp45BtRzPPrsUIFtm4V8=
github.com/openshift/client-go v0.0.0-20230718165156-6014fb98e86a/go.mod h1:EjhPQjEm8HM3GThz5ywNGLEec1P1IjTn08kwzdvupvA=
github.com/openshift/client-go v0.0.0-20231005121823-e81400b97c46 h1:J7UsTNgyM1krYnfsmijowYqt5I4mDM1qxNAy4eEa0xc=
github.com/openshift/client-go v0.0.0-20231005121823-e81400b97c46/go.mod h1:xM64ClnmCheAmffZZdTSJejy3yPE1nTRWQthKaZQ7JY=
github.com/openshift/custom-resource-status v1.1.2 h1:C3DL44LEbvlbItfd8mT5jWrqPfHnSOQoQf/sypqA6A4=
github.com/openshift/custom-resource-status v1.1.2/go.mod h1:DB/Mf2oTeiAmVVX1gN+NEqweonAPY0TKUwADizj8+ZA=
github.com/opentracing/opentracing-go v1.2.1-0.20220228012449-10b1cf09e00b h1:FfH+VrHHk6Lxt9HdVS0PXzSXFyS2NbZKXv33FYPol0A=
Expand Down Expand Up @@ -2791,8 +2791,8 @@ k8s.io/api v0.22.2/go.mod h1:y3ydYpLJAaDI+BbSe2xmGcqxiWHmWjkEeIbiwHvnPR8=
k8s.io/api v0.23.3/go.mod h1:w258XdGyvCmnBj/vGzQMj6kzdufJZVUwEM1U2fRJwSQ=
k8s.io/api v0.23.5/go.mod h1:Na4XuKng8PXJ2JsploYYrivXrINeTaycCGcYgF91Xm8=
k8s.io/api v0.26.0/go.mod h1:k6HDTaIFC8yn1i6pSClSqIwLABIcLV9l5Q4EcngKnQg=
k8s.io/api v0.28.0 h1:3j3VPWmN9tTDI68NETBWlDiA9qOiGJ7sdKeufehBYsM=
k8s.io/api v0.28.0/go.mod h1:0l8NZJzB0i/etuWnIXcwfIv+xnDOhL3lLW919AWYDuY=
k8s.io/api v0.28.2 h1:9mpl5mOb6vXZvqbQmankOfPIGiudghwCoLl1EYfUZbw=
k8s.io/api v0.28.2/go.mod h1:RVnJBsjU8tcMq7C3iaRSGMeaKt2TWEUXcpIt/90fjEg=
k8s.io/apiextensions-apiserver v0.0.0-20190409022649-727a075fdec8/go.mod h1:IxkesAMoaCRoLrPJdZNZUQp9NfZnzqaVzLhb2VEQzXE=
k8s.io/apiextensions-apiserver v0.18.3/go.mod h1:TMsNGs7DYpMXd+8MOCX8KzPOCx8fnZMoIGB24m03+JE=
k8s.io/apiextensions-apiserver v0.20.1/go.mod h1:ntnrZV+6a3dB504qwC5PN/Yg9PBiDNt1EVqbW2kORVk=
Expand All @@ -2812,8 +2812,8 @@ k8s.io/apimachinery v0.22.2/go.mod h1:O3oNtNadZdeOMxHFVxOreoznohCpy0z6mocxbZr7oJ
k8s.io/apimachinery v0.23.3/go.mod h1:BEuFMMBaIbcOqVIJqNZJXGFTP4W6AycEpb5+m/97hrM=
k8s.io/apimachinery v0.23.5/go.mod h1:BEuFMMBaIbcOqVIJqNZJXGFTP4W6AycEpb5+m/97hrM=
k8s.io/apimachinery v0.26.0/go.mod h1:tnPmbONNJ7ByJNz9+n9kMjNP8ON+1qoAIIC70lztu74=
k8s.io/apimachinery v0.28.0 h1:ScHS2AG16UlYWk63r46oU3D5y54T53cVI5mMJwwqFNA=
k8s.io/apimachinery v0.28.0/go.mod h1:X0xh/chESs2hP9koe+SdIAcXWcQ+RM5hy0ZynB+yEvw=
k8s.io/apimachinery v0.28.2 h1:KCOJLrc6gu+wV1BYgwik4AF4vXOlVJPdiqn0yAWWwXQ=
k8s.io/apimachinery v0.28.2/go.mod h1:RdzF87y/ngqk9H4z3EL2Rppv5jj95vGS/HaFXrLDApU=
k8s.io/apiserver v0.18.3/go.mod h1:tHQRmthRPLUtwqsOnJJMoI8SW3lnoReZeE861lH8vUw=
k8s.io/apiserver v0.20.1/go.mod h1:ro5QHeQkgMS7ZGpvf4tSMx6bBOgPfE+f52KwvXfScaU=
k8s.io/apiserver v0.20.4/go.mod h1:Mc80thBKOyy7tbvFtB4kJv1kbdD0eIH8k8vianJcbFM=
Expand All @@ -2830,8 +2830,8 @@ k8s.io/client-go v0.20.4/go.mod h1:LiMv25ND1gLUdBeYxBIwKpkSC5IsozMMmOOeSJboP+k=
k8s.io/client-go v0.20.6/go.mod h1:nNQMnOvEUEsOzRRFIIkdmYOjAZrC8bgq0ExboWSU1I0=
k8s.io/client-go v0.22.2/go.mod h1:sAlhrkVDf50ZHx6z4K0S40wISNTarf1r800F+RlCF6U=
k8s.io/client-go v0.23.5/go.mod h1:flkeinTO1CirYgzMPRWxUCnV0G4Fbu2vLhYCObnt/r4=
k8s.io/client-go v0.28.0 h1:ebcPRDZsCjpj62+cMk1eGNX1QkMdRmQ6lmz5BLoFWeM=
k8s.io/client-go v0.28.0/go.mod h1:0Asy9Xt3U98RypWJmU1ZrRAGKhP6NqDPmptlAzK2kMc=
k8s.io/client-go v0.28.2 h1:DNoYI1vGq0slMBN/SWKMZMw0Rq+0EQW6/AK4v9+3VeY=
k8s.io/client-go v0.28.2/go.mod h1:sMkApowspLuc7omj1FOSUxSoqjr+d5Q0Yc0LOFnYFJY=
k8s.io/code-generator v0.18.3/go.mod h1:TgNEVx9hCyPGpdtCWA34olQYLkh3ok9ar7XfSsr8b6c=
k8s.io/code-generator v0.19.0/go.mod h1:moqLn7w0t9cMs4+5CQyxnfA/HV8MF6aAVENF+WZZhgk=
k8s.io/code-generator v0.19.7/go.mod h1:lwEq3YnLYb/7uVXLorOJfxg+cUu2oihFhHZ0n9NIla0=
Expand Down
14 changes: 7 additions & 7 deletions metrics/deploy/prometheus-ocs-rules-external.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,9 @@ spec:
groups:
- name: ocs_performance.rules
rules:
- expr: "sum by (namespace, managedBy) (\n topk by (ceph_daemon) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-mgr\"}, \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\")) \n * on(instance, device) group_left topk by (instance,device) \n (1,\n (\n rate(node_disk_read_time_seconds_total[1m]) / (clamp_min(rate(node_disk_reads_completed_total[1m]), 1))\n )\n )\n)\n"
- expr: "sum by (namespace, managedBy) (\n topk by (ceph_daemon) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-exporter\"}, \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\")) \n * on(instance, device) group_left topk by (instance,device) \n (1,\n (\n rate(node_disk_read_time_seconds_total[1m]) / (clamp_min(rate(node_disk_reads_completed_total[1m]), 1))\n )\n )\n)\n"
record: cluster:ceph_disk_latency_read:join_ceph_node_disk_rate1m
- expr: "sum by (namespace, managedBy) (\n topk by (ceph_daemon) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-mgr\"}, \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\")) \n * on(instance, device) group_left topk by (instance,device) \n (1,\n (\n rate(node_disk_write_time_seconds_total[1m]) / (clamp_min(rate(node_disk_writes_completed_total[1m]), 1))\n )\n )\n)\n"
- expr: "sum by (namespace, managedBy) (\n topk by (ceph_daemon) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-exporter\"}, \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\")) \n * on(instance, device) group_left topk by (instance,device) \n (1,\n (\n rate(node_disk_write_time_seconds_total[1m]) / (clamp_min(rate(node_disk_writes_completed_total[1m]), 1))\n )\n )\n)\n"
record: cluster:ceph_disk_latency_write:join_ceph_node_disk_rate1m
- name: ODF_standardized_metrics.rules
rules:
Expand Down Expand Up @@ -46,7 +46,7 @@ spec:
system_type: OCS
system_vendor: Red Hat
record: odf_system_throughput_total_bytes
- expr: "sum by (namespace, managedBy, job, service)\n(\n topk by (ceph_daemon) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-mgr\"}, \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\")) \n * on(instance, device) group_left() topk by (instance,device) \n (1,\n (\n ( \n rate(node_disk_read_time_seconds_total[1m]) / (clamp_min(rate(node_disk_reads_completed_total[1m]), 1))\n ) +\n (\n rate(node_disk_write_time_seconds_total[1m]) / (clamp_min(rate(node_disk_writes_completed_total[1m]), 1))\n )\n )\n )\n)\n"
- expr: "sum by (namespace, managedBy, job, service)\n(\n topk by (ceph_daemon) (1, label_replace(label_replace(ceph_disk_occupation{job=\"rook-ceph-exporter\"}, \"instance\", \"$1\", \"exported_instance\", \"(.*)\"), \"device\", \"$1\", \"device\", \"/dev/(.*)\")) \n * on(instance, device) group_left() topk by (instance,device) \n (1,\n (\n ( \n rate(node_disk_read_time_seconds_total[1m]) / (clamp_min(rate(node_disk_reads_completed_total[1m]), 1))\n ) +\n (\n rate(node_disk_write_time_seconds_total[1m]) / (clamp_min(rate(node_disk_writes_completed_total[1m]), 1))\n )\n )\n )\n)\n"
labels:
system_type: OCS
system_vendor: Red Hat
Expand All @@ -67,7 +67,7 @@ spec:
rules:
- alert: ObcQuotaBytesAlert
annotations:
description: ObjectBucketClaim {{$labels.objectbucketclaim}} has crossed 80% of the size limit set by the quota(bytes) and will become read-only on reaching the quota limit. Increase the quota in the {{$labels.objectbucketclaim}} OBC custom resource.
description: ObjectBucketClaim {{ $labels.objectbucketclaim }} has crossed 80% of the size limit set by the quota(bytes) and will become read-only on reaching the quota limit. Increase the quota in the {{ $labels.objectbucketclaim }} OBC custom resource.
message: OBC has crossed 80% of the quota(bytes).
severity_level: warning
storage_type: RGW
Expand All @@ -78,7 +78,7 @@ spec:
severity: warning
- alert: ObcQuotaObjectsAlert
annotations:
description: ObjectBucketClaim {{$labels.objectbucketclaim}} has crossed 80% of the size limit set by the quota(objects) and will become read-only on reaching the quota limit. Increase the quota in the {{$labels.objectbucketclaim}} OBC custom resource.
description: ObjectBucketClaim {{ $labels.objectbucketclaim }} has crossed 80% of the size limit set by the quota(objects) and will become read-only on reaching the quota limit. Increase the quota in the {{ $labels.objectbucketclaim }} OBC custom resource.
message: OBC has crossed 80% of the quota(object).
severity_level: warning
storage_type: RGW
Expand All @@ -89,7 +89,7 @@ spec:
severity: warning
- alert: ObcQuotaBytesExhausedAlert
annotations:
description: ObjectBucketClaim {{$labels.objectbucketclaim}} has crossed the limit set by the quota(bytes) and will be read-only now. Increase the quota in the {{$labels.objectbucketclaim}} OBC custom resource immediately.
description: ObjectBucketClaim {{ $labels.objectbucketclaim }} has crossed the limit set by the quota(bytes) and will be read-only now. Increase the quota in the {{ $labels.objectbucketclaim }} OBC custom resource immediately.
message: OBC reached quota(bytes) limit.
severity_level: error
storage_type: RGW
Expand All @@ -100,7 +100,7 @@ spec:
severity: critical
- alert: ObcQuotaObjectsExhausedAlert
annotations:
description: ObjectBucketClaim {{$labels.objectbucketclaim}} has crossed the limit set by the quota(objects) and will be read-only now. Increase the quota in the {{$labels.objectbucketclaim}} OBC custom resource immediately.
description: ObjectBucketClaim {{ $labels.objectbucketclaim }} has crossed the limit set by the quota(objects) and will be read-only now. Increase the quota in the {{ $labels.objectbucketclaim }} OBC custom resource immediately.
message: OBC reached quota(object) limit.
severity_level: error
storage_type: RGW
Expand Down
Loading

0 comments on commit 4d7c089

Please sign in to comment.