export storageconsumer data as metrics and alerts #2227
leelavg commented Oct 23, 2023
- encode the versions from the storageconsumer CR status as comparable numbers
- export these numbers as metrics
- create alerts for version incompatibility and client check-in via heartbeats
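A minimal sketch of the first bullet, assuming versions have the form major.minor.patch and that each component fits in three decimal digits. The encoding actually used in the PR is not shown in this conversation; the function name `encodeVersion` and the digit layout are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeVersion turns "major.minor.patch" into one comparable integer.
// Hypothetical scheme (not taken from the PR): each component occupies
// three decimal digits, so "4.14.0" becomes 4*1000000 + 14*1000 + 0.
func encodeVersion(v string) (int, error) {
	// Drop any pre-release or build suffix such as "-rc1" or "+build5".
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return 0, fmt.Errorf("expected major.minor.patch, got %q", v)
	}
	encoded := 0
	for _, p := range parts {
		n, err := strconv.Atoi(p)
		// Components above 999 would overflow into the next field.
		if err != nil || n < 0 || n > 999 {
			return 0, fmt.Errorf("invalid version component %q", p)
		}
		encoded = encoded*1000 + n
	}
	return encoded, nil
}
```

With this layout, "4.13.2" encodes to 4013002 and "4.14.0" to 4014000, so a plain numeric comparison in an alert expression orders them correctly.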
Skipping CI for Draft Pull Request.
Only the last commit is reviewable for now.
@aruniiird, @jmolmo could you please review the last commit in this PR?
for _, storageConsumer := range storageConsumers {
	ch <- prometheus.MustNewConstMetric(c.StorageConsumerMetadata,
		prometheus.GaugeValue, 1,
		storageConsumer.Name,
		string(storageConsumer.Status.State))

	ch <- prometheus.MustNewConstMetric(c.LastHeartbeat,
		prometheus.GaugeValue, float64(storageConsumer.Status.LastHeartbeat.Time.Unix()),
A heartbeat based on a timestamp would probably be of type "counter" instead of "gauge". Do you expect decrements on this metric?
There'll be no decrements; however, the _count and _sum metrics derived from a counter type don't mean anything in this context, so I used a Gauge to represent the instant value with no correlation to previous values.
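To illustrate the reply above: a last-heartbeat timestamp is a point-in-time value, and what an alert cares about is its distance from "now", not its rate of change. The sketch below shows that arithmetic using only the standard library. The function names and the threshold parameter are hypothetical and not taken from the PR; only the `float64(...Unix())` conversion mirrors the diff.

```go
package main

import "time"

// heartbeatGaugeValue returns the sample exported for a consumer: the Unix
// time, in seconds, of its last heartbeat (same conversion as in the diff).
func heartbeatGaugeValue(lastHeartbeat time.Time) float64 {
	return float64(lastHeartbeat.Unix())
}

// secondsSinceHeartbeat mirrors what an alert expression of the form
// "current time minus metric value" evaluates to. Both arguments are
// Unix seconds.
func secondsSinceHeartbeat(gaugeValue, nowUnix float64) float64 {
	return nowUnix - gaugeValue
}

// isStale reports whether the consumer has missed its check-in window.
// thresholdSeconds is a hypothetical parameter, not a value from the PR.
func isStale(gaugeValue, nowUnix, thresholdSeconds float64) bool {
	return secondsSinceHeartbeat(gaugeValue, nowUnix) > thresholdSeconds
}
```

Because each sample is compared against the current time independently, no earlier sample is needed, which is the "no correlation to previous values" property mentioned in the reply.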
You probably need to replace the variable in the "critical" alert.
Please add documentation about the new alerts in the runbooks repo.
Here is an example for the CephClusterCriticallyFull alert.
Besides that, alerts in OpenShift ship a link to the corresponding runbook in this repository, to make fixing problems even easier. The link should be passed as the runbook_url field in the alert annotations.
See the example.
ade1461 to 02822e3
Lots of unrelated changes in the alert YAML files; please remove them from the commit.
Looks good otherwise.
02822e3 to 84c4ee3
I think everything will be clearer if we merge first: @umangachapagain can you review it and approve if it is OK?
@jmolmo the two branches differ, and for now, if they are generated at build time, this PR doesn't depend on any other.
It seems that the modifications in the rules file are not generated during the build; you need to explicitly run the command "make gen-latest-prometheus-rules-yamls":
84c4ee3 to 4d7c089
7d4fda1 to 9fbb0c7
4d7c089 to b25dbbf
Thanks.
b25dbbf to 06a407f
- encode the versions from the storageconsumer CR status as comparable numbers
- export these numbers as metrics
- create alerts for version incompatibility and client check-in via heartbeats
Signed-off-by: Leela Venkaiah G <[email protected]>
06a407f to 86e8069
resources:
- clusterversions
verbs:
- get
- list
- watch
These are still not added to the other file.
I don't think so; they are already added to exporter-role.yaml.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: leelavg, umangachapagain
c4698e0 into red-hat-storage:fusion-hci-4.14