-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding metrics collection from MTV #916
Conversation
Quality Gate passedIssues Measures |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #916 +/- ##
=======================================
Coverage 15.83% 15.83%
=======================================
Files 109 108 -1
Lines 19970 19911 -59
=======================================
- Hits 3162 3153 -9
+ Misses 16527 16479 -48
+ Partials 281 279 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the cold/warm and remote/local. I guess it's apart from the plans because that was requested.
but how much will it be harmful to have a single metric to contain all of the plans and their data?
did you thought about making percentages out of the data? e.g success rate, etc.
pkg/monitoring/metrics/forklift-controller/migration_metrics.go
Outdated
Show resolved
Hide resolved
It will be more problematic to have it in a single metric because we will have to specify the provider type, status, and destination in a single line, which will provide us with a very specific output. This is the opposite of what we currently want. Additionally, these are only the metrics in the controller. Now, I'm working on recording rules which will have more specific data based on the different labels and counters, so we will end up having a more combined output like you are referring to. Regarding the percentage, I think it can be done. We just need to count all the migrations (there are some statuses we are not counting currently) and see the success/failure percentage from it. |
all my questions regarding combining/adding and even the time between updates are pretty much depending on who consume this data and how. if the consumer expect the data to be accurate 10 sec can fit (maybe someone changed the plan from warm to cold?), but if not we can increase the time in order to consume less resources / increase performance of forklift. |
@bkhizgiy I don't think there is a need for the metric |
I agree with the part of making the labels part of the plans. I'm not sure if everything can be in a single metric since I'm not sure about the difference between the plan and the migration, but it is worth discussing. About the percentages, it's better yo keep as is since you can calculate the percentage in Prometheus and it's against the best practices. |
"sigs.k8s.io/controller-runtime/pkg/client" | ||
) | ||
|
||
var processedPlans = make(map[string]struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we act like the plans also in the migrations?
or you didn't do that as you check if we need both migrations and plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently we don't need, because the migration is set with atomic counters that always reflect the status, it set the real value each time instead of increasing it, but the plan is have to many combinations and options now that we have 4 labels on the same metric, so we will need to have all the combinations of Local/Remote.Warm/Cold.Status.ProviderType
which leads to to many counters and if conditions, so this implementation simplifies things.
also I've comment above I'ts the migration entity that corresponds to plan, I left it because the original team had it, but do we really need both? I think one of them should be enough but which one, plans or migrations would be better?
so the question is if we really need it and if we do should we add additional labels there as well?
var succeededRHV, succeededOCP, succeededOVA, succeededVsphere, succeededOpenstack float64 | ||
var failedRHV, failedOCP, failedOVA, failedVsphere, failedOpenstack float64 | ||
// Initialize or reset the counter map at the beginning of each iteration | ||
counterMap := make(map[string]float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to fetch the current state and save it. Using the previous implementation, we ignored plans/migrations that switched different statuses and didn't count them. Later, this will be processed in telemetry for what we need. Since there are too many combinations, adding a specific counter for each type based on providerType/migrationType/destination/status, I've added a map that will hold the different combinations as keys and the counter for each as values. This map will be reset each iteration to represent the actual state of the system.
} | ||
|
||
// Set the metrics for duration and data transferred and update the map for scaned migration | ||
if _, exists := processedSucceededMigrations[string(m.UID)]; !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't want to add a new metric for each successful migration each iteration, once we process it once, we will save its ID to a map and ignore it in the next iterations, only for successful migrations.
@bkhizgiy you report the metric as Gauges, but their names end with "total" which is saved for counters. |
@sradco you are right, thanks. I first switched it to a counter but then decided to reflect the current state instead, so I returned it to a Gauge. I will change the name to reflect that. Regarding the linter, it's a good suggestion, I think it can be done next when adding more advanced logic, after we have our basic metrics implementation in. |
Signed-off-by: Bella Khizgiyaev <[email protected]>
Signed-off-by: Bella Khizgiyaev <[email protected]>
Signed-off-by: Bella Khizgiyaev <[email protected]>
Signed-off-by: Bella Khizgiyaev <[email protected]>
Signed-off-by: Bella Khizgiyaev <[email protected]>
…ing to migrations. Signed-off-by: Bella Khizgiyaev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did #916 (comment) changed? if so, please update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just a minor comment inside
dfc99ac
to
9257de6
Compare
@bkhizgiy you can drop the word |
@bkhizgiy I saw you updated the metrics names, please update the pr description with the new names. |
Actually I saw that @bkhizgiy fixed the issue that I raised. I added a few follow ups. |
c4c2db9
to
86bc5c2
Compare
Signed-off-by: Bella Khizgiyaev <[email protected]>
/lgtm |
// 'target' - [Local, Remote] | ||
planStatusGauge = promauto.NewGaugeVec(prometheus.GaugeOpts{ | ||
Name: "mtv_plans_status", | ||
Help: "VM migration Plans sorted by status, provider, mode and destination", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
destination -> target
// 'target' - [Local, Remote] | ||
migrationStatusGauge = promauto.NewGaugeVec(prometheus.GaugeOpts{ | ||
Name: "mtv_migrations_status", | ||
Help: "VM Migrations sorted by status, provider, mode and destination", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
destination -> target
Adding Prometheus metrics reporting for migration and plans. allowing to gather additional information about the diffrent MTV plans and migartion per diffrent types of provider.
How to test quickly: Open terminal of the forklift-controller and run following command
curl http://localhost:2112/metrics | grep mtv
Example output: