[stacked 5/5] metrics: add topology-aware policy metrics collection. #406
bf184c5 to c0a5664
c0a5664 to d007d1e
LGTM
3bb93cf to 4ea6ec6
adb67fc to d83358f
de0af9d to 5e04ead
This looks pretty good. There might be some minor nits, but I think it is easier if we move forward with it now and improve further in follow-ups if required. Thank you @klihub for the great work.
5e04ead to dccdfdd
LGTM. Huge work.
Such a nice framework. I think we can easily extend this later for users who wish to get per-container metrics, such as how many exclusive CPUs a container has. And metrics overhead can be controlled by enabling/disabling this in configuration. Hats off to @klihub.
dccdfdd to 75ea2e5
Now that you mention it, I think there's one more thing that we might need to add for computationally expensive metrics: the programmatic ability to check if a collector is enabled (probably simply by name with something like
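To make the idea concrete, here is a minimal sketch of a by-name enabled check guarding an expensive computation. Everything in it is an assumption for illustration: the `Registry` type, the `IsEnabled` method, and the collector name `policy/zones` are hypothetical and are not the API proposed in this PR or the name elided in the comment above.

```go
package main

import "fmt"

// Registry records which collectors are enabled by configuration.
// Hypothetical type, not part of the PR.
type Registry struct {
	enabled map[string]bool
}

// NewRegistry returns a registry with the named collectors enabled.
func NewRegistry(names ...string) *Registry {
	r := &Registry{enabled: make(map[string]bool)}
	for _, n := range names {
		r.enabled[n] = true
	}
	return r
}

// IsEnabled reports whether the named collector is enabled.
func (r *Registry) IsEnabled(name string) bool {
	return r.enabled[name]
}

// CollectIfEnabled runs the expensive function only when the named
// collector is enabled; otherwise it skips the work entirely.
func CollectIfEnabled(r *Registry, name string, expensive func() int) (int, bool) {
	if !r.IsEnabled(name) {
		return 0, false
	}
	return expensive(), true
}

func main() {
	r := NewRegistry("policy/zones")
	v, ok := CollectIfEnabled(r, "policy/zones", func() int { return 42 })
	fmt.Println(v, ok)
	_, ok = CollectIfEnabled(r, "policy/system", func() int { return 1 })
	fmt.Println(ok)
}
```

The point of the guard is that the expensive callback is never invoked for a disabled collector, so its cost is only paid when the configuration asks for it.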
bdf663c to 88b5ee4
Implement collection of policy 'system' prometheus metrics.
We collect for each memory node
- memory capacity
- memory usage
- number of containers sharing the node
We collect for each CPU core
- allocation from that core
- number of containers sharing the core
Signed-off-by: Krisztian Litkey <[email protected]>
Add ZoneAvailable to return the amount of available/allocatable memory in a zone, capped by the amount of free memory in any of the ancestors of a zone. Signed-off-by: Krisztian Litkey <[email protected]>
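The capping rule described in this commit message can be sketched as follows. This is a simplified illustration, not the PR's implementation: the `Zone` struct, its fields, and the assumption that a parent's `Usage` already includes everything charged against its descendants are all hypothetical.

```go
package main

import "fmt"

// Zone is a hypothetical memory zone in a tree of zones.
// A parent's Usage is assumed to include its descendants' usage.
type Zone struct {
	Name     string
	Capacity int64 // total memory in the zone
	Usage    int64 // memory already allocated from the zone
	Parent   *Zone // nil for the root zone
}

// Free returns the unallocated memory of this zone alone.
func (z *Zone) Free() int64 {
	return z.Capacity - z.Usage
}

// ZoneAvailable returns the free memory of z, capped by the smallest
// amount of free memory found in any ancestor of z.
func ZoneAvailable(z *Zone) int64 {
	avail := z.Free()
	for p := z.Parent; p != nil; p = p.Parent {
		if f := p.Free(); f < avail {
			avail = f
		}
	}
	return avail
}

func main() {
	root := &Zone{Name: "root", Capacity: 64, Usage: 60}
	socket := &Zone{Name: "socket #0", Capacity: 32, Usage: 8, Parent: root}
	node := &Zone{Name: "NUMA node #0", Capacity: 16, Usage: 4, Parent: socket}
	// The node has 12 free locally, but the root only has 4 left.
	fmt.Println(ZoneAvailable(node)) // prints 4
}
```

In this model a zone can never report more available memory than its tightest ancestor, which is what keeps the per-zone "available" figure from overstating what can actually be allocated.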
Implement collection of per zone prometheus metrics. Currently we collect for each pool/zone the following
- name, cpuset and memset
- shared pool capacity, allocation, available amount
- memory capacity, allocation, available amount
- number of containers
- number of containers in the shared pool
Signed-off-by: Krisztian Litkey <[email protected]>
88b5ee4 to 009e436
Notes: This PR is stacked on top of #405.
Implement metrics collection for the topology-aware policy. Currently we collect the following for each pool/zone:
- name, cpuset and memset
- shared pool capacity, allocation, available amount
- memory capacity, allocation, available amount
- number of containers
- number of containers in the shared pool
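
As a rough illustration of how the per-zone values listed above could be exposed, the sketch below renders them as gauge lines in the Prometheus text exposition format using only the Go standard library. The `ZoneMetrics` struct, the `example_zone_*` metric names, the label names, and the units are all assumptions made for this example; the PR's actual collector and metric names are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// ZoneMetrics holds the per-pool/zone values from the list above.
// Hypothetical type; field names and units are assumptions.
type ZoneMetrics struct {
	Name, Cpuset, Memset string
	SharedCapacity       int64 // shared pool CPU, in milli-CPU
	SharedAllocated      int64
	SharedAvailable      int64
	MemCapacity          int64 // memory, in bytes
	MemAllocated         int64
	MemAvailable         int64
	Containers           int64
	SharedContainers     int64
}

// Render writes one gauge sample line per value and zone, with the
// zone name, cpuset and memset attached as labels.
func Render(zones []ZoneMetrics) string {
	var b strings.Builder
	for _, z := range zones {
		labels := fmt.Sprintf("zone=%q,cpuset=%q,memset=%q", z.Name, z.Cpuset, z.Memset)
		gauge := func(name string, v int64) {
			fmt.Fprintf(&b, "%s{%s} %d\n", name, labels, v)
		}
		gauge("example_zone_shared_cpu_capacity", z.SharedCapacity)
		gauge("example_zone_shared_cpu_allocated", z.SharedAllocated)
		gauge("example_zone_shared_cpu_available", z.SharedAvailable)
		gauge("example_zone_mem_capacity", z.MemCapacity)
		gauge("example_zone_mem_allocated", z.MemAllocated)
		gauge("example_zone_mem_available", z.MemAvailable)
		gauge("example_zone_containers", z.Containers)
		gauge("example_zone_shared_containers", z.SharedContainers)
	}
	return b.String()
}

func main() {
	fmt.Print(Render([]ZoneMetrics{{
		Name: "root", Cpuset: "0-3", Memset: "0",
		SharedCapacity: 4000, SharedAllocated: 1500, SharedAvailable: 2500,
		MemCapacity: 1024, MemAllocated: 256, MemAvailable: 768,
		Containers: 3, SharedContainers: 2,
	}}))
}
```

Carrying the name, cpuset and memset as labels rather than as separate metrics keeps each numeric value queryable per zone; a real collector would more likely use a Prometheus client library than hand-written text output.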