Clarify semantics of system.cpu.time and system.cpu.utilization #647

Closed
tigrannajaryan opened this issue Jan 24, 2022 · 8 comments

Comments

@tigrannajaryan
Member

We have system.cpu.time and system.cpu.utilization semantic conventions which don't explain what the metrics track.

Is system.cpu.time just the cumulative value of system.cpu.utilization, or are there other differences? If they track exactly the same value with the same attributes, and the only difference is that one is cumulative and the other is delta, do we really need both?

@jmacd
Contributor

jmacd commented Jan 24, 2022

My interpretation has been that system.cpu.time is a cumulative number of cpu-seconds used by the process.

We have a general convention for *.utilization, which is equal to *.usage / *.limit. Therefore, I see CPU utilization as usage relative to the limit of available CPU, which should have range [0, 1]. The value can be derived from *.cpu.time if you have timestamped point values and know the number of available CPUs.

The duration could be fixed or it could be variable, logically speaking, depending on temporality.

In a cumulative temporality export, we have to pick a time window independent of the collection interval: do we want a 1-minute utilization, a 5-minute utilization, and so on?

In a delta temporality export, we can set the duration equal to the collection interval, at which point we are describing a cumulative-to-delta translation essentially: we have to remember the prior value of cpu.time and subtract it from the current value, then divide by num_cpu * duration, i.e., it's (new_cpu_time - old_cpu_time) / (num_cpu * duration).
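A minimal sketch of the calculation described above, assuming two timestamped cumulative readings of cpu.time and a known CPU count. The `CpuTimeSample` type and the `utilization` function are hypothetical names used only for illustration; they are not part of any OpenTelemetry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTimeSample:
    """One timestamped cumulative reading of cpu.time (hypothetical type)."""

    timestamp_s: float  # when the reading was taken, in seconds
    cpu_time_s: float   # cumulative cpu-seconds consumed up to that moment


def utilization(old: CpuTimeSample, new: CpuTimeSample, num_cpu: int) -> float:
    """Return (new_cpu_time - old_cpu_time) / (num_cpu * duration)."""
    duration = new.timestamp_s - old.timestamp_s
    if duration <= 0:
        raise ValueError("samples must be ordered with a positive time gap")
    if num_cpu <= 0:
        raise ValueError("num_cpu must be positive")
    return (new.cpu_time_s - old.cpu_time_s) / (num_cpu * duration)


# Illustrative numbers: 20 cpu-seconds used over 10 seconds on 4 CPUs.
old = CpuTimeSample(timestamp_s=0.0, cpu_time_s=100.0)
new = CpuTimeSample(timestamp_s=10.0, cpu_time_s=120.0)
print(utilization(old, new, num_cpu=4))  # 20 / (4 * 10) = 0.5
```

With a cumulative export, the same function applies, but `old` would be whichever earlier sample bounds the chosen window (1 minute, 5 minutes, and so on) rather than the previous collection.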

@tigrannajaryan
Member Author

I think we need these clarifications in the spec.

@tigrannajaryan
Member Author

Either way, because utilization is a fraction (usage/limit), I believe it is semantically different from time, so the existence of both appears to be warranted. This answers my immediate question.

@dmitryax
Member

Based on the spec, they are supposed to be reported with the same set of attributes: cpu and state. If cpu is set for both (and it should be), the limit in the calculations above is always 1, so they are not that different. If system.cpu.time is converted from cumulative to delta, system.cpu.utilization can be calculated by dividing the value by the reporting duration. Similarly, system.cpu.utilization can be translated to system.cpu.time deltas.
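To illustrate the per-CPU case, here is a rough sketch of the two conversions, assuming each series is keyed by (cpu, state) so that the limit is 1. The function names and sample values are made up for illustration and do not come from the Collector or the spec.

```python
def delta_to_utilization(delta_cpu_time_s: float, duration_s: float) -> float:
    """Per-CPU utilization: delta of cpu.time divided by the reporting duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return delta_cpu_time_s / duration_s


def utilization_to_delta(utilization: float, duration_s: float) -> float:
    """Inverse conversion: recover the cpu.time delta for the same interval."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return utilization * duration_s


# Hypothetical cpu.time deltas (seconds) for one logical CPU over a 10 s interval.
duration_s = 10.0
cpu_time_deltas = {
    ("cpu0", "user"): 2.5,
    ("cpu0", "system"): 2.5,
    ("cpu0", "idle"): 5.0,
}

utilization_by_series = {
    key: delta_to_utilization(delta, duration_s)
    for key, delta in cpu_time_deltas.items()
}
for key, value in utilization_by_series.items():
    print(key, value)
```

Because the limit is 1 per logical CPU, the utilizations across all states of one CPU should add up to roughly 1 when the deltas cover the whole interval.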

I agree that they can both exist in the specification, but I'm not sure it's useful to emit both. Maybe that should be clarified in the spec, along with their difference. In OTel Collector, we have a way to collect optional metrics. I think this is a good use case for that: system.cpu.time seems to be a good metric to report by default, while system.cpu.utilization can be an optional metric that users add if they want it. I'm not sure whether we have a concept of optional metrics to mention in the spec.

@mx-psi
Member

mx-psi commented Jan 18, 2024

Discussed at the January 18th System Semantic Conventions WG meeting; we consider this a blocker.

@jsuereth Can you transfer this to the semantic-conventions repository?

@arminru arminru transferred this issue from open-telemetry/opentelemetry-specification Jan 18, 2024
@ChrsMark
Member

The system.cpu.utilization description clearly states: "Difference in system.cpu.time since the last measurement, divided by the elapsed time and number of logical CPUs."
Isn't that aligned with #647 (comment) and the rest of the comments here? Should we additionally mention that it can be optional if "backends" can calculate it from system.cpu.time?

@ChrsMark
Member

ChrsMark commented Jun 6, 2024

@open-telemetry/semconv-system-approvers is this still valid?
I guess the current description of system.cpu.utilization is valid and accurate.

The only missing piece is about making it opt-in since it can be calculated from the system.cpu.time metric.
Otherwise, we can close this issue.

@ChrsMark
Member

ChrsMark commented Jun 7, 2024

Filed #1130 for further discussion on the requirement level.
I will close this issue in a couple of days if there are no objections, since the original question should be covered now.

/cc @open-telemetry/specs-semconv-approvers @open-telemetry/specs-semconv-maintainers
