Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/deltatocumulative] enhancements for slow-moving ("sparse") counters #36485

Open
sirianni opened this issue Nov 21, 2024 · 3 comments
Labels
needs triage New item requiring triage Sponsor Needed New component seeking sponsor

Comments

@sirianni
Copy link
Contributor

sirianni commented Nov 21, 2024

We are using the deltatocumulative processor in production but are facing issues with slow-moving (or "sparse") counters.

In the current implementation, the processor only emits the cumulative when an upstream delta is received. If a counter has not been incremented for several reporting intervals this means that no datapoint is emitted downstream for those intervening time windows. Contrast this with a true cumulative counter in Prometheus, where an unchanged value will be sampled on each successive scrape and emitted downstream.

Current behavior

Time:       |----|----|----|----|----|----|----|----|----|----|----
Delta:      .    5    .    .    10   .    .    .    7    .    .
Cumulative: .    5              15                  22
Emitted:    .    ●              ●                   ●

This behavior causes issues when trying to use rate() and increase() in PromQL since there is no previous datapoint within the standard 5 minute lookback window to compare with.

This could be addressed by an alternative implementation where the cumulative datapoints were instead flushed periodically from a background thread on a fixed interval. This would have the benefit of continuing to emit cumulative counters that have not been recently incremented, but are not yet stale.

Desired behavior

Time:       |----|----|----|----|----|----|----|----|----|----|----
Delta:      .    5    .    .    10   .    .    .    7    .    .
Cumulative: .    5    5    5    15   15   15   15   22   22   22
Emitted:    ●    ●    ●    ●    ●    ●    ●    ●    ●    ●    ●

Stale timeseries would still be expired much like the current implementation.

There are several ways to incorporate this into the existing codebase

  1. Enhancement to deltatocumulativeprocessor
  2. Enhancement to intervalprocessor
  3. New processor (deltatocumulativeasyncprocessor?)

The implementation can get fairly tricky because you'd need to retain the resource/scope attributes, etc. from the original metric.

Example configuration

deltatocumulative:
  mode: async
  flush_unchanged: true #default
  flush_interval: 1m
  max_stale: 1h

Telemetry data types supported

Metrics

Code Owner(s)

@RichieSams @sh0rez

Sponsor (optional)

@RichieSams @sh0rez

Additional context

CNCF Slack Thread

@sirianni sirianni added needs triage New item requiring triage Sponsor Needed New component seeking sponsor labels Nov 21, 2024
@sirianni sirianni changed the title New component: Delta to cumulative enhancements for slow-moving ("sparse") counters New component: delta to cumulative enhancements for slow-moving ("sparse") counters Nov 21, 2024
@sh0rez
Copy link
Member

sh0rez commented Nov 21, 2024

hi! thanks for opening this issue!

I have a lot of ideas for this and will write them down when I find the time.

In the meantime, wdyt about removing the "new component" from this? I'm fairly sure we can find a place within deltatocumulative / interval for the needed functionality, especially because it sounds rather common and likely happens for a lot of users.

New component issues are afaict concrete proposals and we are not quite there yet :)
Let's have a general disussion and see where we get

@sirianni sirianni changed the title New component: delta to cumulative enhancements for slow-moving ("sparse") counters [processor/deltatocumulative] enhancements for slow-moving ("sparse") counters Nov 21, 2024
@sirianni
Copy link
Contributor Author

sirianni commented Nov 21, 2024

Done. I agree, but created it that way since it was suggested to use the "proposal" template. Can you please update the labels as needed? I don't have access to add/remove labels.

@RichieSams
Copy link
Contributor

My vote would be to extend intervalprocessor. We could have two modes:

  1. accumulate and flush at the interval (this is the current behavior)
  2. accumulate and re-export at the interval (expiring un-updated data after X time period)

For 1 (the current behavior), we accumulate datapoints over the interval, replacing older matching datapoints with newer ones. Then at the interval time, we send all our state downstream, and clear our state to empty. Repeat.

For 2, we again would accumulate datapoints over the interval, replacing older matching datapoints with newer ones. At the interval time, we again would send all our state downstream, but we wouldn't clear our state to empty. IE, we continue re-exporting our state at each interval time. However, this risks OOM, so we would also need a mechanic to remove datapoints that haven't been updated in X time. For example, if a data series hasn't been updated in an hour, remove it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs triage New item requiring triage Sponsor Needed New component seeking sponsor
Projects
None yet
Development

No branches or pull requests

3 participants