Querytee: Add metric to measure relative backend latency #7782

Merged
10 commits merged on Apr 29, 2024

Contributor

@jhesketh jhesketh commented Apr 3, 2024

This metric gives us a measurement of how individual queries compare between two backends in terms of latency (or duration).

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@jhesketh jhesketh requested a review from a team as a code owner April 3, 2024 02:53
@jhesketh jhesketh marked this pull request as draft April 3, 2024 02:53
Contributor

@charleskorn charleskorn left a comment

Thanks for working on this.

It'd be good to add the proportional latency difference as well: a request being 1s slower is interesting, but knowing whether that's a 1% change or a 100% change would also be useful.

tools/querytee/proxy_metrics.go (outdated)
tools/querytee/proxy_endpoint.go (outdated)
tools/querytee/proxy_endpoint.go (outdated)
Collaborator

@pracucci pracucci left a comment

> This metric gives us a measurement of how individual queries compare between two backends in terms of latency (or duration).

Why do we need a new metric? Can't we just solve this "problem" at PromQL level, with a query?

Contributor Author

jhesketh commented Apr 3, 2024

> Why do we need a new metric? Can't we just solve this "problem" at PromQL level, with a query?

We currently record backend_request_duration_seconds as a histogram with a label for the backend. We can subtract one backend's values from the other's to get a general indication of which one is more performant, but that isn't necessarily accurate, because faster and slower results can cancel each other out.

By tracking this with a new metric, a single histogram shows how many times one backend is faster than the other, and by how much, depending on whether the observation falls in a positive or negative bucket.
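
The cancellation argument above can be illustrated with a small stdlib-only sketch. It does not use the Prometheus client; the `sample` type, the bucket bounds, and the function names are assumptions made for illustration, and the buckets here are plain per-bucket counts rather than Prometheus's cumulative ones.

```go
package main

import (
	"fmt"
	"sort"
)

// sample holds the duration, in seconds, of the same query on two backends.
// The type and field names are hypothetical.
type sample struct {
	backendA, backendB float64
}

// meanDifference compares the backends in aggregate: mean(B) - mean(A).
func meanDifference(samples []sample) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sumA, sumB float64
	for _, s := range samples {
		sumA += s.backendA
		sumB += s.backendB
	}
	n := float64(len(samples))
	return sumB/n - sumA/n
}

// bucketDifferences counts each per-query difference (B - A) in the first
// bucket whose upper bound is >= the difference. upperBounds must be sorted
// ascending; the final slot counts differences above every bound.
func bucketDifferences(samples []sample, upperBounds []float64) []int {
	counts := make([]int, len(upperBounds)+1)
	for _, s := range samples {
		i := sort.SearchFloat64s(upperBounds, s.backendB-s.backendA)
		counts[i]++
	}
	return counts
}

func main() {
	// One query is 2s slower on B, one is 2s faster, one is identical.
	samples := []sample{{1, 3}, {3, 1}, {2, 2}}
	bounds := []float64{-1, -0.1, 0, 0.1, 1}

	fmt.Println("aggregate difference:", meanDifference(samples))
	fmt.Println("signed bucket counts:", bucketDifferences(samples, bounds))
}
```

The aggregate view reports no difference for this data, while the signed buckets still show one query landing at each extreme.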

Contributor

@charleskorn charleskorn left a comment

Overall LGTM, modulo comment on test changes.

Could you please add a changelog entry?

tools/querytee/proxy_endpoint_test.go (outdated)
tools/querytee/proxy_metrics.go (outdated)
tools/querytee/proxy_endpoint_test.go (outdated)
@jhesketh jhesketh marked this pull request as ready for review April 9, 2024 05:15
@jhesketh jhesketh requested a review from jdbaldry as a code owner April 9, 2024 05:15
@jhesketh jhesketh enabled auto-merge (squash) April 20, 2024 19:46
Member

@jdbaldry jdbaldry left a comment

Thanks for updating the metric name.

I think consistency with other metric descriptions is probably preferable to more clarity.

}
}

func filterMetrics(metrics []*dto.MetricFamily, names []string) []*dto.MetricFamily {
Contributor

[nit] This method is only ever used with one value for name - might be able to simplify this and the test that calls it.
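
One possible shape for that simplification, as a sketch only: the body of filterMetrics is not shown in this thread, so the version below is a guess at a single-name variant. `metricFamily` is a local stand-in for dto.MetricFamily so the example compiles without external dependencies, and `filterMetric` is a hypothetical name.

```go
package main

import "fmt"

// metricFamily is a stand-in for dto.MetricFamily that models only the name.
type metricFamily struct {
	name string
}

// GetName mirrors the nil-safe getter style of generated protobuf types.
func (m *metricFamily) GetName() string {
	if m == nil {
		return ""
	}
	return m.name
}

// filterMetric returns the metric families whose name equals name,
// preserving their original order.
func filterMetric(metrics []*metricFamily, name string) []*metricFamily {
	var filtered []*metricFamily
	for _, m := range metrics {
		if m.GetName() == name {
			filtered = append(filtered, m)
		}
	}
	return filtered
}

func main() {
	families := []*metricFamily{{name: "example_a"}, {name: "example_b"}}
	fmt.Println(len(filterMetric(families, "example_a")))
}
```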

@jhesketh jhesketh merged commit 5660319 into grafana:main Apr 29, 2024
29 checks passed
@jhesketh jhesketh deleted the jhesketh/querytee branch April 29, 2024 05:55
@charleskorn charleskorn mentioned this pull request Jun 11, 2024
2 tasks

5 participants