Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQE: improve performance of subqueries when the parent query is a range query with many steps #9719

Merged
merged 16 commits into from
Nov 11, 2024

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Oct 23, 2024

What this PR does

This PR is a follow-up PR to #9664.

This PR improves the performance of subqueries when the parent query is a range query with many steps.

It does this by changing the behaviour of the ring buffer implementations to search forwards through the buffer to find the end of the range when a subquery is being evaluated, rather than backwards. Searching backwards makes sense for range vector selectors, where there is expected to only be one point, if any, at the end of the buffer outside the range, but for subqueries in range queries it is expected that there will be many points at the end of the buffer outside the range, so it is usually faster to search from the beginning of the buffer.

I'd suggest reviewing each commit separately, as each builds on the previous one and incrementally improves things.

In benchmarking, peak memory consumption is unchanged, and latency is reduced for both subqueries and range vector selectors. For example:

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/streamingpromql/benchmarks
cpu: Apple M1 Pro
                                                                                       │ original.txt │              final.txt              │
                                                                                       │    sec/op    │    sec/op     vs base               │
Query/rate(a_1[1m]),_range_query_with_1000_steps/engine=Mimir-10                         230.2µ ±  1%   218.0µ ±  2%   -5.30% (p=0.002 n=6)
Query/rate(a_100[1m]),_range_query_with_1000_steps/engine=Mimir-10                       7.707m ±  1%   7.086m ±  2%   -8.07% (p=0.002 n=6)
Query/rate(a_2000[1m]),_range_query_with_1000_steps/engine=Mimir-10                      134.3m ±  1%   124.3m ±  1%   -7.43% (p=0.002 n=6)
Query/rate(a_2000[1m]),_range_query_with_10000_steps/engine=Mimir-10                      1.356 ±  6%    1.253 ±  5%   -7.56% (p=0.002 n=6)
Query/rate(nh_1[1m]),_range_query_with_1000_steps/engine=Mimir-10                        705.2µ ±  1%   649.9µ ±  1%   -7.85% (p=0.002 n=6)
Query/rate(nh_100[1m]),_range_query_with_1000_steps/engine=Mimir-10                      44.71m ±  1%   43.46m ±  1%   -2.79% (p=0.002 n=6)
Query/rate(nh_2000[1m]),_range_query_with_1000_steps/engine=Mimir-10                     853.5m ±  1%   843.9m ±  4%        ~ (p=0.310 n=6)
Query/rate(nh_2000[1m]),_range_query_with_10000_steps/engine=Mimir-10                     8.723 ±  3%    8.573 ±  1%   -1.73% (p=0.002 n=6)
Query/sum_over_time(a_2000[10m:3m]),_range_query_with_1000_steps/engine=Mimir-10         172.4m ±  2%   129.6m ±  3%  -24.80% (p=0.002 n=6)
Query/sum_over_time(nh_2000[10m:3m]),_range_query_with_1000_steps/engine=Mimir-10         1.155 ±  2%    1.101 ±  1%   -4.68% (p=0.002 n=6)
Query/sum(sum_over_time(a_100[10m:3m])),_range_query_with_1000_steps/engine=Mimir-10     9.652m ±  1%   7.456m ±  1%  -22.75% (p=0.002 n=6)
Query/sum(sum_over_time(a_2000[10m:3m])),_range_query_with_1000_steps/engine=Mimir-10    177.2m ±  1%   133.6m ±  1%  -24.56% (p=0.002 n=6)

Which issue(s) this PR fixes or relates to

#9664

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [n/a] about-versioning.md updated with experimental features.

@charleskorn charleskorn force-pushed the charleskorn/mqe-subquery-optimisation branch from 6fc4fe2 to de410ca Compare October 25, 2024 05:32
@charleskorn charleskorn force-pushed the charleskorn/mqe-subquery-optimisation branch 2 times, most recently from c4b87bc to 3e3d637 Compare November 10, 2024 22:16
@charleskorn charleskorn force-pushed the charleskorn/mqe-subquery-optimisation branch from 1eb59c8 to 13939cf Compare November 10, 2024 22:19
@charleskorn charleskorn marked this pull request as ready for review November 10, 2024 22:20
@charleskorn charleskorn requested a review from a team as a code owner November 10, 2024 22:20
Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, some nice collection of optimisations.

I think we can optimise the bucketed pools somewhat by enforcing the factor to be 2. I can look at doing that as a followup if you prefer though.

pkg/streamingpromql/types/fpoint_ring_buffer.go Outdated Show resolved Hide resolved
count--
// Last returns the last point in this ring buffer view.
// It returns false if the view is empty.
func (v FPointRingBufferView) Last() (promql.FPoint, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit, probably for follow up). I feel like First() and Last() should have the same signature. It seems weird to panic in one situation and return false for another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I'd like to keep this for a separate PR as you suggest.

pkg/streamingpromql/types/ring_buffer_test.go Outdated Show resolved Hide resolved
pkg/streamingpromql/types/fpoint_ring_buffer.go Outdated Show resolved Hide resolved
pkg/streamingpromql/types/hpoint_ring_buffer.go Outdated Show resolved Hide resolved
pkg/streamingpromql/types/ring_buffer_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit on the test, not blocking on it :-).

lgtm 🚀

pkg/streamingpromql/types/ring_buffer_test.go Outdated Show resolved Hide resolved
@charleskorn charleskorn enabled auto-merge (squash) November 11, 2024 05:19
@charleskorn charleskorn merged commit 0941c7b into main Nov 11, 2024
29 checks passed
@charleskorn charleskorn deleted the charleskorn/mqe-subquery-optimisation branch November 11, 2024 05:32
@jhesketh jhesketh mentioned this pull request Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants