Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Frontend splitting metrics check is flaky #6951

Closed
yeya24 opened this issue Dec 2, 2023 · 5 comments · Fixed by #7017
Closed

Query Frontend splitting metrics check is flaky #6951

yeya24 opened this issue Dec 2, 2023 · 5 comments · Fixed by #7017

Comments

@yeya24
Copy link
Contributor

yeya24 commented Dec 2, 2023

What happened:

https://github.com/thanos-io/thanos/actions/runs/7066224273/job/19237709120?pr=6950 Query Frontend E2E tests failed again due to the splitting metrics. It is a very old flaky issue but it is not fixed.

The problem is that the time splitting feature in QFE is based on the absolute time of the timestamp, not based on the query time range. For example, even if a range query is only 2h time range, it can still be split into 2 queries if it is at the boundary of the split interval. If interval is 24h, [yesterday 23PM, today 1AM] range query will become two range queries. This caused our E2E tests to fail.

To fix this, a good way is to not use time.Now() as the timestamp we used to query metrics. By using absolute timestamps, we can control the request parameters and make sure time splitting works as expected.

However, the chanllenge is that we use Prometheus to scrape metrics in E2E tests and the metric timestamp is latest (now). So we have to use latest timestamps in QFE query rather than arbitrary timestamps.

We can probably refactor the QFE E2E tests to not use Prometheus, but use Thanos Receivers instead. Metrics can be sent via remote write requests with predefined timestamps. This way we can use the timestamp we pick to query metrics

@yeya24
Copy link
Contributor Author

yeya24 commented Dec 3, 2023

An alternative is to just remove splitting metrics check in the QFE E2E tests.
We already have it covered in unit tests so should be fine to skip in E2E test

@pawarpranav83
Copy link
Contributor

Hi! I would like to work on this issue.

@MichaHoffmann
Copy link
Contributor

Using receivers is a great idea!

@yeya24
Copy link
Contributor Author

yeya24 commented Dec 25, 2023

Fixed by #6998

@yeya24 yeya24 closed this as completed Dec 25, 2023
@yeya24 yeya24 reopened this Dec 28, 2023
@yeya24
Copy link
Contributor Author

yeya24 commented Dec 28, 2023

It looks like we are still missing two more tests. Let's change all test cases of QFE to use receiver
https://github.com/thanos-io/thanos/actions/runs/7347826051/job/20004877008

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants