Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributor: add bucket count validation to native histograms #7736

Merged
merged 5 commits into from
Apr 2, 2024

Conversation

krajorama
Copy link
Contributor

@krajorama krajorama commented Mar 26, 2024

What this PR does

Otherwise the ingester performs validation and returns an untyped error which we turn into 500x with little detail on the series as well.

Inspired by https://github.com/prometheus/prometheus/blob/25a8d576717f4a69290d6f6755b4a90cfaab08ff/model/histogram/histogram.go#L366 , but not a copy of.

Only checks integer native histograms as the float histograms coming out of PromQL and being written in recording rules have precision issues. E.g.

mimir-1-1        | ts=2024-03-26T17:14:11.049358108Z caller=group.go:529 level=warn
trace_id=0000000000000000509674846c935631 name=cluster_job:cortex_request_duration_seconds:sum_rate
index=0 component=ruler insight=true user=anonymous file=data-ruler/anonymous/krajo group=mimir_api_1
msg="Rule sample appending failed" err="rpc error: code = FailedPrecondition 
desc = native histogram bucket count mismatch, timestamp: 1711473191044, 
series: cluster_job:cortex_request_duration_seconds:sum_rate{job=\"mimir-1\"}, 
expected 0.6889399999999999, got 0.68894 (err-mimir-native-histogram-bucket-count-mismatch)"

Which issue(s) this PR fixes or relates to

Fixes: escalation

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

Otherwise the ingester does it and returns an untyped error which we
turn into 500x with little detail on the series as well.

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama requested a review from a team as a code owner March 26, 2024 16:42
@krajorama krajorama marked this pull request as draft March 26, 2024 16:42
Copy link
Member

@jhalterman jhalterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but let's add a changelog entry.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama marked this pull request as ready for review March 26, 2024 19:03
@krajorama krajorama requested a review from jdbaldry as a code owner March 26, 2024 19:03
@krajorama krajorama merged commit fb7dbaa into main Apr 2, 2024
29 checks passed
@krajorama krajorama deleted the krajo/native-histograms-extra-validation branch April 2, 2024 08:49
krajorama added a commit that referenced this pull request Apr 2, 2024
krajorama added a commit that referenced this pull request Apr 3, 2024
* Revert "Distributor: add bucket count validation to native histograms (#7736)"

This reverts commit fb7dbaa.

* Handle TSDB native histogram validation errors are soft errors

* add all testcases

Signed-off-by: György Krajcsovits <[email protected]>

* Count towards discarded samples metrics with new reason

Signed-off-by: György Krajcsovits <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
krajorama added a commit that referenced this pull request Apr 3, 2024
* Revert "Distributor: add bucket count validation to native histograms (#7736)"

This reverts commit fb7dbaa.

* Handle TSDB native histogram validation errors are soft errors

* add all testcases

Signed-off-by: György Krajcsovits <[email protected]>

* Count towards discarded samples metrics with new reason

Signed-off-by: György Krajcsovits <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
(cherry picked from commit 51c4088)
krajorama added a commit that referenced this pull request Apr 3, 2024
…#7785)

* Revert "Distributor: add bucket count validation to native histograms (#7736)"

This reverts commit fb7dbaa.

* Handle TSDB native histogram validation errors are soft errors

* add all testcases

Signed-off-by: György Krajcsovits <[email protected]>

* Count towards discarded samples metrics with new reason

Signed-off-by: György Krajcsovits <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
(cherry picked from commit 51c4088)
grafanabot pushed a commit that referenced this pull request Apr 3, 2024
…#7785)

* Revert "Distributor: add bucket count validation to native histograms (#7736)"

This reverts commit fb7dbaa.

* Handle TSDB native histogram validation errors are soft errors

* add all testcases

Signed-off-by: György Krajcsovits <[email protected]>

* Count towards discarded samples metrics with new reason

Signed-off-by: György Krajcsovits <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
(cherry picked from commit 51c4088)
(cherry picked from commit 868d1b3)
krajorama added a commit that referenced this pull request Apr 3, 2024
* Revert "Distributor: add bucket count validation to native histograms (#7736)"

This reverts commit fb7dbaa.

* Handle TSDB native histogram validation errors are soft errors

* add all testcases

Signed-off-by: György Krajcsovits <[email protected]>

* Count towards discarded samples metrics with new reason

Signed-off-by: György Krajcsovits <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
(cherry picked from commit 51c4088)
krajorama added a commit that referenced this pull request Apr 3, 2024
…#7785) (#7786)

* Revert "Distributor: add bucket count validation to native histograms (#7736)"

This reverts commit fb7dbaa.

* Handle TSDB native histogram validation errors are soft errors

* add all testcases

Signed-off-by: György Krajcsovits <[email protected]>

* Count towards discarded samples metrics with new reason

Signed-off-by: György Krajcsovits <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
(cherry picked from commit 51c4088)
(cherry picked from commit 868d1b3)

Co-authored-by: George Krajcsovits <[email protected]>
krajorama added a commit that referenced this pull request Apr 3, 2024
…#7787)

* Revert "Distributor: add bucket count validation to native histograms (#7736)"

This reverts commit fb7dbaa.

* Handle TSDB native histogram validation errors are soft errors

* add all testcases

Signed-off-by: György Krajcsovits <[email protected]>

* Count towards discarded samples metrics with new reason

Signed-off-by: György Krajcsovits <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
(cherry picked from commit 51c4088)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants