
[PLAT-104961] Upgrade thanos to main and v0.35.0 #26

Merged
jnyi merged 67 commits into databricks:db_main from pull-latest-main
Apr 5, 2024


jnyi
Collaborator

@jnyi jnyi commented Mar 31, 2024

See https://github.com/databricks/universe/pull/536629

Will keep the writer on the older version until we figure out thanos-io#7248.

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification


@jnyi jnyi changed the title [PLAT-104961] Test thanos latest main branch [PLAT-104961][DO NOT MERGE] Test thanos latest main branch Apr 1, 2024
@jnyi jnyi changed the title [PLAT-104961][DO NOT MERGE] Test thanos latest main branch [PLAT-104961] Upgrade thanos to main and v0.35.0 Apr 4, 2024
@jnyi jnyi requested review from hczhu-db and christopherzli April 4, 2024 18:01
jacobbaungard and others added 25 commits April 4, 2024 11:22
Forced tracing was always set to true, even if the checkbox in the UI
to enable tracing was not checked.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
Update the Prometheus version to include
prometheus/prometheus#13242, which is important
for me: it unblocks further postings work.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
…os-io#7043)

* Make RetryError and HaltError able to be fetched for root cause

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Added unit test

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fix lint

Signed-off-by: Alex Le <leqiyue@amazon.com>

* fixed IsRetryError and IsHaltError functions

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>
* CI: Ensure static react-app is checked in

With this commit, the CI system should fail if changes to the react-app
have been made without checking in the changes.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

* Add `react-app` as a dependency of `check-react-app`

To ensure the react-app is rebuilt before checking for changes.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

---------

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
Use the new TSDB flag that disables overlapping compaction to fix
out-of-order (OOO) sample handling in the Receive component.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
…hanos-io#6898)

* [wip] First checkpoint

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* [wip] Second checkpoint

All tests passing, unit and e2e.

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Small random refactors

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add some useful trace tags

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Concurrent and traced local writes

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Improve variable names in remote writes

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rename `newFanoutForward` function

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* More refactors

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix linting issue

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add a quorum test with sloppy quorum

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* [wip] Try to make retries work

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* [wip] Checkpoint: wait group still hanging

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Some refactors

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add some commented code so I don't lose it

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Adapt tests

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove sloppy quorum code

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Move some code around

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove even more leftover of sloppy quorum

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Extract a type to hold function params

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove unused struct field

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove useless variable

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove type that wasn't used enough

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Delete function to tighten up max buffered responses

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add comments to some functions

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix peer up check

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix size of replication tracking slices

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rename context

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Don't do local writes concurrently

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove extra error logging

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix syntax after merge

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add missing methods to peersContainer

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix handler test

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Reset peers state on hashring changes

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Handle PR comment regarding waitgroup

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Set span tags to help debug

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix concurrency issue

We close the request as soon as quorum is reached and leave a few goroutines running to finish replication and do cleanups.

This means that the HTTP request's context is cancelled, which also cancels the pending replication requests.

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix request ID middleware

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix `distributeTimeseriesToReplicas` comment

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Extract var with 1-indexed replication index

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rename methods in peersContainer interface

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Make peerGroup `getConnection` check if peers are up

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove yet one more not useful log

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Remove logger from `h.sendWrites`

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

---------

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
1. In the `replace` directive of go.mod, because of weaveworks/common#239, the gRPC version is pinned to 1.45.0, which has known vulnerabilities. To fix CVE-2023-44478, gRPC needs to be upgraded to 1.57.2.
2. To upgrade gRPC, weaveworks/common also needs to be upgraded; otherwise the build fails.

Signed-off-by: hanyuting8 <hytxidian@163.com>
* Add basic acceptance tests for proxy store
* Fix bug where invalid requests got ignored because of partial response
  strategy

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
* fix lazy postings with zero length

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

* unit tests

Signed-off-by: Ben Ye <benye@amazon.com>

* fix doc

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
If the requested label is an external label and we have series matchers
we should only return results if the series matchers actually match a
series.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
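A minimal sketch of the rule above, using hypothetical simplified types (a map-based `Labels` and equality-only `Matcher`) rather than Prometheus's `labels` package: an external label's value is returned only when there are no series matchers, or when the matchers select at least one series in the store.

```go
package main

import "fmt"

// Labels is a simplified label set; Thanos uses Prometheus labels.Labels.
type Labels map[string]string

// Matcher is a simplified equality matcher (hypothetical stand-in for labels.Matcher).
type Matcher struct{ Name, Value string }

func matches(s Labels, ms []Matcher) bool {
	for _, m := range ms {
		if s[m.Name] != m.Value {
			return false
		}
	}
	return true
}

// labelValues returns the values of label `name`. An external label's value is
// only reported when there are no matchers or when the matchers select at
// least one series in this store.
func labelValues(name string, external Labels, series []Labels, ms []Matcher) []string {
	if v, ok := external[name]; ok {
		if len(ms) == 0 {
			return []string{v}
		}
		for _, s := range series {
			if matches(s, ms) {
				return []string{v}
			}
		}
		return nil
	}
	seen := map[string]bool{}
	var out []string
	for _, s := range series {
		if v, ok := s[name]; ok && matches(s, ms) && !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

func main() {
	ext := Labels{"cluster": "eu-1"}
	series := []Labels{{"__name__": "up", "job": "node"}}
	fmt.Println(labelValues("cluster", ext, series, []Matcher{{Name: "job", Value: "node"}}))    // [eu-1]
	fmt.Println(labelValues("cluster", ext, series, []Matcher{{Name: "job", Value: "missing"}})) // []
}
```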
…-io#7087)

Receiver hangs waiting for the HTTP Handler to shut down if an error occurs
before the Handler is initialized. This might happen, for example, if the
hashring is too small for a given replication factor.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
* Update prometheus/prometheus

This commit updates prometheus/prometheus to latest main (60b6266e).

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix file discovery

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Fix a bug introduced in thanos-io#6898: we were calling RLock() twice,
which leads to a deadlock in some situations.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Kartikay <kartikay_2101ce32@iitp.ac.in>
markPeerUnavailable was always taking a lock and in one case we were
calling it with a lock already taken. Fix this.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Jake Keeys <jake@keeys.org>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
The Prometheus Helm chart has been community-maintained for a few
years. As a result, the old example pointed to an outdated chart, and the
provided example values no longer work.

This updates the documentation.

Signed-off-by: Mario Constanti <github@constanti.de>
Signed-off-by: Mario Constanti <github@constanti.de>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
)

* Adding new method on bucketed bytes to expose used

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Removing interface, using RWMutex

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
munir131 and others added 21 commits April 4, 2024 11:29
Signed-off-by: Munir Khakhi <munir@improwised.com>
This PR bumps google.golang.org/protobuf to v1.33.0 to fix a
potential vulnerability in the protojson.Unmarshal function [1] that can
occur when unmarshaling a message with a protobuf value.

Even though Thanos doesn't use the function directly, it is safer to
bump the dependency anyway.

[1] https://pkg.go.dev/vuln/GO-2024-2611

Signed-off-by: Daniel Mellado <dmellado@redhat.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
…r logo

fix: add anchor tag to all images
Signed-off-by: Payal17122000 <raviyapayal17@gmail.com>
Do not turn off Ruler if resolving fails. We can still (try to) evaluate
rules even if Alertmanager is not available.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
With this commit we only show the tenant-ui box when enforcement of
tenancy is on, as it is not needed otherwise.

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
We have detected a problem in the chunk series merger: it panics when
it encounters native histogram chunks.
I am using thanos as a library for a project and wanted to use the
penalty function to dedup blocks from Prometheus instances.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
…guration directories (thanos-io#7199)

Signed-off-by: Daniel Hrabovcak <thespiritxiii@gmail.com>
Signed-off-by: Helia Barroso <helia.barroso@hotmail.com>
Co-authored-by: Helia Barroso <helia.barroso@hotmail.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
* Add support for TSDB selector in querier

This PR allows using the query distributed mode against a set of multi-tenant receivers
as described in https://github.com/thanos-io/thanos/blob/main/docs/proposals-done/202301-distributed-query-execution.md#distributed-execution-against-receive-components.

The feature is enabled by a selector.relabel-config flag in the Query component
which allows it to select a subset of TSDBs to query based on their external labels.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add CHANGELOG entry and fix docs

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix tests

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add comments

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add test case for MatchersForLabelSets

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix failing test

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Use an unbuffered channel

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Change flag description

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Remove parameter from ServerAsClient

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
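A minimal sketch of how a relabel-style selector can pick a subset of TSDBs by their external labels. This is a simplified stand-in, not the Thanos implementation: `keepRule` and `selectTSDBs` are hypothetical and model only a Prometheus-style `keep` action (source label values joined with `;`, matched against an anchored regex).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keepRule mimics a single relabel rule with action "keep": the values of
// SourceLabels are joined with ";" and matched against an anchored regex.
type keepRule struct {
	SourceLabels []string
	Regex        *regexp.Regexp
}

func newKeepRule(regex string, sourceLabels ...string) keepRule {
	return keepRule{SourceLabels: sourceLabels, Regex: regexp.MustCompile("^(?:" + regex + ")$")}
}

// selectTSDBs returns the TSDBs (identified by their external label sets)
// that pass every rule.
func selectTSDBs(tsdbs []map[string]string, rules []keepRule) []map[string]string {
	var out []map[string]string
outer:
	for _, ext := range tsdbs {
		for _, r := range rules {
			vals := make([]string, len(r.SourceLabels))
			for i, name := range r.SourceLabels {
				vals[i] = ext[name]
			}
			if !r.Regex.MatchString(strings.Join(vals, ";")) {
				continue outer
			}
		}
		out = append(out, ext)
	}
	return out
}

func main() {
	tsdbs := []map[string]string{
		{"tenant_id": "team-a", "replica": "0"},
		{"tenant_id": "team-b", "replica": "0"},
		{"tenant_id": "team-a", "replica": "1"},
	}
	kept := selectTSDBs(tsdbs, []keepRule{newKeepRule("team-a", "tenant_id")})
	fmt.Println(len(kept)) // 2
}
```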
* Update thanos-io/promql-engine

This commit updates the promql-engine module to latest main and adapts
the remote engine to the breaking change.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix lint

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
* add username cfg to rueidis client

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* update changelog

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
* feat(ui): added BlockSizeStats calculation to blocks page

A block can have a list of contained files set in `.thanos.files`.
If the `files` array is set, all referenced files with `size_bytes` set are counted:
- sum of all `chunk/*` file sizes
- size of index file
- total size (sum of both)

Shows statistics about the selected block in the block details view:
- Total size of block
- Size of index (and percentage of total)
- Size of all chunks (and percentage of total)
- Daily growth, based on total size and block duration

Output is humanized up to pebibytes and fixed to two decimal places;
raw bytes are accessible through mouse over / title text.

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

* feat(ui): added aggregated BlockSizeStats to blocks row title

Added total size of all blocks from a source to the row title, beneath the source name.

The shown total size is humanized up to pebibytes and fixed to two decimal places;
raw bytes value is accessible through mouse over / title text.

The shown value refreshes with the selected compaction levels, but doesn't take the block filter into account.

I thought about showing daily growth as well, but just summing all milliseconds of all blocks doesn't work with overlapping blocks / multiple resolutions.

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

* chore(docs): added UI block size PR to CHANGELOG.md

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

* chore(ui): removed comments

Automatic code formatting duplicated some comments near import statements.

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>

---------

Signed-off-by: Markus Möslinger <markus.moeslinger@socra.dev>
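The block size statistics described above can be sketched as follows. The UI change itself is not in Go; this is a Go sketch of the same arithmetic under stated assumptions: `blockFile`, `computeBlockSizeStats`, `dailyGrowth` and `humanizeBytes` are hypothetical names, and chunk files are assumed to have a relative path starting with `chunks/` (the TSDB block layout).

```go
package main

import (
	"fmt"
	"strings"
)

// blockFile mirrors one entry of a block's `.thanos.files` list; SizeBytes is
// zero when no size is recorded.
type blockFile struct {
	RelPath   string
	SizeBytes int64
}

type blockSizeStats struct {
	ChunkBytes, IndexBytes, TotalBytes int64
}

// computeBlockSizeStats sums the chunk file sizes and the index size.
func computeBlockSizeStats(files []blockFile) blockSizeStats {
	var s blockSizeStats
	for _, f := range files {
		switch {
		case strings.HasPrefix(f.RelPath, "chunks/"):
			s.ChunkBytes += f.SizeBytes
		case f.RelPath == "index":
			s.IndexBytes += f.SizeBytes
		}
	}
	s.TotalBytes = s.ChunkBytes + s.IndexBytes
	return s
}

// dailyGrowth scales the block's total size to one day of data, given the
// block's time range in milliseconds.
func dailyGrowth(totalBytes, minTimeMs, maxTimeMs int64) float64 {
	const dayMs = 24 * 60 * 60 * 1000
	if maxTimeMs <= minTimeMs {
		return 0
	}
	return float64(totalBytes) * dayMs / float64(maxTimeMs-minTimeMs)
}

// humanizeBytes formats a size with binary units up to PiB, two decimals.
func humanizeBytes(b float64) string {
	units := []string{"B", "KiB", "MiB", "GiB", "TiB", "PiB"}
	i := 0
	for b >= 1024 && i < len(units)-1 {
		b /= 1024
		i++
	}
	return fmt.Sprintf("%.2f %s", b, units[i])
}

// percent returns part as a percentage of total (0 if total is 0).
func percent(part, total int64) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(part) / float64(total)
}

func main() {
	s := computeBlockSizeStats([]blockFile{
		{RelPath: "chunks/000001", SizeBytes: 3 << 30},
		{RelPath: "index", SizeBytes: 1 << 30},
		{RelPath: "meta.json"},
	})
	fmt.Println(humanizeBytes(float64(s.TotalBytes)))                 // 4.00 GiB
	fmt.Printf("index %.2f%%\n", percent(s.IndexBytes, s.TotalBytes)) // index 25.00%
	// A 2h block holding 4 GiB corresponds to 48 GiB per day.
	fmt.Println(humanizeBytes(dailyGrowth(s.TotalBytes, 0, 2*60*60*1000))) // 48.00 GiB
}
```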
…-io#7220)

* fix lazy expanded postings cache and a bug with not-equal matchers on non-existent values

Signed-off-by: Ben Ye <benye@amazon.com>

* test case for remove keys noop

Signed-off-by: Ben Ye <benye@amazon.com>

* add promqlsmith fuzz test

Signed-off-by: Ben Ye <benye@amazon.com>

* update

Signed-off-by: Ben Ye <benye@amazon.com>

* changelog

Signed-off-by: Ben Ye <benye@amazon.com>

* fix go mod

Signed-off-by: Ben Ye <benye@amazon.com>

* rename test

Signed-off-by: Ben Ye <benye@amazon.com>

* fix series request timestamp

Signed-off-by: Ben Ye <benye@amazon.com>

* skip e2e test

Signed-off-by: Ben Ye <benye@amazon.com>

* handle non lazy expanded case

Signed-off-by: Ben Ye <benye@amazon.com>

* update comment

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
* bump Prometheus version to include new label matcher regex value optimization

Signed-off-by: Ben Ye <benye@amazon.com>

* update

Signed-off-by: Ben Ye <benye@amazon.com>

* fix again

Signed-off-by: Ben Ye <benye@amazon.com>

* include latest fix

Signed-off-by: Ben Ye <benye@amazon.com>

* update go mod

Signed-off-by: Ben Ye <benye@amazon.com>

* fix explain test

Signed-off-by: Ben Ye <benye@amazon.com>

* fix test again

Signed-off-by: Ben Ye <benye@amazon.com>

* update again

Signed-off-by: Ben Ye <benye@amazon.com>

* update

Signed-off-by: Ben Ye <benye@amazon.com>

* fix tests so far

Signed-off-by: Ben Ye <benye@amazon.com>

* fix compactor tests

Signed-off-by: Ben Ye <benye@amazon.com>

* use own out of order chunk index

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Alec Rajeev <13004609+alecrajeev@users.noreply.github.com>
@jnyi jnyi force-pushed the pull-latest-main branch from 40fee2c to fa9882c April 4, 2024 18:40
@jnyi jnyi force-pushed the pull-latest-main branch from 127e32d to 2b3c102 April 4, 2024 20:41
Signed-off-by: Yi Jin <yi.jin@databricks.com>
@jnyi jnyi merged commit 995b2b5 into databricks:db_main Apr 5, 2024
12 checks passed
@jnyi jnyi deleted the pull-latest-main branch June 1, 2024 04:19
@jnyi jnyi restored the pull-latest-main branch June 1, 2024 04:19
@jnyi jnyi deleted the pull-latest-main branch June 1, 2024 04:20