[PLAT-104961] Upgrade thanos to main and v0.35.0 (#26)
jnyi authored Apr 5, 2024
2 parents 7175992 + 755401a commit 995b2b5
Showing 169 changed files with 6,238 additions and 3,279 deletions.
6 changes: 3 additions & 3 deletions .circleci/config.yml
@@ -29,7 +29,7 @@ jobs:
make install-tool-deps
- go/save-cache
- setup_remote_docker:
version: 20.10.12
version: docker24
- run:
name: Create Secret if PR is not forked
# GCS integration tests are run only for author's PR that have write access, because these tests
@@ -82,7 +82,7 @@ jobs:
- git-shallow-clone/checkout
- go/mod-download-cached
- setup_remote_docker:
version: 20.10.12
version: docker24
- attach_workspace:
at: .
# Register qemu to support multi-arch.
@@ -104,7 +104,7 @@ jobs:
- git-shallow-clone/checkout
- go/mod-download-cached
- setup_remote_docker:
version: 20.10.12
version: docker24
- attach_workspace:
at: .
- run: make tarballs-release
1 change: 1 addition & 0 deletions .github/workflows/react.yml
@@ -29,4 +29,5 @@ jobs:
restore-keys: |
${{ runner.os }}-node-
- run: CI=false make check-react-app
- run: make react-app-test
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -12,10 +12,29 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

### Fixed

- [#7083](https://github.com/thanos-io/thanos/pull/7083) Store Gateway: Fix lazy expanded postings with 0 length failed to be cached.
- [#7080](https://github.com/thanos-io/thanos/pull/7080) Receive: race condition in handler Close() when stopped early
- [#7132](https://github.com/thanos-io/thanos/pull/7132) Documentation: fix broken helm installation instruction
- [#7134](https://github.com/thanos-io/thanos/pull/7134) Store, Compact: Revert the recursive block listing mechanism introduced in https://github.com/thanos-io/thanos/pull/6474 and use the same strategy as in 0.31. Introduce a `--block-discovery-strategy` flag to control the listing strategy so that a recursive lister can still be used if the tradeoff of slower but cheaper discovery is preferred.
- [#7122](https://github.com/thanos-io/thanos/pull/7122) Store Gateway: Fix lazy expanded postings estimate base cardinality using posting group with remove keys.
- [#7224](https://github.com/thanos-io/thanos/pull/7224) Query-frontend: Add Redis username to the client configuration.
- [#7220](https://github.com/thanos-io/thanos/pull/7220) Store Gateway: Fix lazy expanded postings caching partial expanded postings and bug of estimating remove postings with non existent value. Added `PromQLSmith` based fuzz test to improve correctness.

### Added

- [#7194](https://github.com/thanos-io/thanos/pull/7194) Downsample: retry objstore related errors
- [#7105](https://github.com/thanos-io/thanos/pull/7105) Rule: add flag `--query.enable-x-functions` to allow usage of extended promql functions (xrate, xincrease, xdelta) in loaded rules
- [#6867](https://github.com/thanos-io/thanos/pull/6867) Query UI: Tenant input box added to the Query UI, in order to be able to specify which tenant the query should use.
- [#7175](https://github.com/thanos-io/thanos/pull/7175): Query: Add `--query.mode=distributed` which enables the new distributed mode of the Thanos query engine.
- [#7199](https://github.com/thanos-io/thanos/pull/7199): Reloader: Add support for watching and decompressing Prometheus configuration directories
- [#7200](https://github.com/thanos-io/thanos/pull/7200): Query: Add `--selector.relabel-config` and `--selector.relabel-config-file` flags which allows scoping the Querier to a subset of matched TSDBs.
- [#7233](https://github.com/thanos-io/thanos/pull/7233): UI: Showing Block Size Stats

### Changed

- [#7123](https://github.com/thanos-io/thanos/pull/7123) Rule: Change default Alertmanager API version to v2.
- [#7223](https://github.com/thanos-io/thanos/pull/7223) Automatic detection of memory limits and configure GOMEMLIMIT to match.

### Removed

## [v0.34.1](https://github.com/thanos-io/thanos/tree/release-0.34) - 11.02.24
@@ -38,6 +57,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6874](https://github.com/thanos-io/thanos/pull/6874) Sidecar: fix labels returned by 'api/v1/series' in presence of conflicting external and inner labels.
- [#7009](https://github.com/thanos-io/thanos/pull/7009) Rule: Fix spacing error in URL.
- [#7082](https://github.com/thanos-io/thanos/pull/7082) Stores: fix label values edge case when requesting external label values with matchers
- [#7114](https://github.com/thanos-io/thanos/pull/7114) Stores: fix file path bug for minio v7.0.61

### Added

@@ -53,13 +73,16 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6887](https://github.com/thanos-io/thanos/pull/6887) Query Frontend: *breaking :warning:* Add tenant label to relevant exported metrics. Note that this change may cause some pre-existing custom dashboard queries to be incorrect due to the added label.
- [#7028](https://github.com/thanos-io/thanos/pull/7028) Query|Query Frontend: Add new `--query-frontend.enable-x-functions` flag to enable experimental extended functions.
- [#6884](https://github.com/thanos-io/thanos/pull/6884) Tools: Add upload-block command to upload blocks to object storage.
- [#7010](https://github.com/thanos-io/thanos/pull/7010) Cache: Added `set_async_circuit_breaker_*` to utilize the circuit breaker pattern for dynamically thresholding asynchronous set operations.

### Changed

- [#6539](https://github.com/thanos-io/thanos/pull/6539) Store: *breaking :warning:* Changed `--sync-block-duration` default 3m to 15m.

### Removed

- [#7014](https://github.com/thanos-io/thanos/pull/7014) *: *breaking :warning:* Removed experimental query pushdown feature to simplify query path. This feature has had high complexity for too little benefits. The responsibility for query pushdown will be moved to the distributed mode of the new 'thanos' promql engine.

## [v0.33.0](https://github.com/thanos-io/thanos/tree/release-0.33) - 18.12.2023

### Fixed
3 changes: 1 addition & 2 deletions Dockerfile.multi-stage
@@ -15,9 +15,8 @@ COPY . $GOPATH/src/github.com/thanos-io/thanos
RUN git update-index --refresh; make build

# -----------------------------------------------------------------------------
FROM alpine:3.15

#FROM quay.io/prometheus/busybox@sha256:${BASE_DOCKER_SHA}
FROM quay.io/prometheus/busybox@sha256:${BASE_DOCKER_SHA}
LABEL maintainer="The Thanos Authors"

COPY --from=builder /go/bin/thanos /bin/thanos
2 changes: 1 addition & 1 deletion MAINTAINERS.md
@@ -5,7 +5,7 @@
| Bartłomiej Płotka | [email protected] | `@bwplotka` | [@bwplotka](https://github.com/bwplotka) | Google |
| Frederic Branczyk | [email protected] | `@brancz` | [@brancz](https://github.com/brancz) | Polar Signals |
| Giedrius Statkevičius | [email protected] | `@Giedrius Statkevičius` | [@GiedriusS](https://github.com/GiedriusS) | Vinted |
| Kemal Akkoyun | [email protected] | `@kakkoyun` | [@kakkoyun](https://github.com/kakkoyun) | Polar Signals |
| Kemal Akkoyun | [email protected] | `@kakkoyun` | [@kakkoyun](https://github.com/kakkoyun) | Fal |
| Lucas Servén Marín | [email protected] | `@squat` | [@squat](https://github.com/squat) | Red Hat |
| Prem Saraswat | [email protected] | `@Prem Saraswat` | [@onprem](https://github.com/onprem) | Red Hat |
| Matthias Loibl | [email protected] | `@metalmatze` | [@metalmatze](https://github.com/metalmatze) | Polar Signals |
4 changes: 4 additions & 0 deletions Makefile
@@ -121,6 +121,10 @@ $(REACT_APP_OUTPUT_DIR): $(REACT_APP_NODE_MODULES_PATH) $(REACT_APP_SOURCE_FILES
.PHONY: react-app
react-app: $(REACT_APP_OUTPUT_DIR)

.PHONY: check-react-app
check-react-app: react-app
$(call require_clean_work_tree,'all generated files should be committed, run make react-app and commit changes.')

.PHONY: react-app-lint
react-app-lint: $(REACT_APP_NODE_MODULES_PATH)
@echo ">> running React app linting"
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.34.1
0.35.0-dev
16 changes: 14 additions & 2 deletions cmd/thanos/compact.go
@@ -239,8 +239,16 @@ func runCompact(
consistencyDelayMetaFilter := block.NewConsistencyDelayMetaFilter(logger, conf.consistencyDelay, extprom.WrapRegistererWithPrefix("thanos_", reg))
timePartitionMetaFilter := block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime)

baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseMetaFetcher, err := block.NewBaseFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, baseBlockIDsFetcher, conf.dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg))
var blockLister block.Lister
switch syncStrategy(conf.blockListStrategy) {
case concurrentDiscovery:
blockLister = block.NewConcurrentLister(logger, insBkt)
case recursiveDiscovery:
blockLister = block.NewRecursiveLister(logger, insBkt)
default:
return errors.Errorf("unknown sync strategy %s", conf.blockListStrategy)
}
baseMetaFetcher, err := block.NewBaseFetcher(logger, conf.blockMetaFetchConcurrency, insBkt, blockLister, conf.dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg))
if err != nil {
return errors.Wrap(err, "create meta fetcher")
}
@@ -695,6 +703,7 @@ type compactConfig struct {
wait bool
waitInterval time.Duration
disableDownsampling bool
blockListStrategy string
blockMetaFetchConcurrency int
blockFilesConcurrency int
blockViewerSyncBlockInterval time.Duration
@@ -757,6 +766,9 @@ func (cc *compactConfig) registerFlag(cmd extkingpin.FlagClause) {
"as querying long time ranges without non-downsampled data is not efficient and useful e.g it is not possible to render all samples for a human eye anyway").
Default("false").BoolVar(&cc.disableDownsampling)

strategies := strings.Join([]string{string(concurrentDiscovery), string(recursiveDiscovery)}, ", ")
cmd.Flag("block-discovery-strategy", "One of "+strategies+". When set to concurrent, stores will concurrently issue one call per directory to discover active blocks in the bucket. The recursive strategy iterates through all objects in the bucket, recursively traversing into each directory. This avoids N+1 calls at the expense of having slower bucket iterations.").
Default(string(concurrentDiscovery)).StringVar(&cc.blockListStrategy)
cmd.Flag("block-meta-fetch-concurrency", "Number of goroutines to use when fetching block metadata from object storage.").
Default("32").IntVar(&cc.blockMetaFetchConcurrency)
cmd.Flag("block-files-concurrency", "Number of goroutines to use when fetching/uploading block files from object storage.").
40 changes: 40 additions & 0 deletions cmd/thanos/config.go
@@ -13,6 +13,7 @@ import (
"strings"
"time"

"github.com/KimMachineGun/automemlimit/memlimit"
extflag "github.com/efficientgo/tools/extkingpin"
"github.com/pkg/errors"

@@ -283,3 +284,42 @@ func parseFlagLabels(s []string) (labels.Labels, error) {
sort.Sort(lset)
return lset, nil
}

type goMemLimitConfig struct {
enableAutoGoMemlimit bool
memlimitRatio float64
}

func (gml *goMemLimitConfig) registerFlag(cmd extkingpin.FlagClause) *goMemLimitConfig {
cmd.Flag("enable-auto-gomemlimit",
"Enable go runtime to automatically limit memory consumption.").
Default("false").BoolVar(&gml.enableAutoGoMemlimit)

cmd.Flag("auto-gomemlimit.ratio",
"The ratio of reserved GOMEMLIMIT memory to the detected maximum container or system memory.").
Default("0.9").FloatVar(&gml.memlimitRatio)

return gml
}

func configureGoAutoMemLimit(common goMemLimitConfig) error {
if common.memlimitRatio <= 0.0 || common.memlimitRatio > 1.0 {
return errors.New("--auto-gomemlimit.ratio must be greater than 0 and less than or equal to 1.")
}

if common.enableAutoGoMemlimit {
if _, err := memlimit.SetGoMemLimitWithOpts(
memlimit.WithRatio(common.memlimitRatio),
memlimit.WithProvider(
memlimit.ApplyFallback(
memlimit.FromCgroup,
memlimit.FromSystem,
),
),
); err != nil {
return errors.Wrap(err, "Failed to set GOMEMLIMIT automatically")
}
}

return nil
}
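
The ratio check and limit calculation in `configureGoAutoMemLimit` can be approximated with the standard library alone. The sketch below is an assumption-laden illustration rather than the `automemlimit` implementation: `limitFromRatio`, `applyLimit`, and `currentLimit` are hypothetical helpers, and the detected memory size is passed in by the caller instead of being read from cgroups or the system.

```go
package main

import (
	"errors"
	"runtime/debug"
)

// limitFromRatio applies the same bounds as the diff (0 < ratio <= 1) and
// returns the byte limit for a given amount of detected memory.
// totalBytes is supplied by the caller; detecting it is out of scope here.
func limitFromRatio(totalBytes uint64, ratio float64) (int64, error) {
	if ratio <= 0.0 || ratio > 1.0 {
		return 0, errors.New("ratio must be greater than 0 and less than or equal to 1")
	}
	return int64(float64(totalBytes) * ratio), nil
}

// applyLimit sets the Go runtime soft memory limit (the programmatic
// equivalent of GOMEMLIMIT) and returns the previously configured limit.
func applyLimit(totalBytes uint64, ratio float64) (int64, error) {
	limit, err := limitFromRatio(totalBytes, ratio)
	if err != nil {
		return 0, err
	}
	return debug.SetMemoryLimit(limit), nil
}

// currentLimit reads the active limit; a negative argument to
// debug.SetMemoryLimit leaves the limit unchanged.
func currentLimit() int64 {
	return debug.SetMemoryLimit(-1)
}
```

With the default ratio of `0.9` from the flag above, a container limited to 1 GiB would get a soft limit of roughly 90% of that, leaving headroom for memory the Go runtime does not account for.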
7 changes: 4 additions & 3 deletions cmd/thanos/downsample.go
@@ -21,6 +21,7 @@ import (
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/prometheus/prometheus/tsdb"
"github.com/prometheus/prometheus/tsdb/chunkenc"
"github.com/thanos-io/thanos/pkg/compact"

"github.com/thanos-io/objstore"
"github.com/thanos-io/objstore/client"
@@ -90,7 +91,7 @@ func RunDownsample(
insBkt := objstoretracing.WrapWithTraces(objstore.WrapWithMetrics(bkt, extprom.WrapRegistererWithPrefix("thanos_", reg), bkt.Name()))

// While fetching blocks, filter out blocks that were marked for no downsample.
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, insBkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, insBkt)
metaFetcher, err := block.NewMetaFetcher(logger, block.FetcherConcurrency, insBkt, baseBlockIDsFetcher, "", extprom.WrapRegistererWithPrefix("thanos_", reg), []block.MetadataFilter{
block.NewDeduplicateFilter(block.FetcherConcurrency),
downsample.NewGatherNoDownsampleMarkFilter(logger, insBkt, block.FetcherConcurrency),
@@ -358,7 +359,7 @@ func processDownsampling(

err := block.Download(ctx, logger, bkt, m.ULID, bdir, objstore.WithFetchConcurrency(blockFilesConcurrency))
if err != nil {
return errors.Wrapf(err, "download block %s", m.ULID)
return compact.NewRetryError(errors.Wrapf(err, "download block %s", m.ULID))
}
level.Info(logger).Log("msg", "downloaded block", "id", m.ULID, "duration", time.Since(begin), "duration_ms", time.Since(begin).Milliseconds())

@@ -419,7 +420,7 @@ func processDownsampling(

err = block.Upload(ctx, logger, bkt, resdir, hashFunc)
if err != nil {
return errors.Wrapf(err, "upload downsampled block %s", id)
return compact.NewRetryError(errors.Wrapf(err, "upload downsampled block %s", id))
}

level.Info(logger).Log("msg", "uploaded block", "id", id, "duration", time.Since(begin), "duration_ms", time.Since(begin).Milliseconds())
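
Wrapping the download and upload failures in `compact.NewRetryError` marks them as retriable for whatever loop drives the downsampling. One way a caller might act on such a marker is sketched below using only the standard library. Everything here is hypothetical: `retryError`, `newRetryError`, `isRetryError`, `runWithRetries`, and `errTransient` are illustrative names, and the actual retry handling in Thanos may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a sample failure used to exercise the sketch.
var errTransient = errors.New("transient objstore failure")

// retryError is a hypothetical stand-in for the error produced by
// compact.NewRetryError; it only marks the wrapped error as retriable.
type retryError struct{ err error }

func (e retryError) Error() string { return e.err.Error() }
func (e retryError) Unwrap() error { return e.err }

// newRetryError wraps err so callers can recognise it as retriable.
func newRetryError(err error) error { return retryError{err: err} }

// isRetryError reports whether any error in the chain is a retryError.
func isRetryError(err error) bool {
	var re retryError
	return errors.As(err, &re)
}

// runWithRetries calls op until it succeeds, returns a non-retriable error,
// or maxAttempts is exhausted. It returns the number of attempts made.
func runWithRetries(maxAttempts int, op func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return attempt, nil
		}
		if !isRetryError(err) {
			// Not marked as retriable: surface the failure immediately.
			return attempt, err
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}
```

Because the final error wraps the last failure with `%w`, `isRetryError` still returns true for it, so an outer loop can tell exhausted retries apart from a hard failure.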
12 changes: 11 additions & 1 deletion cmd/thanos/main.go
@@ -22,6 +22,7 @@ import (
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/collectors"
versioncollector "github.com/prometheus/client_golang/prometheus/collectors/version"
"github.com/prometheus/common/version"
"go.uber.org/automaxprocs/maxprocs"
"gopkg.in/alecthomas/kingpin.v2"
@@ -49,6 +50,10 @@ func main() {
Default(logging.LogFormatLogfmt).Enum(logging.LogFormatLogfmt, logging.LogFormatJSON)
tracingConfig := extkingpin.RegisterCommonTracingFlags(app)

goMemLimitConf := goMemLimitConfig{}

goMemLimitConf.registerFlag(app)

registerSidecar(app)
registerStore(app)
registerQuery(app)
@@ -61,6 +66,11 @@ func main() {
cmd, setup := app.Parse()
logger := logging.NewLogger(*logLevel, *logFormat, *debugName)

if err := configureGoAutoMemLimit(goMemLimitConf); err != nil {
level.Error(logger).Log("msg", "failed to configure Go runtime memory limits", "err", err)
os.Exit(1)
}

// Running in container with limits but with empty/wrong value of GOMAXPROCS env var could lead to throttling by cpu
// maxprocs will automate adjustment by using cgroups info about cpu limit if it set as value for runtime.GOMAXPROCS.
undo, err := maxprocs.Set(maxprocs.Logger(func(template string, args ...interface{}) {
@@ -73,7 +83,7 @@ func main() {

metrics := prometheus.NewRegistry()
metrics.MustRegister(
version.NewCollector("thanos"),
versioncollector.NewCollector("thanos"),
collectors.NewGoCollector(
collectors.WithGoCollectorRuntimeMetrics(collectors.GoRuntimeMetricsRule{Matcher: regexp.MustCompile("/.*")}),
),
5 changes: 3 additions & 2 deletions cmd/thanos/main_test.go
@@ -21,6 +21,7 @@ import (
"github.com/thanos-io/objstore"

"github.com/efficientgo/core/testutil"

"github.com/thanos-io/thanos/pkg/block"
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/compact/downsample"
@@ -157,7 +158,7 @@ func TestRegression4960_Deadlock(t *testing.T) {

metrics := newDownsampleMetrics(prometheus.NewRegistry())
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, bkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
testutil.Ok(t, err)

@@ -197,7 +198,7 @@ func TestCleanupDownsampleCacheFolder(t *testing.T) {

metrics := newDownsampleMetrics(prometheus.NewRegistry())
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
baseBlockIDsFetcher := block.NewBaseBlockIDsFetcher(logger, bkt)
baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
testutil.Ok(t, err)
