receive: use async remote writing
Instead of spawning new goroutines for each peer that we want to remote write to, spawn a fixed number of worker goroutines and then schedule work on top of them.

In our case, this reduced the number of goroutines by about 10x-20x, and the 99th-percentile forwarding latency dropped from ~30s to just a few hundred milliseconds.

Signed-off-by: Giedrius Statkevičius <[email protected]>
GiedriusS committed Jan 24, 2024
1 parent 4a73fc3 commit 266601f
Showing 4 changed files with 228 additions and 76 deletions.
3 changes: 3 additions & 0 deletions cmd/thanos/receive.go
@@ -831,6 +831,8 @@ type receiveConfig struct {
writeLimitsConfig *extflag.PathOrContent
storeRateLimits store.SeriesSelectLimits
limitsConfigReloadTimer time.Duration

asyncForwardWorkerCount uint
}

func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
@@ -888,6 +890,7 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {

cmd.Flag("receive.replica-header", "HTTP header specifying the replica number of a write request.").Default(receive.DefaultReplicaHeader).StringVar(&rc.replicaHeader)

cmd.Flag("receive.forward.async-workers", "Number of concurrent workers processing forwarding of remote-write requests.").Default("5").UintVar(&rc.asyncForwardWorkerCount)
compressionOptions := strings.Join([]string{snappy.Name, compressionNone}, ", ")
cmd.Flag("receive.grpc-compression", "Compression algorithm to use for gRPC requests to other receivers. Must be one of: "+compressionOptions).Default(snappy.Name).EnumVar(&rc.compression, snappy.Name, compressionNone)

11 changes: 11 additions & 0 deletions docs/components/receive.md
@@ -248,6 +248,14 @@ NOTE:
- Thanos Receive performs best-effort limiting. In case meta-monitoring is down/unreachable, Thanos Receive will not impose limits and only log errors for meta-monitoring being unreachable. Similarly to when one receiver cannot be scraped.
- Support for different limit configuration for different tenants is planned for the future.

## Asynchronous workers

Instead of spawning a new goroutine each time the Receiver forwards a request to another node, it spawns a fixed number of goroutines (workers) that perform the work. This avoids spawning potentially tens or even hundreds of thousands of goroutines if someone starts sending a lot of small requests.

The number of workers is controlled by the `--receive.forward.async-workers` flag (default: 5).

Check the metric `thanos_receive_forward_delay_seconds` to decide whether you need to increase the number of forwarding workers.
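
The fixed-worker pattern described in this section can be sketched roughly as follows. This is a minimal illustration, not the actual Thanos implementation: `forwardJob`, `workerPool`, `newWorkerPool`, `submit`, `close`, and `forwardAll` are hypothetical names, and the `send` callback stands in for the real remote-write call.

```go
package main

import (
	"fmt"
	"sync"
)

// forwardJob is a hypothetical unit of work: one request to forward to one peer.
type forwardJob struct {
	peer    string
	payload []byte
}

// workerPool runs a fixed number of goroutines that drain a shared queue,
// instead of starting one goroutine per forwarded request.
type workerPool struct {
	jobs chan forwardJob
	wg   sync.WaitGroup
}

// newWorkerPool starts exactly `workers` goroutines. Each one calls send
// for every job it receives until the queue is closed.
func newWorkerPool(workers int, send func(forwardJob)) *workerPool {
	p := &workerPool{jobs: make(chan forwardJob)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for job := range p.jobs {
				send(job)
			}
		}()
	}
	return p
}

// submit blocks until a worker is free to take the job.
func (p *workerPool) submit(job forwardJob) { p.jobs <- job }

// close stops accepting work and waits for in-flight jobs to finish.
func (p *workerPool) close() {
	close(p.jobs)
	p.wg.Wait()
}

// forwardAll schedules one job per peer and returns how many were handled.
func forwardAll(workers int, peers []string) int {
	var mu sync.Mutex
	sent := 0
	pool := newWorkerPool(workers, func(forwardJob) {
		// Placeholder for the real remote-write call (assumption).
		mu.Lock()
		sent++
		mu.Unlock()
	})
	for _, peer := range peers {
		pool.submit(forwardJob{peer: peer, payload: []byte("sample")})
	}
	pool.close()
	return sent
}

func main() {
	peers := []string{"receive-0", "receive-1", "receive-2"}
	fmt.Println(forwardAll(5, peers))
}
```

In a sketch like this, the goroutine count stays constant no matter how many requests arrive. The trade-off is that a job may wait in `submit` when all workers are busy, which is presumably the kind of delay worth watching before raising the worker count.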

## Flags

```$ mdox-exec="thanos receive --help"
@@ -308,6 +316,9 @@ Flags:
--receive.default-tenant-id="default-tenant"
Default tenant ID to use when none is provided
via a header.
--receive.forward.async-workers=5
Number of concurrent workers processing
forwarding of remote-write requests.
--receive.grpc-compression=snappy
Compression algorithm to use for gRPC requests
to other receivers. Must be one of: snappy,