Make sure unregistered ingester joining the ring after WAL replay #6277

Merged

Conversation

@alexqyle (Contributor) commented Oct 17, 2024

What this PR does:

Currently, when starting an unregistered ingester with a token file on disk, the ingester first joins the ring in PENDING state with the tokens read from the file. After replaying the WAL, the ingester is switched to ACTIVE state. During that window the ingester receives write requests from the distributor, because it is in the ring with tokens, but since it is still PENDING, all incoming write requests are rejected with 5xx. As a result, an end user cannot start multiple ingesters with token files at the same time without causing remote-write 5xx errors in the distributor.

This change fixes that problem, so that multiple ingesters with token files can be spun up at the same time without causing remote-write 5xx errors. The fix: when autoJoinOnStartup is set to false, an unregistered instance does not join the ring even if a token file exists, until autoJoinAfter is triggered in the main loop. In addition, the heartbeat no longer starts immediately when the instance did not join the ring on init; it starts either in autoJoinAfter or in observeChan, once the instance has been added to the ring.

Which issue(s) this PR fixes:

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@danielblando (Contributor)

Can you add more details on the issue you are trying to solve? From what we discussed offline, I am a bit confused.

While openTSDB is running, it seems the old code would move the ingester from LEAVING to PENDING in the ring and start the heartbeat. With this change we would leave the ingester in the ring as LEAVING, without a heartbeat. Correct?

@pull-request-size pull-request-size bot added size/M and removed size/S labels Dec 4, 2024
Signed-off-by: Alex Le <[email protected]>
@alexqyle alexqyle changed the title Make sure ingester is active when joining the ring Make sure ingester joining the ring after WAL replay Dec 4, 2024
@alexqyle alexqyle changed the title Make sure ingester joining the ring after WAL replay Make sure unregistered ingester joining the ring after WAL replay Dec 4, 2024
@alexqyle alexqyle marked this pull request as ready for review December 4, 2024 18:25
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 4, 2024
@danielblando (Contributor) left a comment

Change makes sense to me.
Thanks

@danielblando (Contributor)

Can you add an ENHANCEMENT entry to the changelog?

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 5, 2024
@danielblando danielblando merged commit bdc357c into cortexproject:master Dec 5, 2024
16 checks passed
@alexqyle alexqyle deleted the ingester-join-ring-on-start branch December 5, 2024 21:39
@damnever (Contributor) commented Dec 6, 2024

Shouldn't the DefaultReplicationStrategy do its job to filter out unhealthy instances?

@alexqyle (Contributor, Author)

Shouldn't the DefaultReplicationStrategy do its job to filter out unhealthy instances?

The problem with the original code was that if ingesters joined the ring in PENDING state with tokens loaded from a token file on disk, the distributor would include those ingesters in replication sets. For example, with a replication factor of 3, there would be 3 ingesters in the replication set, and two or more of them might have tokens while still being in PENDING state. After building the replication set, the ring code would apply its filter to those 3 ingesters, find that 2 out of 3 were unhealthy because they were PENDING, and return an error to fail fast, since the minimum success count could not be met.

@damnever (Contributor) commented Dec 17, 2024

Shouldn't the DefaultReplicationStrategy do its job to filter out unhealthy instances?

The problem with the original code was that if ingesters joined the ring in PENDING state with tokens loaded from a token file on disk, the distributor would include those ingesters in replication sets. For example, with a replication factor of 3, there would be 3 ingesters in the replication set, and two or more of them might have tokens while still being in PENDING state. After building the replication set, the ring code would apply its filter to those 3 ingesters, find that 2 out of 3 were unhealthy because they were PENDING, and return an error to fail fast, since the minimum success count could not be met.

Based on my understanding, if extend-writes=true, then ShouldExtendReplicaSetOnState should extend the replica set when an ingester is in the PENDING state. Otherwise, we should disable ingester.unregister-on-shutdown, and the PENDING state does not matter, since we would expect the series not to spread to other ingesters.

Labels
component/ingester component/ring lgtm This PR has been approved by a maintainer size/L