Make sure unregistered ingester joining the ring after WAL replay #6277
Signed-off-by: Alex Le <[email protected]>
Can you add more details on the issue you are trying to solve? From what we discussed offline, I am a bit confused. While the openTSBD is happening,
…n starting Signed-off-by: Alex Le <[email protected]>
… start time Signed-off-by: Alex Le <[email protected]>
Change makes sense to me.
Thanks
Can you add an ENHANCEMENT entry to the changelog?
Shouldn't the DefaultReplicationStrategy do its job to filter out unhealthy instances?
The problem with the original code was that if ingesters joined the ring in pending state with tokens loaded from a token file on disk, the distributor would include those ingesters in replication sets. For example, with a replication factor of 3, a replication set contains 3 ingesters, and two or more of them might be ingesters that have tokens but are still in pending state. After building the replication set, the ring code applies its health filter to those 3 ingesters. It would find that 2 out of 3 were unhealthy because they were in pending state, and return an error to fail fast since the minimum number of successes could not be met.
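To make the failure mode above concrete, here is a minimal sketch of a majority-quorum health filter. It is not the Cortex ring code: `Instance`, `State`, `filterHealthy`, and `errTooManyUnhealthy` are hypothetical names, and the quorum is assumed to be a simple majority of the replication factor.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a simplified stand-in for an ingester's ring state (hypothetical).
type State int

const (
	Active State = iota
	Pending
)

// Instance is a hypothetical, reduced view of a ring entry.
type Instance struct {
	ID    string
	State State
}

var errTooManyUnhealthy = errors.New("too many unhealthy instances in the replication set")

// filterHealthy keeps only ACTIVE instances and fails fast when the survivors
// cannot reach a majority quorum of the replication factor (assumed rule).
func filterHealthy(set []Instance, replicationFactor int) ([]Instance, error) {
	minSuccess := replicationFactor/2 + 1
	healthy := make([]Instance, 0, len(set))
	for _, inst := range set {
		if inst.State == Active {
			healthy = append(healthy, inst)
		}
	}
	if len(healthy) < minSuccess {
		return nil, errTooManyUnhealthy
	}
	return healthy, nil
}

func main() {
	// Two of the three selected ingesters hold tokens but are still PENDING.
	set := []Instance{
		{ID: "ingester-0", State: Active},
		{ID: "ingester-1", State: Pending},
		{ID: "ingester-2", State: Pending},
	}
	_, err := filterHealthy(set, 3)
	fmt.Println(err)
}
```

In this sketch the filter does remove the pending instances, but by then they already occupy slots in the replication set, so the write fails instead of being routed to other active ingesters.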
Based on my understanding, if
What this PR does:
Currently, when an unregistered ingester starts with a token file on disk, it first joins the ring in pending state with the tokens read from the file. After replaying the WAL, the ingester switches to active state. In the meantime, the ingester receives write requests from the distributor because it is in the ring with tokens. Since the ingester is in pending state, all incoming write requests are rejected with 5xx. As a result, users cannot start multiple ingesters with token files at the same time without causing remote write 5xx errors in the distributor.
This change fixes that problem, so that multiple ingesters with token files can be spun up at the same time without causing remote write 5xx. With the fix, an unregistered instance does not join the ring if a token file exists while `autoJoinOnStartup` is set to false, until `autoJoinAfter` is triggered in the loop. Also, the heartbeat does not start immediately if the instance did not join the ring on init. The heartbeat starts either in `autoJoinAfter` or in `observeChan`, when the instance is added to the ring.
Which issue(s) this PR fixes:
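The startup rule described under "What this PR does" can be sketched as a small decision function. This is only an illustration under stated assumptions, not the lifecycler code from this PR: `startupInput`, `startupPlan`, and `planStartup` are hypothetical names, and every case other than the one the description names is assumed to keep joining and heartbeating on init.

```go
package main

import "fmt"

// startupInput collects the conditions named in the PR description.
// Field names are illustrative, not Cortex identifiers.
type startupInput struct {
	registeredInRing  bool // the instance already has an entry in the ring
	tokenFileExists   bool // tokens can be loaded from a file on disk
	autoJoinOnStartup bool // hypothetical mirror of the autoJoinOnStartup setting
}

// startupPlan describes what the instance does during init.
type startupPlan struct {
	joinRingOnInit bool // register in the ring before WAL replay finishes
	startHeartbeat bool // begin heartbeating immediately
}

// planStartup defers both the ring join and the heartbeat for an unregistered
// instance that has a token file while auto-join on startup is disabled.
// All other cases are assumed to join and heartbeat on init.
func planStartup(in startupInput) startupPlan {
	if !in.registeredInRing && in.tokenFileExists && !in.autoJoinOnStartup {
		return startupPlan{joinRingOnInit: false, startHeartbeat: false}
	}
	return startupPlan{joinRingOnInit: true, startHeartbeat: true}
}

func main() {
	plan := planStartup(startupInput{tokenFileExists: true})
	fmt.Printf("join on init: %v, heartbeat on init: %v\n", plan.joinRingOnInit, plan.startHeartbeat)
}
```

In the deferred case, the join and the heartbeat would both begin later, when `autoJoinAfter` fires or when the instance is added to the ring through `observeChan`, as the description states.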
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`