Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memberlist: support for debouncing notifications #592

Merged
merged 32 commits into from
Oct 11, 2024

Conversation

seizethedave
Copy link
Contributor

@seizethedave seizethedave commented Oct 4, 2024

What this PR does:

  • Adds a layer of buffering to Memberlist's notification handling so that notifications are fired at most once per NotifyInterval, at which point it will deliver notifications using the most recently-observed data.
  • Adds a new config flag to control this interval: -memberlist.notify-interval which defaults to 0 (off).

Motivation for this change:

In clusters where the memberlist KVStore watched by Ring has many replicas, redeploying those replicas can cause WatchKey and updateRingState to be called hundreds of times per second. When there are many concurrent goroutines calling ring.ShuffleShard, the high rate of updateRingState calls (which take locks and clear caches) can create heavy lock contention and latency as ShuffleShard attempts to take locks in order to repopulate those caches.

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@seizethedave seizethedave changed the title Ring: debounce cache invalidations triggered by KVStore updates Ring: debounce KVStore updates Oct 4, 2024
@seizethedave seizethedave requested a review from pstibrany October 4, 2024 00:24
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, PR looks fine to me. Tests that do not initialize UpdateInterval are now failing.

I think the change is safe to enable it by default, without the need to go back to previous behaviour.

Alternative to this would be implementing rate-limiting in memberlist KV client's WatchKey/WatchPrefix implementations, which would automatically cover all rings, but only benefit memberlist users. (However we would like to get rid of supporting other KV stores eventually, so that's not a big deal)

CHANGELOG.md Outdated Show resolved Hide resolved
ring/ring.go Outdated Show resolved Hide resolved
@seizethedave
Copy link
Contributor Author

I vendored this change locally and it actually does break tests in Mimir, as ruler tests attempt to wait 100ms for ring changes to propagate. I may just bite the bullet and make this change a no-op by default (synchronous propagation) and make the added delay configurable. @pstibrany let me know if you have thoughts.

@seizethedave
Copy link
Contributor Author

/find-flaky-tests

@seizethedave seizethedave requested a review from pstibrany October 8, 2024 22:54
@pstibrany
Copy link
Member

I vendored this change locally and it actually does break tests in Mimir, as ruler tests attempt to wait 100ms for ring changes to propagate. I may just bite the bullet and make this change a no-op by default (synchronous propagation) and make the added delay configurable. @pstibrany let me know if you have thoughts.

I'm fine with having it disabled by default.

One previously mentioned alternative would be to move this to memberlist KV store -- not only would it apply to all rings, but clients of memberlist KV store already expect updates to be delayed. WDYT?

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I'm fine with keeping this disabled by default. Once we

ring/ring.go Outdated Show resolved Hide resolved
@seizethedave seizethedave changed the title Ring: debounce KVStore updates Memberlist: support for debouncing notifications Oct 11, 2024
@seizethedave
Copy link
Contributor Author

I vendored this change locally and it actually does break tests in Mimir, as ruler tests attempt to wait 100ms for ring changes to propagate. I may just bite the bullet and make this change a no-op by default (synchronous propagation) and make the added delay configurable. @pstibrany let me know if you have thoughts.

I'm fine with having it disabled by default.

One previously mentioned alternative would be to move this to memberlist KV store -- not only would it apply to all rings, but clients of memberlist KV store already expect updates to be delayed. WDYT?

I had taken a cursory look at this earlier and it looked like I've have to pepper a bunch of ad-hoc timer code into the bodies of WatchKey/Prefix. But today I gave it another look and realized it could cleanly go in the memberlist notification handling before the notifs make it to the Watch methods. I like this better! PTAL in your limited time. :)

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thank you!

@seizethedave seizethedave merged commit 8e7752e into main Oct 11, 2024
5 checks passed
@seizethedave seizethedave deleted the davidgrant/ring-update-debounce branch October 11, 2024 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants