[fix][broker] Avoid being stuck when closing the broker with extensible load manager #22573
Merged: lhotari merged 2 commits into apache:master from BewareMyPower:bewaremypower/extensible-lb-close-gracefully on Apr 26, 2024
BewareMyPower added the labels type/bug (The PR fixed a bug or issue reported a bug) and area/broker on Apr 24, 2024
BewareMyPower requested reviews from lhotari, Technoboy-, codelipenghui, Demogorgon314, poorbarcode and heesung-sn on April 24, 2024 12:23
BewareMyPower changed the title from "[fix][broker] Avoid being stuck in 30+ seconds when closing the BrokerService" to "[fix][broker] Avoid being stuck too long when closing the broker with extensible load manager" on Apr 24, 2024
…rService

Fixes apache#22569

### Motivation

`BrokerService#closeAsync` calls `unloadNamespaceBundlesGracefully` to unload namespaces gracefully. With the extensible load manager, it eventually calls `TableViewLoadDataStoreImpl#validateProducer`:

```
BrokerService#unloadNamespaceBundlesGracefully
ExtensibleLoadManagerWrapper#disableBroker
ExtensibleLoadManagerImpl#disableBroker
ServiceUnitStateChannelImpl#cleanOwnerships
ServiceUnitStateChannelImpl#doCleanup
TableViewLoadDataStoreImpl#removeAsync
TableViewLoadDataStoreImpl#validateProducer
```

In `validateProducer`, if the producer is not connected, it recreates the producer synchronously. However, since the state of `PulsarService` has already been changed to `Closing`, all connect and lookup requests fail with `ServiceNotReady`, and the client keeps retrying until it times out.

In addition, the unload operation itself can trigger a reconnection, because the extensible load manager sends the unload event to the `loadbalancer-service-unit-state` topic.

### Modifications

The major fix: before changing the state of `PulsarService` to `Closing`, call `BrokerService#unloadNamespaceBundlesGracefully` so that the load manager completes the unload operations first.

Minor fixes:
- Record the time when `LoadManager#disableBroker` is done.
- Don't check whether the producer is disconnected, because a disconnected producer can retry on its own.

### Verifications

Add `ExtensibleLoadManagerCloseTest` to verify that closing `PulsarService` doesn't take too long. Here are some local test results:

```
2024-04-24T19:43:38,851 - INFO - [main:ExtensibleLoadManagerCloseTest] - Brokers close time: [3342, 3276, 3310]
2024-04-24T19:44:26,711 - INFO - [main:ExtensibleLoadManagerCloseTest] - Brokers close time: [3357, 3258, 3298]
2024-04-24T19:46:16,791 - INFO - [main:ExtensibleLoadManagerCloseTest] - Brokers close time: [3313, 3257, 3263]
2024-04-24T20:13:05,763 - INFO - [main:ExtensibleLoadManagerCloseTest] - Brokers close time: [3304, 3279, 3299]
2024-04-24T20:13:43,979 - INFO - [main:ExtensibleLoadManagerCloseTest] - Brokers close time: [3343, 3308, 3310]
```

As you can see, each broker takes only about 3 seconds to close, due to the `OWNERSHIP_CLEAN_UP_CONVERGENCE_DELAY_IN_MILLIS` value added in apache#20315.
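The ordering problem described in the commit message can be reduced to a toy model. The Java sketch below is not Pulsar code: `ToyBroker`, `State`, `lookup`, and `unloadBundlesGracefully` are hypothetical stand-ins. It assumes only that lookups are rejected once the service has left the started state (as with `ServiceNotReady`) and that the graceful unload needs a successful lookup, because the load manager's producer must reconnect. It contrasts the old order (switch to closing, then unload) with the fixed order (unload, then switch to closing).

```java
import java.util.ArrayList;
import java.util.List;

class ShutdownOrderSketch {

    // Hypothetical lifecycle states, loosely modeled on the Started -> Closing -> Closed
    // progression described above; this is not Pulsar's actual enum.
    enum State { STARTED, CLOSING, CLOSED }

    static class ToyBroker {
        private State state = State.STARTED;
        final List<String> log = new ArrayList<>();

        State state() {
            return state;
        }

        // Stand-in for a connect/lookup request: rejected unless the broker is still
        // started, mirroring the ServiceNotReady failure in the commit message.
        boolean lookup() {
            if (state != State.STARTED) {
                log.add("lookup rejected: ServiceNotReady");
                return false;
            }
            return true;
        }

        // Stand-in for unloadNamespaceBundlesGracefully: the unload path needs the
        // load manager's producer to (re)connect, which requires a successful lookup.
        // The real client would keep retrying until timeout; here it fails at once.
        boolean unloadBundlesGracefully() {
            boolean ok = lookup();
            log.add(ok ? "bundles unloaded" : "unload failed (client would retry until timeout)");
            return ok;
        }

        // Old order: switch to CLOSING first, then try to unload.
        boolean closeOldOrder() {
            state = State.CLOSING;
            boolean ok = unloadBundlesGracefully();
            state = State.CLOSED;
            return ok;
        }

        // Fixed order: unload while still STARTED, then switch to CLOSING.
        boolean closeFixedOrder() {
            boolean ok = unloadBundlesGracefully();
            state = State.CLOSING;
            // ... the remaining shutdown steps would run here ...
            state = State.CLOSED;
            return ok;
        }
    }

    public static void main(String[] args) {
        ToyBroker oldBroker = new ToyBroker();
        System.out.println("old order ok: " + oldBroker.closeOldOrder());     // prints "old order ok: false"
        ToyBroker fixedBroker = new ToyBroker();
        System.out.println("fixed order ok: " + fixedBroker.closeFixedOrder()); // prints "fixed order ok: true"
    }
}
```

In the toy model the failure is immediate; in the scenario the commit message describes, the same rejected lookups are retried until the client times out, which is what made the broker appear stuck.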
BewareMyPower force-pushed the bewaremypower/extensible-lb-close-gracefully branch from cbf5ac0 to a29d3b9 on April 24, 2024 12:26
BewareMyPower changed the title from "[fix][broker] Avoid being stuck too long when closing the broker with extensible load manager" to "[fix][broker] Avoid being stuck when closing the broker with extensible load manager" on Apr 24, 2024
Comment on `OWNERSHIP_CLEAN_UP_CONVERGENCE_DELAY_IN_MILLIS`: I think we can actually remove this. This was added to wait for some time after bundles are unloaded, but I don't think it is necessary.
Reply: Agreed. We can remove it in another PR.
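The review thread above questions whether `OWNERSHIP_CLEAN_UP_CONVERGENCE_DELAY_IN_MILLIS` is needed. Its cost is easy to see in isolation: if the ownership cleanup only runs after a fixed delay, any close that waits for that cleanup takes at least that long, which is consistent with the roughly 3-second close times reported in the test results. This is a minimal sketch, not Pulsar code: `cleanOwnershipsAfterDelay`, `measureCloseMillis`, and the 300 ms `CONVERGENCE_DELAY_MS` are hypothetical. It demonstrates only the lower bound on close time, not whether the delay is needed for correctness.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ConvergenceDelaySketch {

    // Hypothetical stand-in for OWNERSHIP_CLEAN_UP_CONVERGENCE_DELAY_IN_MILLIS;
    // the value is arbitrary and chosen only to keep the demo fast.
    static final long CONVERGENCE_DELAY_MS = 300;

    // Runs a (hypothetical) ownership cleanup only after the convergence delay,
    // so the returned future cannot complete before the delay has elapsed.
    static CompletableFuture<Void> cleanOwnershipsAfterDelay(ScheduledExecutorService scheduler,
                                                             long delayMs) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        scheduler.schedule(() -> {
            // ... the actual ownership cleanup would happen here ...
            done.complete(null);
        }, delayMs, TimeUnit.MILLISECONDS);
        return done;
    }

    // Measures how long a "close" that waits for the delayed cleanup takes.
    static long measureCloseMillis(long delayMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            cleanOwnershipsAfterDelay(scheduler, delayMs).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("close with delay took " + measureCloseMillis(CONVERGENCE_DELAY_MS) + " ms");
        System.out.println("close without delay took " + measureCloseMillis(0) + " ms");
    }
}
```

Because the delay is a floor on shutdown time rather than on correctness, removing it (as proposed for a follow-up PR) would shorten each broker's close by roughly the delay's value.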
heesung-sn approved these changes on Apr 25, 2024
lhotari reviewed on Apr 26, 2024 (resolved comment on pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java)
lhotari approved these changes on Apr 26, 2024
RobertIndie approved these changes on Apr 26, 2024
poorbarcode approved these changes on Apr 26, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request on May 15, 2024:
…le load manager (apache#22573) (cherry picked from commit f411e3c) (cherry picked from commit db43414)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request on May 16, 2024:
…le load manager (apache#22573) (cherry picked from commit f411e3c) (cherry picked from commit db43414)
Labels
area/broker
cherry-picked/branch-3.0
cherry-picked/branch-3.2
doc-not-needed (Your PR changes do not impact docs)
ready-to-test
release/3.0.5
release/3.2.3
type/bug (The PR fixed a bug or issue reported a bug)
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: BewareMyPower#31