KAFKA-17696 New consumer background operations unaware of metadata errors #17440
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java
Thanks @m1a2st!
This looks incomplete as is 🤔
I'd thought of one suggestion, but I'm not sure if it would work or if I even like it: add a new instance variable to store authorization exceptions (e.g. UnauthorizedTopicException) and then update the catch block in processBackgroundEvents() to check for authorization errors and store them in that variable. Then add a maybeThrowAuthorizationException() that conditionally throws the error if it's non-null. We'd have to clear out the exception on subscribe() or assign(), but it might work.
Please take a look at @lianetm's comments on KAFKA-17696 again as I think she has some suggestions worth pursuing.
Thanks!
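The suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual AsyncKafkaConsumer code: the class AuthErrorTrackingConsumer, the nested AuthorizationError type, the Runnable-based event queue, and the dummy position() are all hypothetical stand-ins. Only the names processBackgroundEvents(), maybeThrowAuthorizationException(), subscribe(), and assign() come from the comment.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the consumer; not the real AsyncKafkaConsumer API.
class AuthErrorTrackingConsumer {
    // Placeholder for Kafka's authorization exception types (assumption).
    static class AuthorizationError extends RuntimeException {
        AuthorizationError(String message) {
            super(message);
        }
    }

    // Events handed over by the (simulated) background thread.
    private final Queue<Runnable> backgroundEvents = new ConcurrentLinkedQueue<>();
    // Last authorization error seen while processing background events, or null.
    private volatile AuthorizationError storedAuthError;

    void enqueueBackgroundEvent(Runnable event) {
        backgroundEvents.add(event);
    }

    // Drains the queue, remembers authorization errors, then rethrows the first error seen.
    void processBackgroundEvents() {
        RuntimeException firstError = null;
        Runnable event;
        while ((event = backgroundEvents.poll()) != null) {
            try {
                event.run();
            } catch (RuntimeException e) {
                if (e instanceof AuthorizationError) {
                    storedAuthError = (AuthorizationError) e;
                }
                if (firstError == null) {
                    firstError = e;
                }
            }
        }
        if (firstError != null) {
            throw firstError;
        }
    }

    // Throws the remembered authorization error, if there is one.
    void maybeThrowAuthorizationException() {
        AuthorizationError error = storedAuthError;
        if (error != null) {
            throw error;
        }
    }

    // An API call that does not process background events itself.
    long position() {
        maybeThrowAuthorizationException();
        return 0L; // dummy position for the sketch
    }

    // Changing the subscription or assignment clears the remembered error.
    void subscribe() {
        storedAuthError = null;
    }

    void assign() {
        storedAuthError = null;
    }
}
```

In this sketch a call such as position() fails fast once an authorization error has been observed, and keeps failing until subscribe() or assign() resets the state. Whether that lifetime is appropriate is exactly the open question raised in the comment.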
Sorry for the late reply.
Thanks for the suggestions, @kirktrue. I found a good way to resolve the close problem, the kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java Line 1232 in 604564c
[2] kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java Line 1283 in 604564c
I will think more deeply about these comments
rough to solve this problem, but in close there is another problem
# Conflicts: # core/src/test/scala/integration/kafka/api/AuthorizerIntegrationTest.scala
Thanks for the PR, @m1a2st!
- Is it sufficient to perform a single check of the background events before submitting the application event, or do we really need to perform multiple checks of the background events while we wait for the application event to complete?
- Do we need to perform this same check in more places than just the handful in this PR?
applicationEventHandler.add(listOffsetsEvent);
offsetAndTimestampMap = processBackgroundEvents(
    listOffsetsEvent.future(),
    timer, __ -> false
);
As I understand it, we need to check for the errors from the background thread. But do we need to check repeatedly during the execution of the ListOffsetsEvent, or can we just check once beforehand?
Suggested change:
- applicationEventHandler.add(listOffsetsEvent);
- offsetAndTimestampMap = processBackgroundEvents(
-     listOffsetsEvent.future(),
-     timer, __ -> false
- );
+ processBackgroundEvents();
+ offsetAndTimestampMap = applicationEventHandler.addAndGet(listOffsetsEvent);
Is it sufficient to perform a single check of the background events before submitting the application event, or do we really need to perform multiple checks of the background events while we wait for the application event to complete?
We can't call processBackgroundEvents() only once before applicationEventHandler.addAndGet. The tests fail if I change the code to the version below, so I think a loop around processBackgroundEvents is necessary:
processBackgroundEvents();
offsetAndTimestampMap = applicationEventHandler.addAndGet(listOffsetsEvent);
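To illustrate why a loop can behave differently from a single up-front check, here is a minimal sketch. It rests on an assumption that the thread does not confirm: that the background error may arrive after the application event has been submitted, so a check done once beforehand never sees it. The class BackgroundEventLoopSketch, the method awaitWithBackgroundChecks, the error queue, and the 10 ms polling interval are all hypothetical and are not part of the Kafka code base.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; it only mimics the shape of processBackgroundEvents(future, timer, ...).
class BackgroundEventLoopSketch {
    // Waits for the future, but re-checks the background error queue on every iteration.
    static <T> T awaitWithBackgroundChecks(CompletableFuture<T> future,
                                           BlockingQueue<RuntimeException> backgroundErrors,
                                           long timeoutMs) throws Exception {
        long deadlineNs = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            // An error reported at any point during the wait is surfaced here.
            RuntimeException error = backgroundErrors.poll();
            if (error != null) {
                throw error;
            }
            long remainingNs = deadlineNs - System.nanoTime();
            if (remainingNs <= 0) {
                throw new TimeoutException("event did not complete in time");
            }
            try {
                // Block only briefly so the queue is checked again soon.
                long sliceNs = Math.min(remainingNs, TimeUnit.MILLISECONDS.toNanos(10));
                return future.get(sliceNs, TimeUnit.NANOSECONDS);
            } catch (TimeoutException notDoneYet) {
                // Future not complete yet; loop and check for background errors again.
            }
        }
    }
}
```

With a single check followed by a blocking get, an error enqueued 50 ms after submission would leave the caller waiting until the full timeout. The loop above returns as soon as the error shows up.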
processBackgroundEvents(unsubscribeEvent.future(), timer,
    e -> e instanceof InvalidTopicException || e instanceof TopicAuthorizationException || e instanceof GroupAuthorizationException);
For readability, could you introduce a Predicate variable, a la:
Suggested change:
- processBackgroundEvents(unsubscribeEvent.future(), timer,
-     e -> e instanceof InvalidTopicException || e instanceof TopicAuthorizationException || e instanceof GroupAuthorizationException);
+ final Predicate<Exception> ignoreExceptions = e ->
+     e instanceof InvalidTopicException ||
+     e instanceof TopicAuthorizationException ||
+     e instanceof GroupAuthorizationException;
+ processBackgroundEvents(unsubscribeEvent.future(), timer, ignoreExceptions);
I think we need a similar fix for AsyncKafkaConsumer#unsubscribe as well.
uhm are we sure that swallowing TopicAuth and GroupAuth on close is the right thing to do? I could surely be missing something, but I believe it's not what the classic consumer does, see my comment on it on the other PR that is also attempting this #17516 (comment)
Thoughts?
applicationEventHandler.add(checkAndUpdatePositionsEvent);
cachedSubscriptionHasAllFetchPositions = processBackgroundEvents(
    checkAndUpdatePositionsEvent.future(),
    timer, __ -> false
);
Suggested change:
- applicationEventHandler.add(checkAndUpdatePositionsEvent);
- cachedSubscriptionHasAllFetchPositions = processBackgroundEvents(
-     checkAndUpdatePositionsEvent.future(),
-     timer, __ -> false
- );
+ processBackgroundEvents();
+ cachedSubscriptionHasAllFetchPositions = applicationEventHandler.addAndGet(checkAndUpdatePositionsEvent);
It can't process processBackgroundEvents() only once before applicationEventHandler.addAndGet. Test will be fail if I change to below, I think a loop for processBackgroundEvents is necessary
This section also fails when we process the background events only once.
Also, this PR overlaps a lot with PR #17516, right?
I believe we need to have some filtering in the background event processing logic, because we don't want the checks to inadvertently execute the
Hello @FrankYang0529, feel free to reopen the PR; this PR can focus
If we want to process all background events from the
@m1a2st, I tested this fix by merging the changes in this PR with the changes from my PR that sets the default
If you want to test that your change works across our topic and consumer group authorization integration tests, do the following:
@kirktrue, thanks for the reminder. I will take a look at these failing tests.
# Conflicts: # core/src/test/scala/integration/kafka/api/AuthorizerIntegrationTest.scala
Hello @lianetm, @kirktrue, I will focus on
Sounds great to me! So I expect we'll end up with this PR addressing how we propagate metadata errors within the background thread to fail requests that should be aware of the error (should unblock all auth tests expecting TopicAuth error in API calls), and another PR addressing how we propagate coordinator errors within the background thread to fail requests similarly (unblock tests expecting GroupAuthErrors in API calls).
The rationale behind this design is that when
I’m thinking that some tests fail for methods like
Hey @m1a2st, sharing a thought in case it helps. First, the problem we have is that API calls like position/endOffsets trigger events that should fail with topic metadata errors but they don't, and are left hanging until they time out. So, with that in mind, it occurred to me that we do have all the events that are awaiting responses in hand when then
On ConsumerNetworkThread.runOnce:
Would that work? I see that the main advantages would be to avoid the complexity of metadata future errors passed around to specific manager calls, and also it would be a solution applied consistently to all events (each event type then deciding if it should fail or not on topic metadata errors). onMetadataError, events could no-op by default, and some should override to simply do future.completeExceptionally, ex.
I could be missing something but sharing in case it helps! Let me know.
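A rough sketch of that idea, under stated assumptions: the classes below (CompletableEventSketch, UnsubscribeEventSketch, CheckAndUpdatePositionsEventSketch, NetworkThreadSketch) are hypothetical simplifications, not the real Kafka event or network-thread classes. The choice of which event overrides onMetadataError is illustrative only, and passing the metadata error into runOnce as a nullable argument is a shortcut for however the real thread would obtain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical base type for events that carry a future awaited by the application thread.
abstract class CompletableEventSketch<T> {
    final CompletableFuture<T> future = new CompletableFuture<>();

    // No-op by default: most events ignore topic metadata errors.
    void onMetadataError(Exception metadataError) {
    }
}

// Keeps the default no-op (an assumption made for this sketch).
class UnsubscribeEventSketch extends CompletableEventSketch<Void> {
}

// Fails its future so a caller such as position() stops waiting.
class CheckAndUpdatePositionsEventSketch extends CompletableEventSketch<Boolean> {
    @Override
    void onMetadataError(Exception metadataError) {
        future.completeExceptionally(metadataError);
    }
}

// Hypothetical stand-in for the network thread's run loop.
class NetworkThreadSketch {
    private final List<CompletableEventSketch<?>> awaiting = new ArrayList<>();

    void track(CompletableEventSketch<?> event) {
        awaiting.add(event);
    }

    int awaitingCount() {
        return awaiting.size();
    }

    // One loop iteration: if a metadata error is pending, let every awaiting event react to it.
    void runOnce(Exception metadataErrorOrNull) {
        if (metadataErrorOrNull != null) {
            for (CompletableEventSketch<?> event : awaiting) {
                event.onMetadataError(metadataErrorOrNull);
            }
        }
        // Completed events (successfully or exceptionally) no longer need tracking.
        awaiting.removeIf(event -> event.future.isDone());
    }
}
```

The design keeps the decision local to each event type: the loop treats all events uniformly, and only the events that override onMetadataError fail on a topic metadata error.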
Sorry I missed this comment before. Great point, the issue is that with this PR (no matter how we implement it) we end up failing API calls/events on metadata errors, but still also keeping the previous logic that generated an ErrorEvent for them.
kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/NetworkClientDelegate.java Line 157 in e73edce
We were propagating metadata errors via ErrorEvent thinking that it was only meant to be consumed from poll (which was a wrong assumption). If, with this PR, we introduce a mechanism to propagate it via the API events, I wonder if we should consider removing the redundant ErrorEvent for this case? (Without ErrorEvent, poll would still fail as expected, because the CheckAndUpdatePositions would fail with the auth error.)
Hello @lianetm, Sorry for the late response.
I think this approach is great: it significantly simplifies the system by eliminating the need to pass CompletedFuture around. Also, based on current testing, the failing tests are still just these few.
Hello @lianetm, Thanks for your review.
Based on this issue, the most straightforward solution I can think of at the moment is to add a new attribute in the event to determine whether the method call requires the use of the completedFuture for transmission. I have already drafted a version for this approach. WDYT?
Jira: https://issues.apache.org/jira/browse/KAFKA-17696
When API calls that handle background events (e.g., poll, unsubscribe, close) encounter errors, the errors are only passed to the application thread via ErrorEvent.
Other API calls that do not process background events (e.g., position) are not notified of these errors, meaning that issues like unauthorized access to topics will go unnoticed by those operations.
Background operations are not aborted or notified when a metadata error occurs, such as an Unauthorized error, which can lead to situations where a call like position keeps waiting for an update, despite the Unauthorized error already happening.
Because applicationEventHandler.addAndGet(checkAndUpdatePositionsEvent) blocks, I think we should use processBackgroundEvents to get the events, which is better than addAndGet.