-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CELEBORN-1670] Avoid swallowing InterruptedException in ShuffleClientImpl #2849
base: main
Are you sure you want to change the base?
Conversation
Could you please review this? ping @waitinfuture @RexXiong @FMX @SteNicholas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pushDataRpcResponseCallback.onFailure( | ||
new CelebornIOException(StatusCode.PUSH_DATA_CREATE_CONNECTION_FAIL_PRIMARY, e)); | ||
if (e instanceof InterruptedException) { | ||
Thread.currentThread().interrupt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we maintain only the thread's interrupt state while still invoking callback.onFailure? And I've noticed some inconsistencies in how InterruptedException is handled in the Celeborn client. In some cases it throws an exception, while in other cases it is simply ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that we can invoke callback.onFailure(InterruptedException e) to interrupt the current thread? That seems reasonable. I believe that ignoring InterruptedException is unsafe, especially when we catch it without any specific handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contributor Author
yes, better keep that behavior If there is no particular reason.
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
What changes were proposed in this pull request?
As title.
Why are the changes needed?
In the ShuffleClientImpl, methods such as pushData and pushMergedData might encounter interruptions during message transmission via the TransportClient. However, the InterruptedException may be ignored, as it is handled as a standard exception. As a result, the ShuffleClientImpl continues its operation(retry or revive) even when an InterruptedException occurs.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Add UT: org.apache.celeborn.client.ShuffleClientSuiteJ#testPushDataAndInterrupted