Handle response exception from Elasticsearch client #26424
Assigning reviewers. If you would like to opt out of this review, comment R: @kennknowles for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments).
R: @egalpin
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Thanks @mrkm4ntr for the contribution! I'll take a look more closely as soon as I can. In the meantime, based on the spotless checker output [1] you need to run Could you also double-check that there is an existing test for the failure mode that your code aims to address? I'll also verify in my review. Thanks!
[1] https://ci-beam.apache.org/job/beam_PreCommit_Spotless_Commit/26535/console
1a4fb5d to 98dff77
@egalpin Thanks. I checked the current tests and found that Elasticsearch returns HTTP 200 for invalid bulk data (the 400 appears only in the response entity), whereas a too-many-requests error is returned as HTTP 429. I changed the test to emulate a non-200 status by using a 405 status.
98dff77 to 6b0a905
@@ -373,6 +376,8 @@ abstract static class Builder {

      abstract Builder setTrustSelfSignedCerts(boolean trustSelfSignedCerts);

      abstract Builder setInvalidBulkEndpoint(boolean invalidBulkEndpoint);
I don't particularly like the idea of having this builder method only for the sake of testing. I feel it will be very confusing to users to see this as an option when building configurations. Could you look for an alternative approach?
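One possible alternative, offered only as a sketch and not as what this PR ended up doing, is to keep the non-200 emulation entirely in test code by pointing the client at a local stub server instead of adding a builder option. The example below uses the JDK's built-in `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` (Java 11+); the class and method names (`StubBulkEndpoint`, `start`, `statusReturnedBy`) are hypothetical, and a real test would send the request through ElasticsearchIO rather than a plain HTTP client.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical test helper: a local stub whose /_bulk endpoint always fails. */
final class StubBulkEndpoint {

  /** Starts a server on a free local port that answers /_bulk with the given status. */
  static HttpServer start(int status) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    server.createContext("/_bulk", exchange -> {
      exchange.getRequestBody().readAllBytes(); // drain the request payload
      exchange.sendResponseHeaders(status, -1); // -1 means "no response body"
      exchange.close();
    });
    server.start();
    return server;
  }

  /** Posts a dummy payload to the stub and returns the HTTP status it replied with. */
  static int statusReturnedBy(int status) {
    try {
      HttpServer server = start(status);
      try {
        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/_bulk");
        HttpRequest request = HttpRequest.newBuilder(uri)
            .POST(HttpRequest.BodyPublishers.ofString("{}\n"))
            .build();
        return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.discarding())
            .statusCode();
      } finally {
        server.stop(0);
      }
    } catch (Exception e) {
      // Wrapped so the helper can be called without checked-exception plumbing.
      throw new IllegalStateException(e);
    }
  }
}
```

With a helper along these lines, the 405 (or 429) case could be exercised from a test without exposing anything like `setInvalidBulkEndpoint` to users.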
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
@@ -1224,26 +1241,9 @@ static class DefaultRetryPredicate implements RetryPredicate {
        this(429);
      }

      /** Returns true if the response has the error code for any mutation. */
      private static boolean errorCodePresent(HttpEntity responseEntity, int errorCode) {
Thanks @mrkm4ntr for your patience, I have kept you waiting too many times.
This change is definitely interesting. I can see that in some cases, such as successfully creating a document via _bulk, the HTTP status code is 200 while the status field under items in the response body is 201 (updating the doc results in the status field being 200):
{
  "took": 383,
  "errors": false,
  "items": [
    {
      "update": {
        "_index": "foo",
        "_type": "_doc",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
It's also the case, more importantly, that when one of the operations in the bulk payload fails but another succeeds, the HTTP status code can be 200 while one of the items has a status field with a non-200 code:
{
  "took": 19,
  "errors": true,
  "items": [
    {
      "update": {
        "_index": "foo",
        "_type": "_doc",
        "_id": "2",
        "_version": 7,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 7,
        "_primary_term": 1,
        "status": 200
      }
    },
    {
      "update": {
        "_index": "foo",
        "_type": "_doc",
        "_id": "2",
        "status": 400,
        "error": {
          "type": "illegal_argument_exception",
          "reason": "failed to execute script",
          "caused_by": {
            "type": "script_exception",
            "reason": "compile error",
            "script_stack": [
              "if(ctx._source.group !=== null) { ctx._source.gro ...",
              " ^---- HERE"
            ],
            "script": "if(ctx._source.group !=== null) { ctx._source.group = params.id % 2 } else { ctx._source.group = 0 }",
            "lang": "painless",
            "position": {
              "offset": 24,
              "start": 0,
              "end": 49
            },
            "caused_by": {
              "type": "illegal_argument_exception",
              "reason": "invalid sequence of tokens near ['='].",
              "caused_by": {
                "type": "no_viable_alt_exception",
                "reason": "no_viable_alt_exception: null"
              }
            }
          }
        }
      }
    }
  ]
}
So the HTTP status code alone cannot be "trusted" to report all failures, since there could be false negatives. However, it seems that the HTTP status code can be used as a first check: if it is >= 400, we know a failure has occurred and we don't necessarily need to iterate through the items.
Checking the http status code first might be a viable path forward for properly handling the 429 case.
Thoughts?
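A minimal sketch of that two-stage check, assuming the per-item status values have already been extracted from the items array into a list of integers. The class and method names (`BulkFailureCheck`, `hasFailure`) are hypothetical and are not part of ElasticsearchIO.

```java
import java.util.List;

/** Hypothetical sketch: HTTP status as a first check, item statuses as a second. */
final class BulkFailureCheck {

  /**
   * Returns true if the bulk call contains at least one failure.
   *
   * @param httpStatus status code of the HTTP response itself
   * @param itemStatuses the "status" field of each entry under "items" (may be empty)
   */
  static boolean hasFailure(int httpStatus, List<Integer> itemStatuses) {
    // First check: a 4xx/5xx HTTP status already tells us the request failed,
    // so there is no need to look at the individual items.
    if (httpStatus >= 400) {
      return true;
    }
    // Second check: an HTTP 200 can still hide per-item failures,
    // so every item status has to be inspected.
    return itemStatuses.stream().anyMatch(status -> status >= 400);
  }
}
```

For the two response bodies quoted above, `hasFailure(200, List.of(201))` is false and `hasFailure(200, List.of(200, 400))` is true, while any HTTP 429 short-circuits to true regardless of the body.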
Thanks.
So, you believe there's a scenario where the HTTP status code is 200, but the status in the body is 429. I'm not certain this is accurate, but we should pay attention to this if we're not confident.
OK, I will change my PR to follow this suggestion: "Checking the http status code first".
So, you believe there's a scenario where the HTTP status code is 200, but the status in the body is 429.
It's more so that I feel that we cannot remove the errorCodePresent method and rely on the HTTP status code alone. I have seen first hand (examples above) scenarios where 1 out of n bulk entities succeeds and therefore the HTTP status code is 200, even if n-1 bulk entities failed and had their associated status field in items set to something like 400.
I believe the failure mode for not properly handling HTTP 429 is that the response body does not contain the key items, which errorCodePresent currently depends on to look further for status. Instead, in the case of 429, status is a top-level key in the response body.
So I feel that we need to keep all existing errorCodePresent logic, and add additional handling to check the HTTP status code.
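The two body shapes described in this comment could be handled along the following lines. This is only a sketch under stated assumptions: the response body is taken to be already parsed into nested `Map`/`List` values with numbers as `Integer`, and the class name `BulkBodyInspector` is hypothetical. It reuses the method name `errorCodePresent` from the diff but is not the actual ElasticsearchIO implementation, which works on an `HttpEntity`.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: look for an error code in either shape of response body. */
final class BulkBodyInspector {

  /** Returns true if errorCode appears as a top-level status or as any item status. */
  static boolean errorCodePresent(Map<String, Object> body, int errorCode) {
    Integer wanted = errorCode;
    Object items = body.get("items");
    if (!(items instanceof List)) {
      // No "items" key (the shape described for HTTP 429):
      // fall back to the top-level "status" field.
      return wanted.equals(body.get("status"));
    }
    for (Object item : (List<?>) items) {
      if (!(item instanceof Map)) {
        continue;
      }
      // Each item is an object keyed by the action name ("update", "index", ...).
      for (Object action : ((Map<?, ?>) item).values()) {
        if (action instanceof Map && wanted.equals(((Map<?, ?>) action).get("status"))) {
          return true;
        }
      }
    }
    return false;
  }
}
```

Keeping the items scan and adding the top-level fallback means a partially failed bulk response with HTTP 200 is still detected, which is the behavior the comment above asks to preserve.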
/hold
R: @egalpin
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
If the Elasticsearch client receives a non-success HTTP status, it throws a ResponseException, as shown below.
https://github.com/elastic/elasticsearch/blob/f8ea89edcb9e79a62f79495b65eeb7c88c252f4f/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L340-L355
So currently ElasticsearchIO doesn't retry on a 429 response, because the exception is thrown before the response can be inspected.
#22160
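To illustrate the problem and one way around it, here is a hedged sketch of a retry loop that treats the thrown exception as a retry signal. `StatusException` is a hypothetical stand-in for the client's `ResponseException` (which is a checked `IOException` and exposes the code via `getResponse().getStatusLine().getStatusCode()`); it is unchecked here only to keep the sketch short. `BulkCall` and `executeWithRetry` are likewise hypothetical names, and a real implementation would back off between attempts.

```java
/** Hypothetical sketch: retry when the client throws for a retryable HTTP status. */
final class RetryOnStatusException {

  /** Stand-in for ResponseException; carries the HTTP status of the failed call. */
  static final class StatusException extends RuntimeException {
    final int status;

    StatusException(int status) {
      super("HTTP " + status);
      this.status = status;
    }
  }

  /** A bulk request that returns the HTTP status or throws on a non-success status. */
  interface BulkCall {
    int execute();
  }

  /** Runs the call, retrying only when it throws with the retryable status. */
  static int executeWithRetry(BulkCall call, int retryableStatus, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    StatusException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.execute();
      } catch (StatusException e) {
        if (e.status != retryableStatus) {
          throw e; // not retryable: surface the failure immediately
        }
        last = e; // retryable (for example 429): try again
      }
    }
    throw last; // every attempt was throttled
  }
}
```

The point of the sketch is that a predicate which only inspects a returned response never runs for 429, so the status has to be read from the exception instead.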