Handle response exception from Elasticsearch client #26424

mrkm4ntr · 2023-04-26T04:18:18Z

If Elasticsearch client returns non success HTTP status, it throws ResponseException like below.
https://github.com/elastic/elasticsearch/blob/f8ea89edcb9e79a62f79495b65eeb7c88c252f4f/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L340-L355

So currently ElasticsearchIO doesn't retry 429 response.
#22160

Please add a meaningful description for your change here

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.

github-actions · 2023-04-26T05:05:48Z

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java.
R: @chamikaramj for label io.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

mrkm4ntr · 2023-04-26T05:07:40Z

R: @egalpin

github-actions · 2023-04-26T05:08:58Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

egalpin · 2023-04-27T16:22:26Z

thanks @mrkm4ntr for the contribution! I'll take a look more closely as soon as I can. In the meantime, based on the spotless checker output[1] you need to run ./gradlew :sdks:java:io:elasticsearch:spotlessApply on your local machine and commit any changes to auto-fix java style.

Could you also double-check that there is an existing test for the failure mode that your code aims to address? I'll also verify in my review. Thanks!

[1] https://ci-beam.apache.org/job/beam_PreCommit_Spotless_Commit/26535/console

mrkm4ntr · 2023-04-28T06:07:21Z

@egalpin Thanks. I checked the current tests and found that Elasticsearch returns HTTP 200 status for invalid bulk data (400 is only in response entity). While too many requests error is HTTP 429. I changed test to emulate non-200 status with 405 status.

egalpin · 2023-06-09T22:38:34Z

...ava/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java

@@ -373,6 +376,8 @@ abstract static class Builder {

      abstract Builder setTrustSelfSignedCerts(boolean trustSelfSignedCerts);

+      abstract Builder setInvalidBulkEndpoint(boolean invalidBulkEndpoint);


I don't particularly like the idea of having this builder method only for the sake of testing. I feel it will be very confusing to users to see this as an option when building configurations. Could you look for an alternative approach?

I think this method and this are not visible from users. Same approach as this.

github-actions · 2023-08-09T12:42:17Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

egalpin · 2023-08-09T16:41:02Z

...ava/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java

@@ -1224,26 +1241,9 @@ static class DefaultRetryPredicate implements RetryPredicate {
        this(429);
      }

-      /** Returns true if the response has the error code for any mutation. */
-      private static boolean errorCodePresent(HttpEntity responseEntity, int errorCode) {


Thanks @mrkm4ntr for your patience, I have kept you waiting too many times.

This change is definitely interesting. I can see that in some cases, such as successfully creating a document via _bulk, the http status code is 200 while the status field under items in the response body is 201 (updating the doc results in status field being 200:

{ "took": 383, "errors": false, "items": [ { "update": { "_index": "foo", "_type": "_doc", "_id": "2", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 0, "_primary_term": 1, "status": 201 } } ] }

It's also the case, more importantly, that when one of the operations of the bulk payload fails, but another succeeds, there can be an http status code of 200 but one of the items can have status field with non-200 code:

{ "took": 19, "errors": true, "items": [ { "update": { "_index": "foo", "_type": "_doc", "_id": "2", "_version": 7, "result": "updated", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 7, "_primary_term": 1, "status": 200 } }, { "update": { "_index": "foo", "_type": "_doc", "_id": "2", "status": 400, "error": { "type": "illegal_argument_exception", "reason": "failed to execute script", "caused_by": { "type": "script_exception", "reason": "compile error", "script_stack": [ "if(ctx._source.group !=== null) { ctx._source.gro ...", " ^---- HERE" ], "script": "if(ctx._source.group !=== null) { ctx._source.group = params.id % 2 } else { ctx._source.group = 0 }", "lang": "painless", "position": { "offset": 24, "start": 0, "end": 49 }, "caused_by": { "type": "illegal_argument_exception", "reason": "invalid sequence of tokens near ['='].", "caused_by": { "type": "no_viable_alt_exception", "reason": "no_viable_alt_exception: null" } } } } } } ] }

So the status code alone cannot be "trusted" to inform of all failures, since there could be false negatives. However, it seems that the http status code can be used as a first check: if the http status code is >=400, we know a failure has occurred and we don't need to iterate through the items necessarily.

Checking the http status code first might be a viable path forward for properly handling the 429 case.

Thoughts?

Thanks.
So, you believe there's a scenario where the HTTP status code is 200, but the status in the body is 429. I'm not certain this is accurate, but we should pay attention to this if we're not confident.

OK, I will change my PR like this.

Checking the http status code first

So, you believe there's a scenario where the HTTP status code is 200, but the status in the body is 429.

It's more so that I feel that we cannot remove the errorCodePresent method and rely on http status code alone. I have seen first hand (examples above) scenarios where 1 out of n bulk entities succeeds and therefore the http status code is 200, even if n-1 bulk entities failed and had their associated status field in items set to something like 400.

I believe the failure mode for not properly handling http 429 is that the response body does not contain the key items which errorCodePresent currently depends on to look further for status. Instead, in the case of 429, status is a top-level key in the response body.

So I feel that we need to keep all existing errorCodePresent logic, and add additional handling to check the http status code.

github-actions · 2023-10-10T12:41:23Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

mrkm4ntr · 2023-10-13T01:09:15Z

/hold

kennknowles · 2023-10-19T14:48:15Z

R: @egalpin

damccorm · 2023-12-12T13:56:19Z

@egalpin @mrkm4ntr what are next steps for this PR?

github-actions · 2024-02-11T12:38:10Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions · 2024-02-18T12:38:13Z

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot added elasticsearch io java labels Apr 26, 2023

github-actions bot added the Next Action: Reviewers label Apr 26, 2023

mrkm4ntr force-pushed the elasticsearch/handle-response-exception branch 2 times, most recently from 1a4fb5d to 98dff77 Compare April 28, 2023 06:01

Handle response exception from Elasticsearch client

6b0a905

mrkm4ntr force-pushed the elasticsearch/handle-response-exception branch from 98dff77 to 6b0a905 Compare April 28, 2023 07:36

egalpin reviewed Jun 9, 2023

View reviewed changes

github-actions bot added the stale label Aug 9, 2023

egalpin reviewed Aug 9, 2023

View reviewed changes

github-actions bot removed the stale label Aug 10, 2023

github-actions bot added the stale label Oct 10, 2023

github-actions bot removed the stale label Oct 13, 2023

github-actions bot added the stale label Feb 11, 2024

github-actions bot closed this Feb 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle response exception from Elasticsearch client #26424

Handle response exception from Elasticsearch client #26424

mrkm4ntr commented Apr 26, 2023

github-actions bot commented Apr 26, 2023

mrkm4ntr commented Apr 26, 2023

github-actions bot commented Apr 26, 2023

egalpin commented Apr 27, 2023

mrkm4ntr commented Apr 28, 2023

egalpin Jun 9, 2023

mrkm4ntr Jun 10, 2023

github-actions bot commented Aug 9, 2023

egalpin Aug 9, 2023

mrkm4ntr Aug 10, 2023

egalpin Aug 10, 2023

github-actions bot commented Oct 10, 2023

mrkm4ntr commented Oct 13, 2023

kennknowles commented Oct 19, 2023

damccorm commented Dec 12, 2023

github-actions bot commented Feb 11, 2024

github-actions bot commented Feb 18, 2024

		@@ -373,6 +376,8 @@ abstract static class Builder {

		abstract Builder setTrustSelfSignedCerts(boolean trustSelfSignedCerts);

		abstract Builder setInvalidBulkEndpoint(boolean invalidBulkEndpoint);

Handle response exception from Elasticsearch client #26424

Handle response exception from Elasticsearch client #26424

Conversation

mrkm4ntr commented Apr 26, 2023

GitHub Actions Tests Status (on master branch)

github-actions bot commented Apr 26, 2023

mrkm4ntr commented Apr 26, 2023

github-actions bot commented Apr 26, 2023

egalpin commented Apr 27, 2023

mrkm4ntr commented Apr 28, 2023

egalpin Jun 9, 2023

Choose a reason for hiding this comment

mrkm4ntr Jun 10, 2023

Choose a reason for hiding this comment

github-actions bot commented Aug 9, 2023

egalpin Aug 9, 2023

Choose a reason for hiding this comment

mrkm4ntr Aug 10, 2023

Choose a reason for hiding this comment

egalpin Aug 10, 2023

Choose a reason for hiding this comment

github-actions bot commented Oct 10, 2023

mrkm4ntr commented Oct 13, 2023

kennknowles commented Oct 19, 2023

damccorm commented Dec 12, 2023

github-actions bot commented Feb 11, 2024

github-actions bot commented Feb 18, 2024