-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise error in #scroll_batches when search backend returns a failure #916
base: master
Are you sure you want to change the base?
Raise error in #scroll_batches when search backend returns a failure #916
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, small changes are required to format of the error
lib/chewy/search/scrolling.rb
Outdated
@@ -34,6 +34,9 @@ def scroll_batches(batch_size: Request::DEFAULT_BATCH_SIZE, scroll: Request::DEF | |||
scroll_id = nil | |||
|
|||
loop do | |||
failures = result.dig('_shards', 'failures') | |||
raise Chewy::Error, failures if failures.present? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Chewy::Error
is not the best thing here, from your example, failures
is a hash, and Chewy::Error
is simply a StandardError with a string argument. Please make a specific error message and convert failure
into a meaningful string? For instance
Lines 20 to 32 in 10ff43b
class ImportFailed < Error | |
def initialize(type, import_errors) | |
message = "Import failed for `#{type}` with:\n" | |
import_errors.each do |action, action_errors| | |
message << " #{action.to_s.humanize} errors:\n" | |
action_errors.each do |error, documents| | |
message << " `#{error}`\n" | |
message << " on #{documents.count} documents: #{documents}\n" | |
end | |
end | |
super message | |
end | |
end |
README.md
Outdated
## Running specs | ||
|
||
Make sure you're running a local Elasticsearch instance. | ||
|
||
``` | ||
ES_PORT=9200 bundle exec rspec | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, could we remove this? I planning to backport docker-compose setup with ES, so it runs on proper port https://github.com/toptal/chewy/pull/917/files#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3R1-R16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, done!
@@ -33,6 +33,56 @@ | |||
let(:countries) { Array.new(3) { |i| Country.create!(rating: i + 2, name: "country #{i}") } } | |||
|
|||
describe '#scroll_batches' do | |||
describe 'with search backend returning failures' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, do you think it will be possible to provide integration spec? Where you really call ES and get real answer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a suggestion how I can set up an integration spec that returns this error state? Elasticsearch should return a total count higher than zero but then return zero hits in order to hit this code path.
Hey @tomdev
Could you please share your chewy version, elasticsearch server version and elasticsearch gem version, that could be helpful? The issue could be also solved if you could set the scroll pointer expiration (which defaults to 1 minute) for |
Hey Barthez, thanks for reproducing the issue. Interesting to see the response is actually a We're unable to continue using Elasticsearch as we're using the AWS OpenSearch service. We're running:
We haven't identified any other issues (thus far) using chewy against OpenSearch. An attempt to increase the scroll pointer expiration did not succeed; we've set it to Do you think OpenSearch returning a |
Thanks @tomdev Chewy does not aim to support OpenSearch. This sounds like an altered behavior of OpenSearch vs Elasticsearch. I would suggest adding a custom exception, something like |
I implemented the This is slightly altering the behaviour of how this data is retrieved and even though the test suite succeeds I wanted to double check if anyone knows why the previous approach was chosen (loop until fetched >= total hits). Could I be missing an edge case here that was covered by the previous implementation? The specs don't indicate that. |
We've been running this PR in production for ~2 weeks and have seen the failure (where we'd previously got stuck in an infinite loop) now successfully raising the |
Thank you @tomdev. Sorry, I lost track of this PR. Cold you please rebase and fix the conflict? I will try to merge & release it as soon as possible. |
@tomdev is this PR still relevant? Sorry for the delay, I've been extremely busy. |
No worries, same here! Yes; this is still relevant to us, and we've been running from this PR in production for months. It's working well for us. I think this PR needs some rebasing and fixing conflicts, I can take a look at that. |
@tomdev Sure thank you! Keep in mind, we have moved to ES 8.x, so some extra adjustments might be needed. |
We are running into a bug in production when performing
chewy:sync
.Our search backend is intermittently returning a
200
response without hits and containing a backend failure, see example response:scroll_batches currently is not taking these failures into account. Because there are no hits returned, the logic of
fetched >= total
will never be reached, causing the loop to never break.Because of this we've experienced
chewy:sync
running for days instead of an hour. (Yes, we now have proper monitoring in place...)This PR will raise a
Chewy::Error
when the search backend is returning failures.Before submitting the PR make sure the following are checked:
[Fix #issue-number]
(if the related issue exists).master
(if not - rebase it).