-
Notifications
You must be signed in to change notification settings - Fork 757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support range as an option for async reindex #1549
Comments
Or maybe use |
@ankane any thoughts? thanks. |
Hey @kapso, I was thinking it'd be good to handle gaps in primary keys at the expense of longer enqueuing times, but may be good to support the previous pattern as well. How long did it previously take to enqueue, and how does the total reindexing time compare before and after? |
@ankane We have The previous pattern used to take ~20m. We use Heroku worker to enqueue these ids, and sometimes Heroku workers get re-cycled and so does the enqueue process. |
Since so many datasets have few if any gaps in their primary keys I think the option should exist to enqueue faster rather than more accurately. Could this be a simple config option in an initializer config.queue_method = :fast # or :precise |
@ankane another interesting thing we have seen a few times is that the Enqueuing job (Heroku worker) crashes after a couple of hours with no error reported in our error system (Sentry). And since it crashed (and re-started) it obviously re-started the enqueuing process - creating a fresh index and writing to this new index. The Enqueuing (Heroku) worker had 2.5GB memory, so memory definitely wasn't an issue. This is the enqueuing job/process - |
@ankane curious if there are any plans to also support the range option? |
if relation.respond_to?(:find_in_batches)
relation.find_in_batches(batch_size: batch_size) do |items|
batch_job(class_name, batch_id, items.map(&:id)) I believe, and correct me if I'm wrong, but irb(main):010:0> Product.find_in_batches do |batch|
irb(main):011:1* puts batch.is_a?(ActiveRecord::Relation)
irb(main):012:1> end
false irb(main):007:0> Product.in_batches.each do |batch|
irb(main):008:1* puts batch.is_a?(ActiveRecord::Relation)
irb(main):009:1> end
true |
Ran into the same problem where 16 million records took nearly 2 hours just to get enqueued. I can work something out if you're willing to support ranges again @ankane. |
I am curious why this change was made? -
Changed async reindex to fetch ids instead of using ranges for numeric primary keys with Active Record
...and if it's possible (or perhaps makes sense) to support range as an option as well for async reindex
Why? - enqueuing ids is really slow, especially when writing 100s of ids/job to Sidekiq/Redis. We use a
batch_size
of 1000 and creating jobs is really slow when reindexing 40m documents.UPDATE: it took ~16 hours to enqueue ids for 40m documents with a batch size of 1000.
The text was updated successfully, but these errors were encountered: