Commit
The bulk republisher was using `find_each` in a couple of places (based on the corresponding Rake tasks), which can reduce memory consumption versus methods like `all` (per the [find_each docs][find_each-docs]). However, since we only need the document IDs (in order to pass them to the republishing worker), using `pluck` to get an array of IDs in the initial query appears to be more efficient than iterating over an Active Record collection, even with `find_each`.

I wrote a simple benchmark to work out which iteration method was quicker. The benchmark simply gets the document ID using each method, since the logic concerning what we do with the ID will be unchanged. I ran this test in the integration environment, which had 435358 documents at the time of the test.

```rb
require "benchmark"

Benchmark.bmbm do |x|
  x.report("all") { Document.all.each { |id| id } }
  x.report("find_each") { Document.find_each { |id| id } }
  x.report("pluck") { Document.pluck(:id).each { |id| id } }
end
```

The results:

```
Rehearsal ---------------------------------------------
all         3.885446   0.275336   4.160782 (  4.459146)
find_each   2.994770   0.169453   3.164223 (  3.881194)
pluck       0.264720   0.000000   0.264720 (  0.353677)
------------------------------------ total: 7.589725sec

                user     system      total        real
all         2.837523   0.183815   3.021338 (  3.463605)
find_each   2.766133   0.129885   2.896018 (  3.585942)
pluck       0.186966   0.020088   0.207054 (  0.288226)
```

This will speed up the queueing of bulk republishing tasks via the UI, in turn reducing the time between clicking "Confirm republishing" and seeing the confirmation on the next page.

Since we're now always iterating over an array of document IDs, the bulk republishing methods can be DRYed up using a private method.

[find_each-docs]: https://apidock.com/rails/ActiveRecord/Batches/find_each
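As a rough illustration of the DRYing-up described above, the sketch below shows how several bulk republishing methods could share one private method once they all work from an array of IDs. It is not the code from this commit: `BulkRepublisherSketch`, `FakeRelation`, `FakeDocument`, `FakeRepublishingWorker` and all method names are hypothetical stand-ins so the example runs without Rails. In the real application the IDs would come from an Active Record query such as `Document.pluck(:id)`.

```rb
# Hypothetical stand-in for a document record (not the app's Document model).
FakeDocument = Struct.new(:id, :document_type)

# Hypothetical stand-in for an Active Record relation, supporting only the
# two calls this sketch needs: `where` with a hash, and `pluck`.
class FakeRelation
  def initialize(records)
    @records = records
  end

  def where(conditions)
    matching = @records.select do |record|
      conditions.all? { |attribute, value| record[attribute] == value }
    end
    FakeRelation.new(matching)
  end

  def pluck(attribute)
    @records.map { |record| record[attribute] }
  end
end

# Hypothetical stand-in for the republishing worker: it records the IDs it
# is given instead of pushing jobs onto a real queue.
class FakeRepublishingWorker
  def self.enqueued_ids
    @enqueued_ids ||= []
  end

  def self.perform_async(document_id)
    enqueued_ids << document_id
  end
end

# Sketch of a bulk republisher whose public methods only differ in how they
# select IDs; the queueing logic lives in one private method.
class BulkRepublisherSketch
  def initialize(documents:, worker:)
    @documents = documents
    @worker = worker
  end

  def republish_all_documents
    republish_by_document_ids(@documents.pluck(:id))
  end

  def republish_all_documents_by_type(document_type)
    republish_by_document_ids(@documents.where(document_type: document_type).pluck(:id))
  end

  private

  # Shared by every public method: queue one job per document ID.
  def republish_by_document_ids(document_ids)
    document_ids.each { |document_id| @worker.perform_async(document_id) }
  end
end

documents = FakeRelation.new([
  FakeDocument.new(1, "news"),
  FakeDocument.new(2, "guide"),
  FakeDocument.new(3, "news"),
])
republisher = BulkRepublisherSketch.new(documents: documents, worker: FakeRepublishingWorker)
republisher.republish_all_documents_by_type("news")
```

Because each public method reduces to "select IDs, then hand them to the shared method", adding another kind of bulk republish in this sketch would only need a new query.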