From 654c2682c46c163b78c4e5c8200e82877990eb81 Mon Sep 17 00:00:00 2001 From: Ynda Jas Date: Fri, 31 May 2024 14:02:03 +0100 Subject: [PATCH] Speed and DRY up bulk republisher The bulk republisher was using `find_each` in a couple of places (based on the corresponding Rake tasks), which can reduce memory consumption versus methods like `all` (per the [find_each docs][find_each-docs]). However, since we only need to get the document IDs (in order to pass them to the republishing worker), using `pluck` to get an array of IDs in the initial query appears to be more efficient than iterating on an Active Record collection even with `find_each` I wrote a simple benchmark to work out which iteration method was quicker. The benchmark simply gets the document ID using each method, since the logic concerning what we do with the ID will be unchanged. I ran this test in the integration environment, which had 435358 documents at the time of the test ```rb Benchmark.bmbm do |x| x.report("all") { Document.all.each { |id| id } } x.report("find_each") { Document.find_each { |id| id } } x.report("pluck") { Document.pluck(:id).each { |id| id } } end ``` The results: ``` Rehearsal --------------------------------------------- all 3.885446 0.275336 4.160782 ( 4.459146) find_each 2.994770 0.169453 3.164223 ( 3.881194) pluck 0.264720 0.000000 0.264720 ( 0.353677) ------------------------------------ total: 7.589725sec user system total real all 2.837523 0.183815 3.021338 ( 3.463605) find_each 2.766133 0.129885 2.896018 ( 3.585942) pluck 0.186966 0.020088 0.207054 ( 0.288226) ``` This will speed up the queueing of bulk republishing tasks via the UI, in turn reducing the time between clicking "Confirm republishing" and seeing the confirmation on the next page Since we're now always iterating over an array of document IDs, the bulk republishing methods can be DRYed up using a private method [find_each-docs]: https://apidock.com/rails/ActiveRecord/Batches/find_each --- app/services/bulk_republisher.rb | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/app/services/bulk_republisher.rb b/app/services/bulk_republisher.rb index c8204db4b03..d4132f54bbd 100644 --- a/app/services/bulk_republisher.rb +++ b/app/services/bulk_republisher.rb @@ -1,28 +1,17 @@ class BulkRepublisher def republish_all_published_organisation_about_us_pages document_ids = Organisation.all.map(&:about_us).compact.pluck(:document_id) - - document_ids.each do |document_id| - PublishingApiDocumentRepublishingWorker.perform_async_in_queue( - "bulk_republishing", - document_id, - true, - ) - end + republish_by_document_ids(document_ids) end def republish_all_documents - Document.find_each do |document| - PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", document.id, true) - end + document_ids = Document.pluck(:id) + republish_by_document_ids(document_ids) end def republish_all_documents_with_pre_publication_editions - editions = Edition.in_pre_publication_state.includes(:document) - - editions.find_each do |edition| - PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", edition.document.id, true) - end + document_ids = Edition.in_pre_publication_state.pluck(:document_id) + republish_by_document_ids(document_ids) end def republish_all_documents_with_pre_publication_editions_with_html_attachments @@ -31,6 +20,12 @@ def republish_all_documents_with_pre_publication_editions_with_html_attachments .where(id: HtmlAttachment.where(attachable_type: "Edition").select(:attachable_id)) .pluck(:document_id) + republish_by_document_ids(document_ids) + end + +private + + def republish_by_document_ids(document_ids) document_ids.each do |document_id| PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", document_id, true) end