Speed and DRY up bulk republisher

The bulk republisher was using `find_each` in a couple of places (based on the corresponding Rake tasks), which can reduce memory consumption versus methods like `all` (per the [find_each docs][find_each-docs]). However, since we only need to get the document IDs (in order to pass them to the republishing worker), using `pluck` to get an array of IDs in the initial query appears to be more efficient than iterating on an Active Record collection even with `find_each` I wrote a simple benchmark to work out which iteration method was quicker. The benchmark simply gets the document ID using each method, since the logic concerning what we do with the ID will be unchanged. I ran this test in the integration environment, which had 435358 documents at the time of the test ```rb Benchmark.bmbm do |x| x.report("all") { Document.all.each { |id| id } } x.report("find_each") { Document.find_each { |id| id } } x.report("pluck") { Document.pluck(:id).each { |id| id } } end ``` The results: ``` Rehearsal --------------------------------------------- all 3.885446 0.275336 4.160782 ( 4.459146) find_each 2.994770 0.169453 3.164223 ( 3.881194) pluck 0.264720 0.000000 0.264720 ( 0.353677) ------------------------------------ total: 7.589725sec user system total real all 2.837523 0.183815 3.021338 ( 3.463605) find_each 2.766133 0.129885 2.896018 ( 3.585942) pluck 0.186966 0.020088 0.207054 ( 0.288226) ``` This will speed up the queueing of bulk republishing tasks via the UI, in turn reducing the time between clicking "Confirm republishing" and seeing the confirmation on the next page Since we're now always iterating over an array of document IDs, the bulk republishing methods can be DRYed up using a private method [find_each-docs]: https://apidock.com/rails/ActiveRecord/Batches/find_each
alphagov · May 31, 2024 · 654c268 · 654c268
1 parent 6432c5d
commit 654c268
Showing 1 changed file with 11 additions and 16 deletions.
diff --git a/app/services/bulk_republisher.rb b/app/services/bulk_republisher.rb
@@ -1,28 +1,17 @@
 class BulkRepublisher
   def republish_all_published_organisation_about_us_pages
     document_ids = Organisation.all.map(&:about_us).compact.pluck(:document_id)
-
-    document_ids.each do |document_id|
-      PublishingApiDocumentRepublishingWorker.perform_async_in_queue(
-        "bulk_republishing",
-        document_id,
-        true,
-      )
-    end
+    republish_by_document_ids(document_ids)
   end
 
   def republish_all_documents
-    Document.find_each do |document|
-      PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", document.id, true)
-    end
+    document_ids = Document.pluck(:id)
+    republish_by_document_ids(document_ids)
   end
 
   def republish_all_documents_with_pre_publication_editions
-    editions = Edition.in_pre_publication_state.includes(:document)
-
-    editions.find_each do |edition|
-      PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", edition.document.id, true)
-    end
+    document_ids = Edition.in_pre_publication_state.pluck(:document_id)
+    republish_by_document_ids(document_ids)
   end
 
   def republish_all_documents_with_pre_publication_editions_with_html_attachments
@@ -31,6 +20,12 @@ def republish_all_documents_with_pre_publication_editions_with_html_attachments
       .where(id: HtmlAttachment.where(attachable_type: "Edition").select(:attachable_id))
       .pluck(:document_id)
 
+    republish_by_document_ids(document_ids)
+  end
+
+private
+
+  def republish_by_document_ids(document_ids)
     document_ids.each do |document_id|
       PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", document_id, true)
     end