Skip to content

Commit

Permalink
Speed and DRY up bulk republisher
Browse files Browse the repository at this point in the history
The bulk republisher was using `find_each` in a couple of places (based
on the corresponding Rake tasks), which can reduce memory consumption
versus methods like `all` (per the [find_each docs][find_each-docs]).
However, since we only need to get the document IDs (in order to pass
them to the republishing worker), using `pluck` to get an array of IDs
in the initial query appears to be more efficient than iterating on an
Active Record collection even with `find_each`

I wrote a simple benchmark to work out which iteration method was
quicker. The benchmark simply gets the document ID using each method,
since the logic concerning what we do with the ID will be unchanged. I
ran this test in the integration environment, which had 435358 documents
at the time of the test

```rb
Benchmark.bmbm do |x|
  x.report("all") { Document.all.each { |id| id } }
  x.report("find_each") { Document.find_each { |id| id } }
  x.report("pluck") { Document.pluck(:id).each { |id| id } }
end
```

The results:

```
Rehearsal ---------------------------------------------
all         3.885446   0.275336   4.160782 (  4.459146)
find_each   2.994770   0.169453   3.164223 (  3.881194)
pluck       0.264720   0.000000   0.264720 (  0.353677)
------------------------------------ total: 7.589725sec

                user     system      total        real
all         2.837523   0.183815   3.021338 (  3.463605)
find_each   2.766133   0.129885   2.896018 (  3.585942)
pluck       0.186966   0.020088   0.207054 (  0.288226)
```

This will speed up the queueing of bulk republishing tasks via the UI,
in turn reducing the time between clicking "Confirm republishing" and
seeing the confirmation on the next page

Since we're now always iterating over an array of document IDs, the bulk
republishing methods can be DRYed up using a private method

[find_each-docs]: https://apidock.com/rails/ActiveRecord/Batches/find_each
  • Loading branch information
yndajas committed May 31, 2024
1 parent 6432c5d commit 654c268
Showing 1 changed file with 11 additions and 16 deletions.
27 changes: 11 additions & 16 deletions app/services/bulk_republisher.rb
Original file line number Diff line number Diff line change
@@ -1,28 +1,17 @@
class BulkRepublisher
def republish_all_published_organisation_about_us_pages
document_ids = Organisation.all.map(&:about_us).compact.pluck(:document_id)

document_ids.each do |document_id|
PublishingApiDocumentRepublishingWorker.perform_async_in_queue(
"bulk_republishing",
document_id,
true,
)
end
republish_by_document_ids(document_ids)
end

def republish_all_documents
Document.find_each do |document|
PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", document.id, true)
end
document_ids = Document.pluck(:id)
republish_by_document_ids(document_ids)
end

def republish_all_documents_with_pre_publication_editions
editions = Edition.in_pre_publication_state.includes(:document)

editions.find_each do |edition|
PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", edition.document.id, true)
end
document_ids = Edition.in_pre_publication_state.pluck(:document_id)
republish_by_document_ids(document_ids)
end

def republish_all_documents_with_pre_publication_editions_with_html_attachments
Expand All @@ -31,6 +20,12 @@ def republish_all_documents_with_pre_publication_editions_with_html_attachments
.where(id: HtmlAttachment.where(attachable_type: "Edition").select(:attachable_id))
.pluck(:document_id)

republish_by_document_ids(document_ids)
end

private

def republish_by_document_ids(document_ids)
document_ids.each do |document_id|
PublishingApiDocumentRepublishingWorker.perform_async_in_queue("bulk_republishing", document_id, true)
end
Expand Down

0 comments on commit 654c268

Please sign in to comment.