Add rake task to report on a regex in editionable content
Sometimes we want to identify which published content matches a regular
expression (e.g. if we make a change to govspeak and need to republish
affected content).

This adds a rake task that reports on content whose currently
published edition matches a given regex.

This is being broken down into batches of 1000, as our infrastructure
does not support large queries being made on a Rails console.
brucebolt committed Oct 23, 2023
1 parent 0bdc607 commit 3d370a9
Showing 2 changed files with 39 additions and 0 deletions.
13 changes: 13 additions & 0 deletions lib/tasks/reporting.rake
@@ -12,4 +12,17 @@ namespace :reporting do
  task published_attachments_report: :environment do
    Reports::PublishedAttachmentsReport.new.report
  end

  desc "Prints the content IDs of documents whose live edition matches a given regular expression"
  task :matching_docs, [:regex] => :environment do |_, args|
    regex = Regexp.new(/#{args[:regex]}/)

    # Load documents in batches of 1000 to keep each query small.
    Document.where.not(live_edition_id: nil).find_in_batches(batch_size: 1000) do |batch|
      batch.each do |document|
        next unless document.editions.published.any?

        puts document.content_id if regex.match?(document.editions.published.last.body)
      end
    end
  end
end
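
The batching and matching logic of the task can be sketched outside Rails as follows. This is only an illustration under stated assumptions: `Doc`, `matching_content_ids` and the sample IDs are hypothetical stand-ins, and `each_slice` plays the role that `find_in_batches` plays against the database.

```ruby
# Hypothetical stand-in for a document and the body of its published edition.
Doc = Struct.new(:content_id, :published_body)

# Returns the content IDs of documents whose published body matches the
# pattern, walking the collection in fixed-size batches.
def matching_content_ids(docs, pattern, batch_size: 1000)
  regex = Regexp.new(pattern)
  ids = []
  docs.each_slice(batch_size) do |batch|
    batch.each do |doc|
      next if doc.published_body.nil? # no published edition, nothing to check

      ids << doc.content_id if regex.match?(doc.published_body)
    end
  end
  ids
end

docs = [
  Doc.new("id-1", "Some text 1"),
  Doc.new("id-2", nil),
  Doc.new("id-3", "Some other text 1"),
]

puts matching_content_ids(docs, "Some text", batch_size: 2)
```

Only one batch is held in memory at a time in the real task, which is what keeps each query small. The in-memory array here does not reproduce that benefit, it only mirrors the control flow.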
26 changes: 26 additions & 0 deletions test/unit/lib/tasks/reporting_test.rb
@@ -0,0 +1,26 @@
require "test_helper"
require "rake"

class ReportingRake < ActiveSupport::TestCase
  setup do
    @document_1 = create(:published_edition, body: "Some text 1")
    @document_2 = create(:draft_edition, body: "Some text 2")
    @document_3 = create(:published_edition, body: "Some other text 1")
  end

  teardown do
    Rake::Task["reporting:matching_docs"].reenable
  end

  test "it prints the content IDs of the matching documents from published editions" do
    assert_output(/#{@document_1.document.content_id}/) { Rake.application.invoke_task "reporting:matching_docs[Some text]" }
  end

  # \A, \z and the /m flag make the negative lookahead cover the whole output rather than a single line.
  test "it does not print the content IDs of the matching documents from draft editions" do
    assert_output(/\A(?!.*#{@document_2.document.content_id}).*\z/m) { Rake.application.invoke_task "reporting:matching_docs[Some text]" }
  end

  test "it does not print the content IDs of the non-matching documents from published editions" do
    assert_output(/\A(?!.*#{@document_3.document.content_id}).*\z/m) { Rake.application.invoke_task "reporting:matching_docs[Some text]" }
  end
end
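
Negative-lookahead patterns passed to `assert_output` behave differently depending on how they are anchored once the output spans several lines. The sketch below compares the two forms; the IDs are made up for illustration and are not taken from the test suite.

```ruby
# Made-up output: two content IDs, one per line.
output = "id-1\nid-2\n"

# ^ and $ anchor to individual lines, and . does not cross newlines.
line_anchored = /^(?!.*id-2).*$/

# \A and \z anchor to the whole string; /m lets . match newlines.
string_anchored = /\A(?!.*id-2).*\z/m

# The first line alone satisfies the line-anchored pattern,
# so it matches even though "id-2" appears later in the output.
puts line_anchored.match?(output)

# The string-anchored lookahead scans the entire output and finds "id-2".
puts string_anchored.match?(output)

# With "id-2" absent, the string-anchored pattern matches.
puts string_anchored.match?("id-1\n")
```

An assertion that an ID is absent is therefore stricter in the string-anchored form, because a line-anchored pattern can pass as long as any single line lacks the ID.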
