Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rake task to update missing replacement links #9701

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

lauraghiorghisor-tw
Copy link
Contributor

@lauraghiorghisor-tw lauraghiorghisor-tw commented Dec 4, 2024

Context

Multiple Zendesk tickets related to whitehall assets being live when they should not be, triggered an analysis of the problematic data - PR. We've investigated the assets further, extracting a batch of assets that should not be live, because they should be replaced.

A subset of these assets correspond to a now identified bug where if the user replaces an attachment on a new draft, without having saved the edition draft first, the supposedly replaced original asset remains live, because it fails to redirect to its replacement. The data extraction rake task looped through all AttachmentData's where there deleted? method is false, that had a replacement set in whitehall, but no corresponding replacement value set on their assets in asset-manager. The resulting subset of assets that have this issue can be found here.

This PR provides a rake task to address the assets that are likely a result of this bug, by populating their replacement asset in asset-manager. It takes whitehall as a source of truth, relying on the AttachmentData and Asset objects to deduce the value of the replacement_id field.

This is only a data patch solution. A fix for the issue will be implemented separately.

The rake task

This task expects a csv file containing, on each row, the AttachmentData ID, the Asset ID (from asset-manager), and the asset's variant (i.e., "original" or "thumbnail").

We originally tried to run the the AssetManager::AttachmentUpdater.replace(replaced_attachment_data) line on a selection of AttachmentData IDs, but that only linked up the corresponding assets to their immediate successors, which were likely in draft. There is some logic in asset-manager that ensures all assets in the chain point to the last one, after a publish event. The data we're working with is now superseded and it's difficult to put it back to normal using the common flows of the code, so we replicated that logic for the rake task:

  • If the AttachmentData does not have a replacement, we skip. The reported asset IDs we're running this on were extracted based on whether a replaced_by_id field was set. Nonetheless, some of those AttachmentDatas are deleted.
  • We identify the last AttachmentData in the chain.
  • We figure out the corresponding asset variant we need to run an update call on (based on a variant value passed through in the csv).
  • We update asset-manager, logging any errors.

We expect the end of the line replacement to either still be live, or a broken asset (for example a draft + deleted one, which corresponds to another bug we know of) that will be taken care of in the next rake task. We will be running this rake task first, to ensure we have a healthy replacement chain, then catch all other issue with the next rake task. There are a few errors in the trial run logs, including unprocessable content and parent document url validation, but we expect any problematic data to be caught by the next rake task; the few not found errors are assets that have already been deleted (perhaps manual intervention), which means they are not a problem anymore.

Links

Trello card
Assets we fixed
Data extraction/reporting rake task
Assets investigation document

Co-authored-by @minhngocd

@lauraghiorghisor-tw lauraghiorghisor-tw force-pushed the update-assets-with-missing-replacements branch 4 times, most recently from 0158e5c to 0e38a08 Compare December 5, 2024 16:38
@lauraghiorghisor-tw lauraghiorghisor-tw force-pushed the update-assets-with-missing-replacements branch from 0e38a08 to 889594d Compare December 6, 2024 09:44
Copy link
Contributor

@minhngocd minhngocd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few non-blocking comments

@lauraghiorghisor-tw lauraghiorghisor-tw force-pushed the update-assets-with-missing-replacements branch from 889594d to e2dfc17 Compare December 7, 2024 20:44
We've identified a bug where if the user replaces an attachments on a new draft,
without having saved the edition draft first, the supposedly replaced original
asset remains live, because it fails to redirect to its replacement.

This rake task figures out the last AttachmentData in the chain, and reruns
the updater on the original AttachmentData with the replacement_id set to the
last AttachmentData's asset_manager_id. It fetches the ID based on variant
value, sending the correct `replacement_id` field to `asset-manager`.

This is only a data patch solution.
A fix for the issue will be implemented separately.
@lauraghiorghisor-tw lauraghiorghisor-tw force-pushed the update-assets-with-missing-replacements branch from 826384f to b569af2 Compare December 9, 2024 13:14
@lauraghiorghisor-tw lauraghiorghisor-tw merged commit d66b1b3 into main Dec 10, 2024
19 checks passed
@lauraghiorghisor-tw lauraghiorghisor-tw deleted the update-assets-with-missing-replacements branch December 10, 2024 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants