-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bulk fix assets rake task #1551
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
lauraghiorghisor-tw
force-pushed
the
add-rake-task-to-fix-assets
branch
2 times, most recently
from
November 29, 2024 09:09
65dbd8e
to
894eb5a
Compare
lauraghiorghisor-tw
force-pushed
the
add-rake-task-to-fix-assets
branch
12 times, most recently
from
December 4, 2024 16:49
58c4006
to
f47657e
Compare
On the back of an investigation into asset deletion states, we've produced a set of assets that are not deleted in wh, but should not be live. In order to preserve some data integrity, we're choosing to fix the data, rather than do a mass deletion. We've identified some flows where draft replacements should be deleted. Nonetheless, the original asset does not redirect to a replacement if the replacement is in draft, so we're also switching the draft state to false. The order is important, as the replacements themselves would become public if we don't action the deletion first. We're checking whether each asset that we feed through the rake task is itself or has a replacement in this state. For assets that are not draft replacements nor have draft replacements, we want to allow them to continue redirecting, being replaced etc. A final catch-all step ensures remaining assets (for which the steps above don't fire) are deleted. We've also identified broken flows that cause missing replacement links, but those will be fixed by running some updater logic from a wh rake task.
lauraghiorghisor-tw
force-pushed
the
add-rake-task-to-fix-assets
branch
3 times, most recently
from
December 5, 2024 13:41
2522db1
to
0b5759d
Compare
- Add a clause to remove parent document url when fixing draft states when deleting superseded assets Some assets are incorrectly staying in the draft state. But when we try and fix the draft state, it cannot be linking to a draft parent_document_url. We are removing the parent document url link since it is difficult to figure out what the parent_document_url should be, and is largely irrelevant for a superseded asset. - Fix correct state for assets that have gone in a weird invalid state over the years. "deleted" should no longer be a valid state. patching any data that we come across that has a wrong state. - Log when we've patched an invalid state or parent document URL
We want to know if we are skipping assets because they have already been fixed, so that we store a correct account of what's been changed.
lauraghiorghisor-tw
force-pushed
the
add-rake-task-to-fix-assets
branch
from
December 5, 2024 13:42
0b5759d
to
7171e60
Compare
minhngocd
force-pushed
the
add-rake-task-to-fix-assets
branch
from
December 5, 2024 15:47
7171e60
to
0fbadf9
Compare
We want to catch any exceptions not surfaced by the validations on the model. Also grouped the tests to be more readible.
lauraghiorghisor-tw
force-pushed
the
add-rake-task-to-fix-assets
branch
from
December 5, 2024 16:35
0fbadf9
to
392c045
Compare
…edirected These are relatively harmless to the public, as the links will redirect to SignOn and require a login to preview. However it would still be available to other users with a SignOn account. For assets that are superseded, it's probably best we also remove them so users are certain that as long as assets are not linked to their live or preview editions, it should not be accessible anywhere.
minhngocd
force-pushed
the
add-rake-task-to-fix-assets
branch
from
December 6, 2024 15:10
560213a
to
61002a2
Compare
minhngocd
approved these changes
Dec 6, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paired on this
…at are further replaced. The set of data plucked from integration is of only assets associated with superseded editions to the end of the line, therefore should not be a problem to remove.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Context
Multiple Zendesk tickets related to
whitehall
assets being live when they should not be, triggered an analysis of the problematic data - PR here. We've already deleted a batch of assets thatwhitehall
expected to be deleted but they were not, using this rake task. We have now investigated asset deletion states further, and we've produced a set of assets that, despite not being deleted in Whitehall, should not be live. In order to preserve some data integrity, we're choosing to fix the data with this rake task, rather than do a mass deletion. The assets in question are serving a chain of only superseded and deleted editions (nothing live) - see extraction rake task.We now know that there are a few broken flows in
whitehall
that are continuously producing incorrect asset states, specifically unintentionally live assets. This PR defines a rake task that should be able to tackle each asset and, if it matches some of the patterns that we know how to fix, should remedy the data, otherwise just delete it.Breakdown of the rake task
One of the broken flows we've uncovered is: the user replaces an
Attachment
inwhitehall
, then they delete the replacement. The intended effect should be that the original assets of the attachment are not live anymore, nor are the replacements, since theAttachment
is now deleted. The bug is that the replacementAsset
inasset-manager
remains in draft as deleted (draft: true
anddeleted_at
not nil), when it should be going live as deleted. This causes the original asset to not redirect to the draft replacement. This is intentional, for example when we do not want a replacement redirect to be effective just yet, for example when working on a new draft.We want to fix this by checking if the asset has a draft replacement or is itself a draft replacement. We delete the draft replacement/draft asset if it is not already deleted. Nonetheless, for the redirect to work, we also need to switch the draft state to false. The order is important, as the replacements themselves would become public if we don't action the deletion first.
For assets that are not draft replacements nor have draft replacements, we want to allow them to continue redirecting, being replaced etc. Draft assets that are not replaced or redirected, should also be deleted, as they should not be available in the draft stack if they have been superseded. The current correct flows in
whitehall
suggest that the only possible states for an asset that has been superseded is either deleted or replaced, if it has not been otherwise redirected.A final catch-all step ensures remaining assets (for which the steps above don't apply) are deleted.
Data extraction
In Whitehall Publisher, we've identified what data we need to extract by extracting all the
AttachmentData
s where the deleted? method is false and relevant database fields to help guide our understanding of the data. The core theory was that assets corresponding to superseded or deleted editions, where theAttachmentData
'sdeleted?
method is false, should be further replaced or otherwise deleted, unless they have been redirected.This is the second part of the data extraction - see first part here. Splitting the data extraction based on the deleted? method evaluation was a way to reason about deletion state in
whitehall
, since there is no obvious way in which that state is represented in the database. Whilst a soft delete is applied to theAttachment
model, it is theAttachmentData
model that actually manages theAssets.
It deduces the deletion state based on theAttachment
state and theEdition
visibility. This logic is captured in thisdeleted?
method, which is evaluated whenever we try to updateasset-manager
. It must be true for a delete value to be sent, and false for the other updates to be sent through -replacement
,draft
,redirect_url
.NB: Whilst we have extracted data throughly, tackled the known bugs, and otherwise deleted all other assets we've reported on, there is a chance we may have missed some scenarios. This data is restricted to superseded and deleted editions and should have exhaustively fixed or deleted assets that pertain only to superseded and deleted editions. Anything outside of that has not been tackled here. Additionally, the current data extracted to be fixed is based on an older integration dump (end of October 2024), meaning there may be wrong
Assets
that got created between then and the time an upcoming code fix goes live. Nonetheless, this data patch should eliminate a great deal of risk from past data.Related fixes
We've also identified a broken flow that causes missing replacement links. If the user makes a new edition draft, and proceeds to make an attachment replacement without having saved the draft first, the live assets do not get set with a replacement in
asset-manager
. These assets will be fixed by running some updater logic from a whitehall rake task. That will be run prior to this rake task, so that we have healthy replacement chains that we can then further fix.Upcoming work will also tackle unpublished editions, ensuring redirects are set accordingly.
Next steps
We'll attempt to fix the logic that causes the ongoing issues in an upcoming card. The way in which we're proposing to fix the data in this rake task will translate to how we'll remedy the logic in
whitehall
. Fixing this data removes a big part of the risk, as does fixing the code. Nonetheless, there are batches of assets we've looked at, whose states we cannot account for. We've started ideating around some remodelling work of the attachments/assets workflow, to make it simpler, easier to understand and maintain, and bug-free.Links
Trello card
Assets investigation document
Data extraction rake task
Assets we deleted
Co-authored-by @minhngocd