Problem: checksum verification happens twice #918
Comments
Solution 1 looks OK to me. The only problem is that there would be no fixity check at all for uncompressed AIPs, which don't have pointer files. So the solution is not optimal for uncompressed AIPs, but in my opinion a PREMIS fixity check event is not essential metadata for long-term preservation.
If a PREMIS event describing checksum verification at the bag level is planned for an upcoming release, is there a ticket for that milestone that could be referenced in #918? It seems like an important part of understanding the context for removing the file-level events.
The need for pointer files for uncompressed AIPs is described at artefactual/archivematica-storage-service#324.
For QA, I looked at three things:
They all looked good, I think. I checked on all the platforms we support - Ubuntu xenial, bionic, CentOS, and rpms.
@jrwdunham is there anything else that needs to be tested?
@ross-spencer redirecting my comment above to you! Feel free to pass on as appropriate. I think I'm good, just want to confirm.
@sallain this looks good to me. The only thing I'd ask on top of these questions is what a failure or negative result looks like, so I modified a SIP while it was still in the backlog (via the command line) and ended up with what looks like the right result: You might want to recreate this for your own satisfaction, otherwise, it looks
Yay thanks @ross-spencer! That error looks right to me too!
Added to release notes; I don't think this needs to be documented anywhere.
Goal: improve the performance of Archivematica in processing large transfers (many files and/or large files).
Context: AM's global checksum_type setting can be set via the GUI at administration/general/ where the options are MD5, SHA-1, SHA-256, and SHA-512. On a standard AM install, the default value appears to be SHA-256.
Overview: relevant micro-services in order of occurrence
Details
"Assign checksums and file sizes to objects" (Transfer)
updateSizeAndChecksum_v0.0 or archivematicaUpdateSizeAndChecksum.py.
Files table in the database; and checksum_type-named row in the DashboardSettings table (which defaults to 'sha256').
"Verify checksums generated on ingest" (Ingest)
verifyPREMISChecksums_v0.0 or verifyPREMISChecksums.py.
'fixity check' type event in the database to document that the checksum of the file made early on in transfer has not changed by the end of ingest.
"Prepare AIP" (Ingest)
bagit_v0.0 or archivematicaBagWithEmptyDirectories.py.
bag create which "creates a bag from supplied files/directories, completes the bag, and then writes in a specified format."
bag is taken from the checksum_type-named row in the DashboardSettings table.
'sha512' is the default algorithm if there is no checksum_type-named row in DashboardSettings table.
"Verify AIP" (Ingest)
verifyAIP_v0.0 or verifyAIP.py.
bag (BagIt) CLI and calls bag verifypayloadmanifests, which re-calculates checksums for all files in the AIP/bag and verifies that they match what is documented in manifest-<ALGORITHM>.txt, e.g., manifest-sha256.txt.
Proposed Solution
bag-generated manifest-<ALGORITHM>.txt.
.Problem with Proposed Solution
The "Verify checksums generated on ingest" micro-service creates events in the db that must end up in the AIP METS. However, the "Verify AIP" micro-service necessarily occurs after the "Generate METS.xml document" micro-service.
Solution 1 (preferred) to the Problem with Proposed Solution
"fixity check" PREMIS:EVENT that it can pass to the Storage Service, which will document this verification (fixity check of the AIP as a whole) in the pointer file.
Problems with this:
Solution 2 to the Problem with Proposed Solution
Instead of removing the "Verify checksums generated on ingest" micro-service, convert it to a once-per-unit type micro-service which simply optimistically creates the 'fixity check' PREMIS events in the database and specifies bag as the tool used. Then, if "Verify AIP" ultimately fails, the AIP as a whole has failed so the inaccurate METS is not an issue.
Problems with this:
'fixity check' PREMIS:EVENTs in the METS file as there are currently. It would be good to reduce this number, if possible, so that the METS file is not so huge for large transfers and, correspondingly, so that the time needed to write the METS file is decreased.
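For readers unfamiliar with what the payload-manifest check in "Verify AIP" amounts to, the following is a minimal Python sketch of the idea: re-hash every file listed in manifest-<ALGORITHM>.txt and compare against the recorded digest. It is not the bag CLI's implementation and not Archivematica code; the function names file_digest and verify_payload_manifest are hypothetical, and the sketch assumes the usual BagIt manifest layout of one "<checksum> <relative path>" entry per line. It ignores tag manifests and does not check for unlisted files.

```python
import hashlib
from pathlib import Path

# Assumption: these names mirror the checksum_type options mentioned above
# (MD5, SHA-1, SHA-256, SHA-512) as hashlib algorithm names.
SUPPORTED_ALGORITHMS = {"md5", "sha1", "sha256", "sha512"}


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_payload_manifest(bag_dir, algorithm="sha256"):
    """Re-hash each file listed in manifest-<algorithm>.txt.

    Returns a list of (relative_path, reason) tuples; an empty list
    means every listed file exists and matches its recorded checksum.
    """
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append((rel_path, "missing"))
        elif file_digest(target, algorithm) != expected.lower():
            failures.append((rel_path, "checksum mismatch"))
    return failures
```

Because every payload file is read in full here, running a per-file check like this in addition to "Verify checksums generated on ingest" means each file is hashed twice during ingest, which is the duplication this issue is about.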