Provide curator with tooling to check whether there are files not backup to S3 glacier #2086

Open
4 tasks
pli888 opened this issue Nov 7, 2024 · 1 comment

Comments

@pli888
Member

pli888 commented Nov 7, 2024

User story

As a curator
I want to know the file differences between Wasabi and S3 Glacier
So that I know when there is a discrepancy between the two and can take appropriate action

Acceptance criteria

Given I have backed up files belonging to a dataset into an AWS S3 Glacier bucket after they have been uploaded to Wasabi
When I continue to work on files
And I run a check tool
Then I am informed when a file has not been backed up yet
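
The check in the criteria above comes down to a set difference between the two bucket listings. A minimal sketch, assuming the object keys have already been listed from each bucket; the function name and the example keys are hypothetical and not part of the existing tooling:

```python
def find_missing_backups(wasabi_keys, glacier_keys):
    """Return the keys present in Wasabi but absent from the Glacier backup, sorted."""
    return sorted(set(wasabi_keys) - set(glacier_keys))


# Hypothetical listings; real ones would come from listing each bucket.
wasabi = ["dataset/102484/readme.txt", "dataset/102484/reads.fastq.gz"]
glacier = ["dataset/102484/readme.txt"]

missing = find_missing_backups(wasabi, glacier)
for key in missing:
    print(f"NOT BACKED UP: {key}")
```

Comparing by key alone only detects files that are absent from the backup. Detecting files that changed after being backed up would also need a size or checksum comparison.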

Additional Info

  • Add a boolean has_backup column to the file table.
  • When the transfer command is run with the --backup option (either manually or as part of an automated backup), it will check for each file whether the transfer was successful and, if so, update the has_backup column for the corresponding file (using the new filesMetaToDb functionality below).
  • Implement filesMetaToDb --doi 102484 --mark-as-backed-up <filepath>
  • Implement filesMetaToDb --check, which will report files that are not backed up
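
The items above can be sketched roughly as follows. This is only an illustration using an in-memory SQLite database: the real file table schema is not shown in this issue, so the doi and location columns and both function names are assumptions, and only has_backup comes from the proposal.

```python
import sqlite3


def mark_as_backed_up(conn, doi, filepath):
    """Set has_backup for one file; returns the number of rows updated."""
    cur = conn.execute(
        "UPDATE file SET has_backup = 1 WHERE doi = ? AND location = ?",
        (doi, filepath),
    )
    conn.commit()
    return cur.rowcount


def check_not_backed_up(conn, doi):
    """List the files of a dataset that are not yet flagged as backed up."""
    rows = conn.execute(
        "SELECT location FROM file WHERE doi = ? AND has_backup = 0 "
        "ORDER BY location",
        (doi,),
    )
    return [row[0] for row in rows]


# Hypothetical schema: only has_backup is taken from the issue.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file ("
    "id INTEGER PRIMARY KEY, "
    "doi TEXT NOT NULL, "
    "location TEXT NOT NULL, "
    "has_backup BOOLEAN NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO file (doi, location) VALUES (?, ?)",
    [("102484", "readme.txt"), ("102484", "reads.fastq.gz")],
)

updated = mark_as_backed_up(conn, "102484", "readme.txt")
pending = check_not_backed_up(conn, "102484")
print(pending)
```

Defaulting has_backup to false means existing rows start as "not backed up" until a successful transfer marks them, so the check errs on the side of reporting a file.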
@pli888 pli888 changed the title Enable curator to list contents of directories in S3 Glacier backup bucket Enable curator to list contents of directories in AWS S3 Glacier backup bucket Nov 7, 2024
@rija rija moved this to To Estimate in Backlog: GigaDB Database Nov 18, 2024
@rija rija changed the title Enable curator to list contents of directories in AWS S3 Glacier backup bucket Provide curator with tooling to check whether there are files not backup to S3 glacier Nov 19, 2024
@only1chunts
Member

The DOI.md5 and DOI.filesizes files are not added to the database so what will happen with those?
Do we need to start adding them to the database?
What happens to other random files that somehow make it onto the Wasabi server but are not listed in GigaDB? Do they just get ignored? These should be flagged for investigation as to why they are there!
