Download files in fetch.txt #118
Comments
BTW, an ad-hoc solution without any checks etc. is this bash one-liner: while read url size fpath; do mkdir -p "${fpath%/*}"; wget -O "$fpath" "$url"; done < fetch.txt
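For anyone who would rather stay in Python, here is a rough equivalent of that one-liner. It is only a sketch, not part of bagit-python: the function names parse_fetch_lines and fetch_missing are made up for illustration, and it assumes each fetch.txt line has the form "URL LENGTH PATH" separated by whitespace, with "-" allowed as the length.

```python
import os
import urllib.request


def parse_fetch_lines(lines):
    """Yield (url, length, path) tuples from fetch.txt lines (URL LENGTH PATH)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two runs of whitespace only, so a path
        # containing spaces stays in one piece.
        url, length, path = line.split(None, 2)
        # A length of "-" means the size was not specified.
        yield url, (None if length == "-" else int(length)), path


def fetch_missing(bag_dir):
    """Download every fetch.txt entry that is not on disk yet.

    Hypothetical helper: no retries, no size or checksum checks.
    """
    with open(os.path.join(bag_dir, "fetch.txt"), encoding="utf-8") as f:
        entries = list(parse_fetch_lines(f))
    for url, _length, path in entries:
        target = os.path.join(bag_dir, path)
        if os.path.exists(target):
            continue
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, target)
```

Like the one-liner, this does not check that the paths stay inside the bag or that the downloaded sizes match, so the bag should still be validated afterwards.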
Hi @kba and @acdha, I am cross-posting here from the issue @sevein referenced (visible above): archivematica/Issues#583. I also want support for the fetch.txt file, but I only need validation, not automatic downloading of the files. In my use case the bags reference TBs of data that are already in archive-quality, content-addressable storage. My team and I are happy to help with reviews or code to get at least the validation functionality in, even if fetching is left out. My feeling is that the default should be to validate without fetching, and that a fetch should only happen when a parameter asks for it.
Once the files have been downloaded, the regular bag validation process will handle it. We've been hesitant to put download support into bagit-python because it tends to grow into a fair amount of code: people ask for things like queuing, retries, concurrency controls, credentials & session management, storage management & cross-bag caching for identical files, etc., and have different opinions about what the answers to those look like. I think there's a fairly reasonable argument to finish #119 and basically tell people that if they need anything more advanced, it's probably best to use whatever system they prefer and simply use bagit-python to validate the final results.
Yes I agree with that. I am in support of only doing the validation (looking up a data file entry in fetch.txt if found in the manifest file) and not downloading anything.
But this does involve downloading the files. Doesn't this contradict what you said above?
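To make the validation-only idea concrete (look up each manifest entry in fetch.txt without downloading anything), here is a minimal sketch. It is not bagit-python's API: the function names and the default manifest name manifest-sha256.txt are assumptions for illustration, and it assumes manifest lines of the form "CHECKSUM PATH" and fetch.txt lines of the form "URL LENGTH PATH".

```python
import os


def read_manifest_paths(manifest_file):
    """Return the set of payload paths listed in a manifest (CHECKSUM PATH)."""
    paths = set()
    with open(manifest_file, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                _checksum, path = line.strip().split(None, 1)
                paths.add(path)
    return paths


def read_fetch_paths(fetch_file):
    """Return the set of payload paths listed in fetch.txt (URL LENGTH PATH)."""
    paths = set()
    with open(fetch_file, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                _url, _length, path = line.strip().split(None, 2)
                paths.add(path)
    return paths


def check_fetch_against_manifest(bag_dir, manifest_name="manifest-sha256.txt"):
    """Hypothetical check: report inconsistencies without downloading anything."""
    manifest = read_manifest_paths(os.path.join(bag_dir, manifest_name))
    fetch_file = os.path.join(bag_dir, "fetch.txt")
    fetch = read_fetch_paths(fetch_file) if os.path.exists(fetch_file) else set()

    errors = []
    # Every file listed in fetch.txt must also appear in the manifest.
    for path in sorted(fetch - manifest):
        errors.append(f"{path}: listed in fetch.txt but not in {manifest_name}")
    # Every manifest entry must be either on disk or fetchable.
    for path in sorted(manifest - fetch):
        if not os.path.exists(os.path.join(bag_dir, path)):
            errors.append(f"{path}: in manifest, but neither on disk nor in fetch.txt")
    return errors
```

An empty list means the manifest and fetch.txt agree. Checksums of files that are only referenced in fetch.txt are not verified here, since that would require the content.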
I was just explaining why it hasn't happened before now. I do think there is a valid convenience argument for having a basic downloader for people who don't want anything fancy, however, so I'm open to accepting that pull request as long as it doesn't get too complicated.
OK, understood. The #119 PR doesn't seem to validate the contents of fetch.txt against the manifest, so that could be a separate PR, correct? If so, my team would be happy to contribute this.
I think the idea is that we'd have a simple fetch function and then immediately call
Right, and the follow-up comment from @kba asserts the need to validate the fetch.txt file regardless of whether the files are downloaded. Again, for my use case, we don't want to download them simply for validation. So, just to reiterate the scope of this feature, there are two goals
Is this accurate?
How would I go about completing an incomplete bag, which has files referenced in fetch.txt not present in /data?
Is this outside the domain of the tool or just not implemented? Or have I missed something?
If the latter, would this be an interesting feature for bagit-python or should we implement it on our side?