Check all available checksum algorithm in DataVerse registry population #437
base: main
Failures are unrelated. Should I look into fixing them?
Hi @dokempf! Thanks for taking care of this. I don't think this fix solves the issue: with it the
import pooch
example = pooch.create(
path=pooch.os_cache("example"),
base_url="doi:10.34894/5SOKTV",
)
example.load_registry_from_doi()
print(example.registry)
After inspecting the content of
{
"id": 425558,
"persistentId": "",
"filename": "README.md",
"contentType": "text/markdown",
"friendlyType": "Markdown Text",
"filesize": 2324,
"storageIdentifier": "file://190251cc731-ca82ca1d341b",
"rootDataFileId": -1,
"checksum": {"type": "SHA-1", "value": "0e4b27fd3d76c75c37303f47e25d74ef407d0752"},
"tabularData": False,
"creationDate": "2024-06-17",
"publicationDate": "2024-06-28",
"fileAccessRequest": True,
}
I suspect there might be a change in the DataVerse API and how they return information about the checksum. Therefore I think the fix should check if the
On a side note, don't worry about the pylint complaint; I'll fix it in another PR. We should replace it with some modern alternative (
My bad. I just saw that you added a link to the docs in the issue. Thanks for that! In the sample response they added in https://guides.dataverse.org/en/latest/api/native-api.html#import-a-dataset-into-a-dataverse-collection there's the
So, my take would be:
BTW, I merged #438. If you update this branch you won't find the pylint error.
So, I updated the code to include both of your recommendations (new API response, error throwing). Unfortunately, I do not reach the coverage target with this change, but mocking different API results seems like quite an effort for little benefit.
Restructure how the two APIs were being supported: the previous implementation would raise an error when "md5" and "checksum" were not available in the response, and not check for any other algorithms.
Correctly parse the checksum algorithm that is available in the new Dataverse API: we need to convert "SHA-1" to "sha1" to name one.
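The restructuring described in these commit messages can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `parse_checksum` is a hypothetical helper, and the layout of the older response (the digest stored directly under a key such as "md5") is an assumption based on the discussion in this thread.

```python
def parse_checksum(file_info):
    """Return an "algorithm:hexdigest" string for one file entry.

    `file_info` is assumed to be the per-file dictionary found in a
    Dataverse API response (hypothetical helper, for illustration only).
    """
    # Newer responses: {"checksum": {"type": "SHA-1", "value": "..."}}
    checksum = file_info.get("checksum")
    if checksum is not None:
        # Normalize the label: "SHA-1" -> "sha1", "SHA-256" -> "sha256"
        algorithm = checksum["type"].lower().replace("-", "")
        return f"{algorithm}:{checksum['value']}"
    # Older responses (assumed): the digest sits under the algorithm name.
    # Check every known algorithm instead of only "md5".
    for algorithm in ("md5", "sha1", "sha256", "sha512"):
        if algorithm in file_info:
            return f"{algorithm}:{file_info[algorithm]}"
    # Only raise once every known location has been checked
    raise ValueError(
        f"No checksum found for file '{file_info.get('filename')}'"
    )
```

With the README.md entry quoted earlier in this conversation, such a helper would produce "sha1:0e4b27fd3d76c75c37303f47e25d74ef407d0752", which is the "algorithm:hash" form that a pooch registry accepts.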
Hi @dokempf. Sorry for the delay, these weeks were a little bit busy. I noticed that your implementation was raising an error when "md5" and "checksum" were not in the response. Even if "sha1" (for example) was there, the code would just raise an error because the first two if statements were false. I just pushed a change that fixes it. I also pushed a commit that adds some tests. I think we should include tests for this feature for two reasons:
I know it's tedious to write these tests, but sometimes they are the only way to ensure things work as expected. I still would like to add some more things to the tests:
But at least for now, the following example works:
import pooch
example = pooch.create(
path=pooch.os_cache("example"),
base_url="doi:10.34894/5SOKTV",
)
example.load_registry_from_doi()
print(example.registry)
example.fetch("README.md")
"id": 12345,
"filename": "foobar.txt",
"checksum": {
"type": self._algorithm.upper(),
This should be "SHA-1", "SHA-256", etc. Not just "SHA1" and "SHA256".
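One way to make the mocked response use the hyphenated labels is an explicit lookup instead of `.upper()`. The sketch below is only an assumption about how the test helper could do it: `DATAVERSE_LABELS` and `dataverse_checksum_type` are hypothetical names, and the set of labels is limited to the algorithms mentioned in this thread.

```python
# Hypothetical mapping from hashlib-style names to Dataverse-style labels
DATAVERSE_LABELS = {
    "md5": "MD5",
    "sha1": "SHA-1",
    "sha256": "SHA-256",
    "sha512": "SHA-512",
}


def dataverse_checksum_type(algorithm):
    """Return the label a mocked response should use for `algorithm`."""
    # str.upper() alone would give "SHA1"; the lookup adds the hyphen
    return DATAVERSE_LABELS[algorithm.lower()]
```

The mock could then set "type" to `dataverse_checksum_type(self._algorithm)`.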
Fixes #435.