Clear local temporary metadata.yaml file before each download #342
Description
What is this PR
Why is this PR needed?
I was alerted about this minor vulnerability by @sfmig.
The intended behaviour of our sample data module is to download the `metadata.yaml` file anew each time, so that we can populate the `pooch` registry based on the most recent contents of the data repository on GIN. The file is initially downloaded to a temporary filename, `temp_metadata.yaml`, and if the download succeeds, this file is renamed to `metadata.yaml`, thereby overwriting the existing local version of `metadata.yaml`.
The vulnerability could arise if someone inadvertently creates a `temp_metadata.yaml` file inside the local cache folder. Upon the next download, `pooch` will find the existing local temporary file and will not replace it with the newest one on GIN (we compare no hashes for that file, so `pooch` has no way of knowing whether the file contents have been updated). This is a very remote possibility, but it could happen.
What does this PR do?
I extended an existing test to catch this case (by simulating the pre-existence of the local temporary file) and was able to confirm this vulnerability.
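The scenario covered by the extended test can be sketched roughly as follows. This is a minimal, self-contained illustration, not the module's actual code: `cached_fetch`, `download_metadata_file`, `REMOTE_CONTENT` and the `clear_stale_temp` parameter are all hypothetical. `cached_fetch` is assumed to model a cache-aware download such as `pooch.retrieve(..., known_hash=None)`, which reuses an existing local file instead of checking whether it needs updating.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the latest metadata.yaml contents on GIN
REMOTE_CONTENT = "version: 2\n"


def cached_fetch(target: Path) -> None:
    """Hypothetical cache-aware downloader (assumed to mimic pooch with no hash).

    If the target file already exists it is reused as-is; otherwise the
    "remote" content is written to it.
    """
    if not target.exists():
        target.write_text(REMOTE_CONTENT)


def download_metadata_file(cache_dir: Path, clear_stale_temp: bool = True) -> Path:
    """Hypothetical sketch of the download-then-rename logic."""
    temp_file = cache_dir / "temp_metadata.yaml"
    final_file = cache_dir / "metadata.yaml"
    if clear_stale_temp and temp_file.is_file():
        # Delete any leftover temp file so the fetch below cannot reuse it
        temp_file.unlink()
    cached_fetch(temp_file)
    # Path.replace overwrites an existing metadata.yaml on POSIX and Windows
    temp_file.replace(final_file)
    return final_file


if __name__ == "__main__":
    for clear in (False, True):
        with tempfile.TemporaryDirectory() as tmp:
            cache_dir = Path(tmp)
            # Simulate a temp file inadvertently left in the cache folder
            (cache_dir / "temp_metadata.yaml").write_text("version: 1\n")
            result = download_metadata_file(cache_dir, clear_stale_temp=clear)
            # False -> stale "version: 1" survives; True -> fresh "version: 2"
            print(f"clear_stale_temp={clear}: {result.read_text().strip()}")
```

Without the cleanup step, the stale `version: 1` content ends up in `metadata.yaml`; with it, the fresh content is always used. `Path.replace` is used for the rename rather than `Path.rename`, because the latter raises an error on Windows when the target already exists.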
To fix it, I have modified the `_download_metadata_file()` function to check whether the `temp_metadata.yaml` file exists and, if so, to remove it before attempting a fresh download. This forces the function to download the latest `metadata.yaml` from GIN every time. I confirmed that this fixed the aforementioned failing test (TDD ftw 😉).
Alternative approach
We could also completely remove the need for `temp_metadata.yaml`, to simplify things a bit and make the code more readable. The logic would simply be: every time `sample_data` is used, delete the old `metadata.yaml` and replace it with a new one from GIN.
The reason we hadn't gone with this approach to begin with was my paranoia about the internet being down. But as @sfmig pointed out, if the internet is down, you may not be able to use the `sample_data` module in any case (unless `pooch` is still able to get you files from the local cache?). When the internet is back, you can get the newest `metadata.yaml` file again.
How has this PR been tested?
See above.
Is this a breaking change?
No.
Does this PR require an update to the documentation?
No.
Checklist: