Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear local temporary metadata.yaml file before each download #342

Closed
wants to merge 2 commits into from

Conversation

niksirbi
Copy link
Member

@niksirbi niksirbi commented Nov 14, 2024

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
I was alerted about this minor vulnerability by @sfmig.

The intended behaviour of our sample data module is to download the metadata.yaml anew each time, so that we can populate the pooch registry based on the most recent contents of the data repository on GIN. The file is initially downloaded to a temporary filename temp_metadata.yaml, and if the download succeeds, this file is renamed to metadata.yaml - thereby overwriting the existing local version of metadata.yaml.

The vulnerability could arise because if someone inadvertently creates a temp_metadata.yaml file inside the local cache folder, upon the next download pooch will find the existing local temporary file, and will not replace it with the newest one on GIN (we compare no hashes for that file, so pooch has no way of knowing if the file contents have been updated). This is a very remote possibility, but could happen.

What does this PR do?
I extended an existing test to catch this case (by simulating the pre-existence of the local temporary file) and was able to confirm this vulnerability.

To fix it, I have modified the _download_metadata_file() function to check if the temp_metadata.yaml file exists and to remove it prior to attempting a fresh download. This will force the function to download the latest metadata.yaml from GIN every time. I confirmed that this fixed the aforementioned failing test (TDD ftw 😉).

Alternative approach
We could also completely remove the need for temp_metadata.yaml to simplify things a bit, and make the code more readable. The simple logic would be:

everytime sample_data is used, delete the old metadata.yaml and replace it with a new one from GIN.

The reason we hadn't gone with this approach to begin with, was my paranoia about the internet being down. But as @sfmig pointed out, if the internet is down, you may not be able to use the sample_data module in any case (unless pooch is still able to get you files from the local cache?) When the internet is back, you can get the newest metadata.yaml file again.

How has this PR been tested?

See above.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

No.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • [n/a] The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.78%. Comparing base (a3956c4) to head (536d743).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #342   +/-   ##
=======================================
  Coverage   99.77%   99.78%           
=======================================
  Files          14       14           
  Lines         907      911    +4     
=======================================
+ Hits          905      909    +4     
  Misses          2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@niksirbi
Copy link
Member Author

Closing this PR, because we determined that the downloading of the metadata.yaml file already works as intended.
The actual issue is that the warning emitted when the internet connection is down doesn't appear on the console. We will tackle that as part of #337.

@niksirbi niksirbi closed this Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant