Report figshare data version in notebook output #44
Comments
@dhimmel and/or @cgreene Do you have any thoughts on the best way to handle versioning within the data loader? We currently use |
The data from figshare has versions. Therefore, it'd be ideal to specify a version and then download everything we need corresponding to that version. This is what machine-learning currently does. The What data is needed from GitHub? We should just upload that to figshare so it can use the common versioning system. |
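As a rough illustration of pinning a version rather than taking the latest, the sketch below builds a version-specific metadata URL. It assumes the public figshare API v2 exposes article versions at `/articles/{article_id}/versions/{version}`; the function name and the IDs in the usage example are hypothetical placeholders, not values from this project.

```python
# Assumption: figshare API v2 serves a specific article version at this path.
FIGSHARE_API = "https://api.figshare.com/v2"


def versioned_article_url(article_id: int, version: int) -> str:
    """Build the metadata URL for one pinned version of a figshare article.

    Both arguments are hypothetical placeholders; the real article ID and
    version number would come from the project's configuration.
    """
    if article_id <= 0 or version <= 0:
        raise ValueError("article_id and version must be positive integers")
    return f"{FIGSHARE_API}/articles/{article_id}/versions/{version}"


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(versioned_article_url(article_id=123, version=4))
```

Keeping the version as an explicit argument means the same number can be stored alongside the downloaded files and reported later.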
@dhimmel / @gwaygenomics : is this complete? I think that the ml-workers appear to be downloading whatever the latest figshare version is. Does that get reported to the users? |
I don't think it does. I am not sure whether core-service is even storing which figshare version is loaded. The source code for downloading the data is: core-service/api/management/commands/acquiredata.py Lines 21 to 39 in b9b2e4f
So it's using the latest from GitHub for all files besides BTW the figshare has been downloaded 41,471 times. Either people are using this a lot or, more likely, we're requesting it an insane number of times 😸 |
If we could reconstruct those URLs and put them into the notebook template, that's probably the best way. We'd like users to be able to reproduce the analysis and I think this key ingredient (the exact right data) is missing. |
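One possible way to surface this in the notebook is to render a short provenance header from whatever values the data loader recorded. This is only a sketch: the template text, the function name, and the field names are assumptions made for illustration, and the real notebook template in this project may look quite different.

```python
from string import Template

# Hypothetical header that a notebook template could include verbatim.
PROVENANCE_TEMPLATE = Template(
    "# Data provenance\n"
    "# figshare version: $figshare_version\n"
    "# figshare URL: $figshare_url\n"
    "# cancer-data commit: $commit_sha\n"
)


def render_provenance(figshare_version: int, figshare_url: str, commit_sha: str) -> str:
    """Fill the provenance header with the exact data sources that were used."""
    return PROVENANCE_TEMPLATE.substitute(
        figshare_version=figshare_version,
        figshare_url=figshare_url,
        commit_sha=commit_sha,
    )


if __name__ == "__main__":
    # Placeholder values; real ones would be stored when the data is acquired.
    print(render_provenance(4, "https://example.org/data/v4", "0000000"))
```

`Template.substitute` raises `KeyError` if a field is missing, so a notebook cannot be generated without the version information.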
Track which version of the data (figshare or cancer data sha) was used for a classifier