The adapter should target a specific version of Open Targets data #27
Comments
Agreed, and this has been the case so far (the version is hardcoded in the scripts). It would probably be better to make the Open Targets version a (global) parameter of the adapter, so that it always refers to the correct version and, when the version is updated, the maintainer is also prompted to adjust any code that needs to change. I'm not sure the maintenance burden is that high, though, as the Open Targets release cycle is quite slow. Edit: I fully agree that backwards compatibility (within one current codebase) is not warranted. We are automatically backwards-compatible if we version the repository appropriately, so that a user can go back to the release that matches a previous Open Targets version. I'm not sure whether we should consider syncing the pipeline version with the Open Targets version at some point. Philosophically, this is exactly how BioCypher adapters should be designed: the adapter is the (only) component that has knowledge of the source data.
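A minimal sketch of what a single global version parameter could look like, so that every version-dependent value is derived from one place. The constant name, the example version string, the `dataset_location` helper, and the `example.org` base URL are all hypothetical placeholders, not part of the adapter or of Open Targets:

```python
# Hypothetical: the one Open Targets release this adapter is written against.
# "23.02" is only an example value.
OPEN_TARGETS_VERSION = "23.02"

# Placeholder base location; the real download location is not assumed here.
BASE_LOCATION = "https://example.org/open-targets"


def dataset_location(dataset: str, version: str = OPEN_TARGETS_VERSION) -> str:
    """Build the location of a dataset for the targeted release."""
    return f"{BASE_LOCATION}/{version}/{dataset}"


print(dataset_location("targets"))
```

With this layout, bumping the targeted release is a one-line change, and anything that depends on the version is easy to find when reviewing that change.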
@slobentanzer What I have in mind is that the targeted version, rather than being just a notice, could also be a property of the Python project that is used during the build/publish process to automatically retrieve the dataset schemas from the Open Targets Platform, possibly for code generation.
Here I was referring to keeping compatibility with multiple data versions, in which case the cleanest way is to keep a copy of the definitions for each data version. Even so, that is pollution that does not seem necessary to introduce.
Not directly, but I think the pipeline needs to be synced with the adapter version, because node properties are also defined in the BioCypher schema, which is located inside the pipeline. We cannot guarantee that the properties in the BioCypher schema match the intermediate nodes produced by the adapter if the pipeline and adapter versions are not in sync.
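One way the sync concern could be made explicit is a guard the pipeline runs before building anything. This is only a sketch: it assumes the adapter exposes the data version it targets and the pipeline records the data version its BioCypher schema was written for. Both values and the function name are hypothetical.

```python
def check_versions_in_sync(adapter_data_version: str, schema_data_version: str) -> None:
    """Fail early if the pipeline schema and the adapter target different releases."""
    if adapter_data_version != schema_data_version:
        raise RuntimeError(
            "Version mismatch: adapter targets Open Targets "
            f"{adapter_data_version}, but the pipeline schema was written for "
            f"{schema_data_version}."
        )


# Passes silently when both sides agree on the release.
check_versions_in_sync("23.02", "23.02")
```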
The adapter unavoidably encodes schema information about the Open Targets datasets in order to transform Open Targets data into intermediate data. Unless the schema is frozen, which I don't think it is, or we make the adapter backward compatible with older versions, which is a huge maintenance burden and does not seem useful to me, the adapter should clearly target a specific data version for reproducibility.
Verifying the input data version when the user has prepared the data manually could be a challenge, though.
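For manually prepared data, one possible convention is a small marker file next to the datasets that states the release they came from. The sketch below assumes such a file exists and is named `VERSION`. That is an assumption of this example, not something Open Targets is known to ship, and the function name and example version are hypothetical as well.

```python
from pathlib import Path

EXPECTED_VERSION = "23.02"  # example value for the targeted release


def verify_data_version(data_dir: Path, expected: str = EXPECTED_VERSION) -> None:
    """Check a hypothetical VERSION marker file against the targeted release."""
    marker = data_dir / "VERSION"
    if not marker.is_file():
        raise FileNotFoundError(f"No VERSION marker found in {data_dir}")
    declared = marker.read_text(encoding="utf-8").strip()
    if declared != expected:
        raise ValueError(
            f"Data in {data_dir} declares version {declared!r}, expected {expected!r}"
        )
```

A marker file only shows what the user declared, not what the files actually contain, so a stricter check would have to compare the dataset schemas themselves.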