The adapter should target a specific version of Open Targets data #27

Open
kpto opened this issue Nov 26, 2024 · 2 comments

Comments

@kpto
Collaborator

kpto commented Nov 26, 2024

The adapter unavoidably encodes the schema of the Open Targets datasets, because it needs that schema to transform Open Targets data into intermediate data. Unless the schema is frozen (which I don't think it is), or we make the adapter backward compatible with older versions (a huge maintenance burden that does not seem useful to me), the adapter should clearly target one specific data version for reproducibility.

Verifying the input data version could be a challenge, though, when users have prepared the data manually themselves.
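
One possible way to approach the verification problem is to compare the columns actually found in the user's files against the columns the adapter expects for its targeted version. The following is only a minimal sketch of that idea: `TARGET_VERSION`, `EXPECTED_COLUMNS`, `check_dataset_columns`, the version string and the column lists are all hypothetical placeholders, not taken from the adapter or from any Open Targets release.

```python
# Hypothetical sketch: none of these names exist in the adapter today.
TARGET_VERSION = "24.09"  # assumed version string, for illustration only

# Assumed columns per dataset for the targeted version (illustrative values).
EXPECTED_COLUMNS = {
    "targets": {"id", "approvedSymbol", "biotype"},
    "diseases": {"id", "name", "description"},
}


def check_dataset_columns(dataset: str, observed: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the columns match."""
    expected = EXPECTED_COLUMNS[dataset]
    problems = []
    missing = expected - observed
    if missing:
        problems.append(f"{dataset}: missing columns {sorted(missing)}")
    unexpected = observed - expected
    if unexpected:
        problems.append(f"{dataset}: unexpected columns {sorted(unexpected)}")
    return problems
```

A matching column set does not prove the data comes from the targeted release, since two releases can share a schema. It only catches the case where the schema has visibly drifted.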

@slobentanzer
Collaborator

slobentanzer commented Nov 26, 2024

Agreed, and this has been the case so far (the version is hardcoded in the scripts). It would probably be better to make the Open Targets version a (global) parameter of the adapter, so that it always refers to the correct version and, when the version is updated, the maintainer also has to adjust any code that needs to change.
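
A rough sketch of what such a global parameter could look like, assuming a single module-level constant that the rest of the adapter reads from. `SUPPORTED_OPEN_TARGETS_VERSION`, `AdapterConfig` and `UnsupportedVersionError` are hypothetical names, and the version string is only an example.

```python
from dataclasses import dataclass

# Hypothetical: the one Open Targets release this codebase is written against.
SUPPORTED_OPEN_TARGETS_VERSION = "24.09"  # example value only


class UnsupportedVersionError(ValueError):
    """Raised when a data version other than the targeted one is requested."""


@dataclass(frozen=True)
class AdapterConfig:
    # Defaults to the targeted version, so callers normally never set it.
    data_version: str = SUPPORTED_OPEN_TARGETS_VERSION

    def __post_init__(self) -> None:
        # Fail early instead of silently parsing data with the wrong schema.
        if self.data_version != SUPPORTED_OPEN_TARGETS_VERSION:
            raise UnsupportedVersionError(
                f"adapter targets Open Targets {SUPPORTED_OPEN_TARGETS_VERSION}, "
                f"got {self.data_version}"
            )
```

With this shape, bumping the constant is the single place where a maintainer declares a new target, which is also the natural moment to review the schema-dependent code.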

Not sure the maintenance burden is that high though, as the release cycle of Open Targets is quite slow. Edit: fully agree though that backwards compatibility (in one current codebase) is not warranted. We are automatically backwards-compatible if we version the repository appropriately, such that a user can go back to a previous Open Targets version. Not sure if we should consider syncing the pipeline version with the Open Targets version at some point.

Philosophically, this is exactly how BioCypher adapters should be designed: the adapter is the (only) component that has knowledge of the source data.

@kpto
Collaborator Author

kpto commented Nov 26, 2024

@slobentanzer What I have in mind is that the targeted version, rather than being just a notice, could also be a property of the Python project. The build/publish process could then use it to automatically retrieve the dataset schemas from the Open Targets platform, probably for code generation.

> Not sure the maintenance burden is that high though, as the release cycle of Open Targets is quite slow.

Here I was referring to keeping compatibility with multiple data versions. The cleanest way to do that is to keep a copy of the definitions for each data version, but even so, it is pollution that does not seem necessary to introduce.

> Not sure if we should consider syncing the pipeline version with the Open Targets version at some point.

Not directly, but I think the pipeline needs to stay in sync with the adapter version, because node properties are also defined in the BioCypher schema, which is located inside the pipeline. We cannot guarantee that the properties in the BioCypher schema match the intermediate nodes produced by the adapter if the pipeline and adapter versions are not in sync.
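
To illustrate the kind of drift this could guard against, here is a minimal sketch of a consistency check between the two sides. It assumes both the schema properties and the adapter's node properties have already been loaded into plain dictionaries keyed by node label. `find_property_mismatches` and the example labels are hypothetical, and the sketch does not read a real BioCypher schema file.

```python
def find_property_mismatches(
    schema_properties: dict[str, set[str]],
    adapter_properties: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each node label to the properties on which the two sides disagree."""
    mismatches = {}
    # Look at every label known to either side, so a label missing from one
    # side is reported with all of its properties.
    for label in schema_properties.keys() | adapter_properties.keys():
        in_schema = schema_properties.get(label, set())
        in_adapter = adapter_properties.get(label, set())
        difference = in_schema ^ in_adapter  # symmetric difference
        if difference:
            mismatches[label] = difference
    return mismatches
```

A check like this could run in the pipeline's tests, so that a version mismatch between the pipeline and the adapter surfaces as a failing test rather than as silently missing properties.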
