The adapter should target a specific version of Open Targets data #27

kpto · 2024-11-26T15:15:27Z

The adapter unavoidably encodes the schema of the Open Targets datasets, since it needs that information to transform Open Targets data into intermediate data. As far as I know the schema is not frozen, and making the adapter backward compatible with older versions would be a huge maintenance burden that does not seem useful to me. The adapter should therefore clearly target a specific data version for reproducibility.

Verifying the input data version could be a challenge, though, when users have prepared the data manually themselves.
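
One possible way to approach that check, as a rough sketch only: compare the columns found in the user's files against the columns the adapter expects for its targeted version. The function name `schema_mismatches` and the column names below are hypothetical, and the sketch assumes the column names and types have already been read from the files into plain dictionaries.

```python
def schema_mismatches(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return readable differences between two column-name -> type mappings."""
    problems = []
    # Columns the adapter relies on but the input data does not provide.
    for column in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing column: {column}")
    # Columns present in the input data that the adapter does not know about.
    for column in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected column: {column}")
    # Columns present on both sides whose declared type differs.
    for column in sorted(expected.keys() & actual.keys()):
        if expected[column] != actual[column]:
            problems.append(
                f"type changed for {column}: "
                f"expected {expected[column]}, found {actual[column]}"
            )
    return problems


# Illustrative column names only, not taken from a real release.
expected_columns = {"id": "string", "approvedSymbol": "string"}
found_columns = {"id": "string", "symbol": "string"}
print(schema_mismatches(expected_columns, found_columns))
```

An empty list would suggest the data matches the targeted schema. It would not prove the data comes from that exact release, since two releases can share a schema.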

slobentanzer · 2024-11-26T15:19:24Z

Agreed, and this has been the case so far (the version is hardcoded in the scripts). It would probably be better to make the Open Targets version a (global) parameter of the adapter, so that it always refers to the correct version and, when the version is updated, the maintainer also has to adjust any code that depends on it.
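
A minimal sketch of what such a global parameter could look like, assuming the version of the input data can be obtained somehow (for example from a download path or a config entry). The constant, the exception, and the function name are all hypothetical, and `"24.09"` is only a placeholder value.

```python
# Hypothetical single source of truth for the targeted release.
TARGET_OPEN_TARGETS_VERSION = "24.09"  # placeholder value, an assumption


class DataVersionMismatch(RuntimeError):
    """Raised when the input data is not the release the adapter targets."""


def require_target_version(
    found_version: str,
    target_version: str = TARGET_OPEN_TARGETS_VERSION,
) -> None:
    """Fail early instead of transforming data with an unexpected schema."""
    if found_version != target_version:
        raise DataVersionMismatch(
            f"adapter targets Open Targets {target_version}, "
            f"but the input data reports {found_version}"
        )


require_target_version("24.09")  # passes silently when the versions match
```

Keeping the value in one place means that bumping it is an explicit, reviewable change, which is where the accompanying code adjustments would be made.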

Not sure the maintenance burden is that high though, as the release cycle of Open Targets is quite slow. Edit: fully agree though that backwards compatibility (in one current codebase) is not warranted. We are automatically backwards-compatible if we version the repository appropriately, such that a user can go back to a previous Open Targets version. Not sure if we should consider syncing the pipeline version with the Open Targets version at some point.

Philosophically, this is exactly how BioCypher adapters should be designed: the adapter is the (only) component that has knowledge of the source data.

kpto · 2024-11-26T15:37:46Z

@slobentanzer What I have in mind is that the targeted version, rather than being just a notice, could also be a property of the Python project. The build/publish process could then use it to automatically retrieve the dataset schemas from the Open Targets platform, probably for code generation.

> Not sure the maintenance burden is that high though, as the release cycle of Open Targets is quite slow.

Here I was referring to keeping compatibility with multiple data versions. The cleanest way to do that is to keep a copy of the definitions for each data version, but even so it is pollution that does not seem necessary to introduce.

> Not sure if we should consider syncing the pipeline version with the Open Targets version at some point.

Not directly, but I think the pipeline needs to sync with the adapter version, because node properties are also defined in the BioCypher schema, which is located inside the pipeline. We cannot guarantee that the properties in the BioCypher schema match the intermediate nodes produced by the adapter if the pipeline and adapter versions are not synced.
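
To illustrate the kind of drift I mean, here is a rough sketch that compares the properties declared per node type in the pipeline's schema with the properties the adapter actually produces. The function `property_drift`, the node label, and the property names are hypothetical, and both inputs are assumed to have been collected into plain dictionaries beforehand.

```python
def property_drift(
    schema_properties: dict[str, set[str]],
    adapter_properties: dict[str, set[str]],
) -> dict[str, dict[str, set[str]]]:
    """Report, per node label, properties that exist on only one side."""
    drift = {}
    for label in sorted(schema_properties.keys() | adapter_properties.keys()):
        declared = schema_properties.get(label, set())
        produced = adapter_properties.get(label, set())
        if declared != produced:
            drift[label] = {
                "only_in_schema": declared - produced,
                "only_in_adapter": produced - declared,
            }
    return drift


# Illustrative label and property names only.
declared_in_pipeline = {"target": {"approvedSymbol", "biotype"}}
produced_by_adapter = {"target": {"approvedSymbol", "biotype", "tep"}}
print(property_drift(declared_in_pipeline, produced_by_adapter))
```

A check like this could run in the pipeline's tests, so that a version mismatch between pipeline and adapter shows up as a failure rather than as silently missing properties.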
