Internal refactor to separate reading and writing concerns #231

Merged
TomNicholas merged 19 commits into main from reader-writer-refactor
Aug 27, 2024

@TomNicholas TomNicholas commented Aug 26, 2024

This PR reorganizes basically the whole repo without actually changing any behaviour or public API. I've literally just copied and pasted code, created, renamed and moved files, and edited imports to match. The point is to split up various things that should be thought of separately:

  • xarray.py is split into backend.py (which contains open_virtual_dataset for reading things) and accessor.py (which contains the .virtualize accessor for serializing things). Sean's HDF5 reader will go in here too (see #87, "Non-kerchunk backend for HDF5/netcdf4 files").
  • kerchunk is now treated as one amongst many "readers", and split out in the same way that the dmrpp code is. (Same for the tests too).
  • Similarly we now have a concept of "writers", which currently are just kerchunk (i.e. writing to kerchunk JSON / parquet) and the zarr v3 chunk manifest (which only exists as a proof of principle, but other writers might go in here later).
  • The original kerchunk.py file has now got so little in it that it only contains type definitions (which really should be defined upstream...), so has been moved to a new virtualizarr.types.kerchunk module.

These changes should make the structure of the codebase clearer, especially its relationship to the kerchunk code: it should now be clearer that kerchunk is one amongst many virtualizarr readers, and one amongst many virtualizarr writers, but not actually required for either.
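
To illustrate the reader/writer separation described above, here is a minimal, self-contained sketch. It is not virtualizarr's actual code: the `VirtualDataset` stand-in, the `READERS` / `WRITERS` registries, and the helper names `open_virtual` and `serialize` are all hypothetical, and the "parsing" is faked. It only shows the shape of the idea, namely that kerchunk is one registered reader and one registered writer among several, and that neither path depends on it.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for a virtual dataset: a flat dict of reference metadata.
@dataclass
class VirtualDataset:
    refs: Dict[str, str] = field(default_factory=dict)

# Readers turn a file path into a VirtualDataset; writers serialize one to text.
READERS: Dict[str, Callable[[str], VirtualDataset]] = {}
WRITERS: Dict[str, Callable[[VirtualDataset], str]] = {}

def register(registry: dict, name: str):
    """Decorator that stores a function in the given registry under `name`."""
    def decorator(func):
        registry[name] = func
        return func
    return decorator

@register(READERS, "kerchunk")
def read_with_kerchunk(path: str) -> VirtualDataset:
    # Placeholder: a real reader would parse the file into chunk references.
    return VirtualDataset({"source": path, "reader": "kerchunk"})

@register(READERS, "dmrpp")
def read_with_dmrpp(path: str) -> VirtualDataset:
    # Placeholder for a second, independent reader.
    return VirtualDataset({"source": path, "reader": "dmrpp"})

@register(WRITERS, "kerchunk_json")
def write_kerchunk_json(vds: VirtualDataset) -> str:
    return json.dumps({"refs": vds.refs}, sort_keys=True)

@register(WRITERS, "zarr_v3_manifest")
def write_zarr_manifest(vds: VirtualDataset) -> str:
    return json.dumps({"manifest": vds.refs}, sort_keys=True)

def open_virtual(path: str, reader: str = "kerchunk") -> VirtualDataset:
    """Reading concern: dispatch to whichever reader was requested."""
    return READERS[reader](path)

def serialize(vds: VirtualDataset, writer: str = "kerchunk_json") -> str:
    """Writing concern: dispatch to whichever writer was requested."""
    return WRITERS[writer](vds)
```

With this layout, `serialize(open_virtual("data.nc", reader="dmrpp"), writer="zarr_v3_manifest")` runs without touching either kerchunk function, which is the property the refactor is meant to make visible in the module structure.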

@TomNicholas TomNicholas changed the title from "Internal refactor" to "Internal refactor to separate reading and writing concerns" Aug 27, 2024
@TomNicholas TomNicholas merged commit 515d157 into main Aug 27, 2024
8 checks passed
@TomNicholas TomNicholas deleted the reader-writer-refactor branch August 27, 2024 02:10
Development

Successfully merging this pull request may close these issues.

Refactor reader / IO organization internally