Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backwards compatibility for reading Array.filters and Array.codecs #2194

Open
TomAugspurger opened this issue Sep 17, 2024 · 3 comments
Open
Labels
bug Potential issues with the zarr-python library

Comments

@TomAugspurger
Copy link
Contributor

Zarr version

v3

Numcodecs version

n/a

Python Version

n/a

Operating System

n/a

Installation

n/a

Description

As part of getting xarray ready for zarr v3, I'm looking at how to handle the codec and filter API.

The primary / first place this is accessed is https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L555-L556, which just reads the values of .filters and .compressor to place them in the DataArray.encoding. A few questions:

  1. I'd like to add a .codecs property to the CodecPipeline ABC. This is fine for the BatchedCodecPipeline which AFAICT is the only actual codec pipeline. Does anyone foresee an issue with that? I'm not sure why that class is abstract and loadable through the config.
  2. Is it fair to say that filters is the same as array_array_codecs?
  3. Is it fair to say that compressor is the same as array_bytes_codecs?

There's also https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L79, which accesses Codec.codec_id. I'm not sure yet about how to handle that, but right now the best is maybe .to_dict()["name"] (or we could have .to_dict() access a code_id)?

Steps to reproduce

n/a

Additional output

No response

@TomAugspurger TomAugspurger added the bug Potential issues with the zarr-python library label Sep 17, 2024
@d-v-b
Copy link
Contributor

d-v-b commented Sep 17, 2024

Is it fair to say that filters is the same as array_array_codecs?
Is it fair to say that compressor is the same as array_bytes_codecs?

Unfortunately, no. This was discussed in an earlier discussion here: #1944.

There's also https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L79, which accesses Codec.codec_id. I'm not sure yet about how to handle that, but right now the best is maybe .to_dict()["name"] (or we could have .to_dict() access a code_id)?

Probably the place to look for this functionality would be the classes that adapt the v2 compressor / filters to the v3 codecs api:

class V2Filters(ArrayArrayCodec):

In v3 filters / codecs are stored as dicts, but in #2179 I switch to storing instances of numcodecs.abc.Codec, which i think would permit re-using the old object inspection code?

@d-v-b
Copy link
Contributor

d-v-b commented Sep 17, 2024

(xref to the issue tracking the top-level codecs / filters / compressor api): #1943

@TomAugspurger
Copy link
Contributor Author

Thanks for those links. I'll try to digest them and will propose a plan that'll either be compatibility code, or (more likely?) a PR to the migration guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

2 participants