Conformant ZarrV3 codecs and fill values #193
virtualizarr/zarr.py
Outdated
    elif self.dtype is np.dtype("int"):
        return 0
    elif self.dtype is np.dtype("float"):
        return "NaN"
It seems like the default fill value for float is 0.0:
import zarr
import json
store = zarr.store.MemoryStore(mode="w")
z = zarr.empty((1, 1), store=store)
z[:]
array([[0.]])
(I'm not sure where on the Array / Store / Other object that information lives.)
It'd be nice if zarr-python had this as a constant that we could reuse. Would that make sense, or is there some reason not to I'm missing?
I also see a reference to NaN as a default at https://github.com/zarr-developers/VirtualiZarr/pull/193/files#diff-f5a7b84b3378d903e91ebf06f2db06dca5ad55d12e7c3bf8537a9b9bb1c4cfa0R361 (present on main).
I think the specification doesn't require a specific number, just that it not be null. See the note at the bottom of the fill_value section: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#fill-value . However, zarr-python does eventually default to 0 and not NaN: https://github.com/zarr-developers/zarr-python/blob/37a8441c20dae3b284803bb1b0d2e6c8f040fb3e/src/zarr/array.py#L231C9-L235C31 . I may have had some trouble with the unit tests, but I think it's better to be as similar as possible to zarr-python, so I'll change the defaults to 0s.
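For illustration, here is a minimal sketch of what "default to zeros" could look like. The helper name `default_fill_value` is hypothetical (it is not the function in this PR or in zarr-python), and the set of handled dtypes is an assumption. It dispatches on `dtype.kind` rather than comparing dtypes with `is`, so that every integer and float width is covered.

```python
import numpy as np


def default_fill_value(dtype):
    """Hypothetical helper: return a zero-like default fill value for a dtype.

    This only mirrors the behaviour discussed in this thread (zeros rather
    than NaN); it is not taken from zarr-python.
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "b":
        # Booleans default to False.
        return False
    if dtype.kind in "iu":
        # Signed and unsigned integers of any width default to 0.
        return 0
    if dtype.kind == "f":
        # Floats default to 0.0, matching what zarr-python was observed to do.
        return 0.0
    # Other dtypes (strings, datetimes, ...) are deliberately left out here.
    raise ValueError(f"no default fill value defined for dtype {dtype}")
```

Usage would be along the lines of `default_fill_value("float32")`, which gives `0.0` under these assumptions.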
I think this could also be deferred to a later PR, especially if the true solution is to make it clearer what the default is upstream.
Co-authored-by: Tom Augspurger <[email protected]>
…algo3/VirtualiZarr into guhidalgo/fixmetadatacodecs
Looks good to me overall. One question about the default compressor in the new codecs pipeline.
Thanks @t-mcneely for pairing on this with me! I'll write up the results of our PoC with GOES data. Sorry for the messy commit and revert; we accidentally merged Sean's HDF5 branch to make reading GOES HDF5 files work.
Thanks for going into these weeds guys! This is amazing. Just a couple of extremely minor comments then I will merge it.
Thanks @ghidalgo3 !!
This change serializes the ZArray codec attributes filters, compressor, and order according to the ZarrV3 specification, along with the default fill value of the array. This should hopefully allow a theoretical ZarrV3 reader to read a zarr store produced by VirtualiZarr, one day.
I would consider this a breaking change because any existing VirtualiZarr stores that used the "before" format shown here would be unreadable now, but given that those metadata files were invalid ZarrV3 anyway, I'm not sure it's worth handling that possibility at read time. If you think it necessary, reviewer, I will implement it.
I'm also not too sure if adding the transpose and bytes codecs unconditionally is necessary. I think readers can assume that no transpose codec means no data transposition, but it's unclear to me if the codec pipeline must declare either one of:
Also, what happens if a source file uses a codec that is not one of the specified codecs of ZarrV3? Does that mean the file cannot be represented in ZarrV3? Seems rather onerous.
It may make sense to also change _check_same_codecs to use the codec_pipeline list declared in this PR.
Before
After
docs/releases.rst
Issues:
compressor type right? #94