Maybe Refactor for CodecPipeline + Store #13
This would also have the side effect of getting rid of the horrifying fork...
Another thing about this: falling back to a "working" implementation. Right now that is very easy, so the library just "works" whether or not the Rust codepath can be used, but I'm not sure how we would fall back to a different codec... maybe by instantiating a different pipeline within our pipeline and then calling it conditionally?
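To make the "pipeline within our pipeline" idea concrete, here is a minimal sketch in plain Python. Every name in it (`FastPipeline`, `ReferencePipeline`, `FallbackPipeline`, `UnsupportedCodecError`) is hypothetical and stands in for the Rust-backed and pure-Python pipelines; it is not the zarr-python `CodecPipeline` interface, only an illustration of choosing the inner pipeline once at construction time and delegating to it afterwards.

```python
class UnsupportedCodecError(Exception):
    """Raised by the fast path when it cannot handle a codec chain (hypothetical)."""


class FastPipeline:
    """Stand-in for a Rust-backed pipeline that only supports some codecs."""

    supported = {"bytes", "blosc"}

    def __init__(self, codecs):
        unsupported = [c for c in codecs if c not in self.supported]
        if unsupported:
            raise UnsupportedCodecError(unsupported)
        self.codecs = list(codecs)

    def decode(self, chunk):
        return ("fast", chunk)


class ReferencePipeline:
    """Stand-in for the existing pure-Python pipeline that handles everything."""

    def __init__(self, codecs):
        self.codecs = list(codecs)

    def decode(self, chunk):
        return ("reference", chunk)


class FallbackPipeline:
    """Use the fast pipeline when it accepts the codecs, else the reference one."""

    def __init__(self, codecs):
        try:
            self._inner = FastPipeline(codecs)
        except UnsupportedCodecError:
            # The fast path declined this codec chain, so fall back.
            self._inner = ReferencePipeline(codecs)

    def decode(self, chunk):
        # All calls are delegated to whichever pipeline was selected.
        return self._inner.decode(chunk)


fast = FallbackPipeline(["bytes", "blosc"])
slow = FallbackPipeline(["bytes", "custom"])
print(fast.decode(b"x")[0], slow.decode(b"x")[0])
```

Deciding once in the constructor keeps the per-chunk path free of try/except overhead; the trade-off is that failures that only show up at decode time would need a separate conditional.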
+1. This seems like a good direction. It looks like subbing in a Rust codec backend would just be a matter of changing the default codec pipeline.
Scanning over the Also,
Although, what you have at the moment is probably the best way to squeeze performance out of
I looked into this a bit...unfortunately Here's a little code snippet:
import zarr
import numpy as np
# Top-level await: run this in IPython/Jupyter or an asyncio REPL.
arr = zarr.Array.create(
    await zarr.store.LocalStore.open('foo.zarr', mode="w"),
    shape=(100,),
    chunks=(10,),
    dtype="i4",
    codecs=[zarr.codecs.BytesCodec(), zarr.codecs.BloscCodec()],
)
arr[:] = np.arange(100)
arr._async_array.codec_pipeline
(arr._async_array.store_path / "c/0").get
I don't see anything about metadata above, and the pipelines are called with the paths attached via the chunk encodings: i.e.,
Outputs
That looks to be enough to create a
Definitely! Implicit to that comment was wanting to keep the
Sounds optimal! Since all I/O and encoding/decoding seems to happen under the Python
@LDeakin 100% agree. It should be the same Rust API; all that changes is the Python (in theory). I think we just need to make sure we can decode the chunk key in the pipeline (instead of relying on the file path).
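As a rough illustration of decoding the chunk key, here is a small sketch that assumes the Zarr v3 "default" chunk key encoding (a `c` prefix followed by the chunk grid coordinates joined with a separator, `/` by default) and assumes the key is given relative to the array root. The function name `decode_chunk_key` is made up for this example and is not part of zarr-python.

```python
def decode_chunk_key(key: str, separator: str = "/") -> tuple[int, ...]:
    """Turn a default-encoded chunk key such as "c/1/2" into grid coordinates."""
    prefix = "c"
    if key == prefix:
        # A zero-dimensional array has a single chunk stored under the bare prefix.
        return ()
    head = prefix + separator
    if not key.startswith(head):
        raise ValueError(f"not a default-encoded chunk key: {key!r}")
    # Everything after the prefix is one integer per array dimension.
    return tuple(int(part) for part in key[len(head):].split(separator))


print(decode_chunk_key("c/0"))
print(decode_chunk_key("c.3.4", separator="."))
```

With the coordinates recovered this way, the pipeline would not need to infer the chunk position from the file path, although a real implementation would also have to respect whatever chunk key encoding the array metadata declares.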
A good idea from Norman (https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/zarr-python.20v.20tensorstore.20benchmark.3A.20sharded.20.20read.20.2F.20writes/near/447185735): use https://github.com/zarr-developers/zarr-python/blob/5ca080d0fd21dbb2b6a8b101cc99d86de5039f08/src/zarr/codecs/pipeline.py#L59 as the entry point for Rust.
I'm not sure exactly how to go about this, since it wasn't really documented when I created this library (although it seems a sprint is coming soon: zarr-developers/zarr-python#2215), so I went with the thing that looked the most familiar in the absence of something clearer. I don't think much would have to change, since the arguments to the Rust interface should be basically identical, but it could be a simpler way to integrate... although reopening the array so Rust has the right metadata could be tricky. I don't think Store really handles opening arrays, so I don't know how this would actually help with the metadata issue... but maybe there's a trick here I'm missing.