FAQ updates (#266)
* faq question about already having kerchunked data

* note compatibility with icechunk

* move more basic usage questions to the bottom

* q about custom readers

* split API into User API and Developer API

* note about manifest classes
TomNicholas authored Oct 22, 2024
1 parent 775c2c8 commit 534ae01
Showing 2 changed files with 53 additions and 16 deletions.
41 changes: 25 additions & 16 deletions docs/api.rst
@@ -5,21 +5,13 @@ API Reference
.. currentmodule:: virtualizarr

VirtualiZarr has a small API surface, because most of the complexity is handled by xarray functions like ``xarray.concat`` and ``xarray.merge``.
Users can use xarray for every step apart from reading and serializing virtual references.

Manifests
=========

.. currentmodule:: virtualizarr.manifests
.. autosummary::
:nosignatures:
:toctree: generated/

ChunkManifest
ManifestArray

User API
========

Reading
=======
-------

.. currentmodule:: virtualizarr.backend
.. autosummary::
@@ -30,7 +22,7 @@ Reading


Serialization
=============
-------------

.. currentmodule:: virtualizarr.accessor
.. autosummary::
@@ -41,9 +33,8 @@ Serialization
VirtualiZarrDatasetAccessor.to_zarr
VirtualiZarrDatasetAccessor.to_icechunk


Rewriting
=============
---------

.. currentmodule:: virtualizarr.accessor
.. autosummary::
@@ -52,9 +43,27 @@ Rewriting

VirtualiZarrDatasetAccessor.rename_paths

Developer API
=============

If you want to write a new reader to create virtual references pointing to a custom file format, you will need to use VirtualiZarr's internal classes.

Manifests
---------

VirtualiZarr uses these classes to store virtual references internally.

.. currentmodule:: virtualizarr.manifests
.. autosummary::
:nosignatures:
:toctree: generated/

ChunkManifest
ManifestArray


Array API
=========
---------

VirtualiZarr's :py:class:`~virtualizarr.ManifestArray` objects support a limited subset of the Python Array API standard in :py:mod:`virtualizarr.manifests.array_api`.

28 changes: 28 additions & 0 deletions docs/faq.md
@@ -68,3 +68,31 @@ We have a lot of ideas, including:
- [Generating references without kerchunk](https://github.com/zarr-developers/VirtualiZarr/issues/78)

If you see other opportunities then we would love to hear your ideas!

## Is this compatible with Icechunk?

Yes! VirtualiZarr allows you to ingest data as virtual references and write those references into an Icechunk Store. See the [Icechunk documentation on creating virtual datasets](https://icechunk.io/icechunk-python/virtual/#creating-a-virtual-dataset-with-virtualizarr).

## I already have Kerchunked data, do I have to redo that work?

No. You can open the Kerchunk-formatted references you already have directly with VirtualiZarr, then re-save them in a new format such as [Icechunk](https://icechunk.io/):

```python
from virtualizarr import open_virtual_dataset

vds = open_virtual_dataset('refs.json')
# vds = open_virtual_dataset('refs.parq') # kerchunk parquet files are supported too

# `icechunkstore` is assumed to be an Icechunk store you have already created
vds.virtualize.to_icechunk(icechunkstore)
```

## Can I add a new reader for my custom file format?

Many legacy file formats could potentially be represented as virtual Zarr references (see [this issue](https://github.com/zarr-developers/VirtualiZarr/issues/218) for some examples). VirtualiZarr ships with readers for some common formats (e.g. netCDF/HDF5), but you may want to write your own reader for another file format.

VirtualiZarr is designed to make this as straightforward as possible. If you want to do this, [this comment](https://github.com/zarr-developers/VirtualiZarr/issues/262#issuecomment-2429968244) will be helpful.

You can also use this approach to write a reader that starts from a kerchunk-formatted virtual references dict.
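
As a rough illustration of what starting from a kerchunk-formatted references dict involves, the sketch below pulls the byte-range references for one variable out of a kerchunk-style dict and reshapes them into a per-chunk mapping of path, offset, and length. It uses only the standard library. The helper `chunk_entries`, the bucket, the file name, and the variable name are all hypothetical, and the output layout is an assumption about what a manifest-like structure needs, not VirtualiZarr's own reader code.

```python
import json

# Hypothetical kerchunk-style reference set (version 1 layout); the bucket,
# file, and variable names are made up for illustration.
kerchunk_refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "air/.zarray": json.dumps(
            {"shape": [4, 4], "chunks": [2, 2], "dtype": "<f4", "zarr_format": 2}
        ),
        # Each chunk reference is [url, byte offset, byte length].
        "air/0.0": ["s3://bucket/air.nc", 100, 16],
        "air/0.1": ["s3://bucket/air.nc", 116, 16],
        "air/1.0": ["s3://bucket/air.nc", 132, 16],
        "air/1.1": ["s3://bucket/air.nc", 148, 16],
    },
}


def chunk_entries(refs: dict, variable: str) -> dict:
    """Collect the byte-range references of one variable, keyed by chunk index.

    Hypothetical helper: inlined data and whole-file references are skipped,
    so only [url, offset, length] triples are returned.
    """
    prefix = f"{variable}/"
    entries = {}
    for key, ref in refs["refs"].items():
        if not key.startswith(prefix):
            continue
        chunk_key = key[len(prefix):]
        if chunk_key.startswith("."):
            continue  # skip .zarray / .zattrs metadata keys
        if isinstance(ref, list) and len(ref) == 3:
            path, offset, length = ref
            entries[chunk_key] = {"path": path, "offset": offset, "length": length}
    return entries


entries = chunk_entries(kerchunk_refs, "air")
```

A real reader would also parse the `.zarray` metadata and pass both pieces to VirtualiZarr's manifest classes. That step is left out here because the constructor arguments depend on the installed version.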

Currently, if you want to call your new reader from `virtualizarr.open_virtual_dataset`, you need to open a PR to this repository, but we plan to generalize this system so that third-party libraries can plug in via an entrypoint (see [issue #245](https://github.com/zarr-developers/VirtualiZarr/issues/245)).
