Support for Kerchunk data sources #100

Open
zachsa opened this issue Jul 12, 2023 · 1 comment

Comments

@zachsa
zachsa commented Jul 12, 2023

Hello,

Would it be feasible to support kerchunk-output as a data source?

As far as I understand, Kerchunk is a tool that provides a JSON-formatted breakdown of byte offsets of a NetCDF v4 file.
To illustrate, here's an example of a Kerchunk-generated JSON file.
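For readers unfamiliar with the format, here is a minimal sketch of how a client might interpret such a file, assuming the Kerchunk version 1 reference layout (a `refs` object mapping Zarr keys either to inline strings or to `[url, offset, length]` triples). The reference set below is hand-written and abridged; the URL, offsets, lengths, and the `resolveRef` helper are hypothetical and not part of Kerchunk or `@carbonplan/maps`.

```javascript
// Hypothetical, abridged reference set in the Kerchunk version 1 layout.
// The URL, offsets, and lengths are made up for illustration.
const references = {
  version: 1,
  refs: {
    // Metadata keys are stored inline as JSON strings.
    '.zgroup': '{"zarr_format": 2}',
    'temperature/.zarray':
      '{"chunks": [1, 1, 100, 100], "dtype": "<f4", "shape": [240, 20, 100, 100]}',
    // Chunk keys point into the original NetCDF file: [url, byte offset, byte length].
    'temperature/0.0.0.0': ['https://example.com/forecast.nc', 20480, 40000],
  },
};

// Resolve a Zarr key to inline content, a whole-file read, or an HTTP range request.
function resolveRef(refs, key) {
  const entry = refs[key];
  if (entry === undefined) return null; // key not present in the reference set
  if (typeof entry === 'string') return { kind: 'inline', data: entry };
  const [url, offset, length] = entry;
  if (offset === undefined) return { kind: 'whole-file', url };
  // The HTTP Range header is inclusive on both ends.
  const range = `bytes=${offset}-${offset + length - 1}`;
  return { kind: 'range', url, headers: { Range: range } };
}
```

A client could then pass the result to something like `fetch(r.url, { headers: r.headers })` and decode the returned bytes according to the array's `.zarray` metadata. So the keys look like a Zarr store, but every read goes through this extra lookup rather than a plain path under a directory.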

Though I'm not very familiar with the technical aspects of Zarr directories or Kerchunk, I can see some similarities between the two. But it doesn't look like the Kerchunk JSON would be equivalent to a Zarr directory from a client/JavaScript perspective if I were to point to it directly, for example:

<Map>
  <Raster
    colormap={colormap}
    clim={[-20, 30]}
    source="https://mnemosyne.somisana.ac.za/somisana/algoa-bay/5-day-forecast/202307/20230712-hourly-avg-t3.kerchunk.json"
    variable={'temperature'}
    dimensions={['depth', 'y', 'x', 'time']}
    selector={{ depth: 200, time: 120 }}
  />
</Map>

Please let me know if this is already supported, or if not, whether this would be a simple/complex task. The benefit of supporting Kerchunk output rather than Zarrs directly is that it would save us around 1TB of space per year (assuming Zarrs are of a similar size to NetCDF v4 files) as we also need to store NetCDF files.

@katamartin
Member

Thanks for the question @zachsa and apologies for the delay!

Briefly, this is not currently possible and extending to support NetCDF files seems pretty challenging.

This is because (1) @carbonplan/maps requires the data to be prepared in multiscales format, and (2) performant data fetching and rendering in the browser requires relatively small chunks (in the ballpark of <10 MB). It's possible to loosen (1) with some work, but (2) seems pretty insurmountable. However, we have been interested in exploring whether Kerchunk could allow us to visualize Cloud Optimized GeoTIFFs, whose pyramids should have compatibly sized chunks. For that to work, we would need to coerce the Kerchunk reference file to match our multiscales spec and read it with reference-spec-reader via a browser-based Zarr client (we're currently using zarr-js, where this would take some work).
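To make the chunk-size constraint in (2) concrete, here is a rough sketch that estimates the uncompressed size of one chunk from Zarr v2 `.zarray` metadata and compares it with the ballpark above. The 10 MB threshold comes from this comment; the helper names and both example arrays are hypothetical, and the dtype parsing only covers simple fixed-size dtypes such as `<f4` or `<i8`.

```javascript
// Ballpark limit taken from the comment above (<10 MB), not a hard library constant.
const MAX_CHUNK_BYTES = 10 * 1000 * 1000;

// Uncompressed bytes per chunk: product of the chunk shape times the item size.
// Assumes a Zarr v2 dtype string like '<f4', where the digits give bytes per item.
function uncompressedChunkBytes(zarray) {
  const itemSize = parseInt(zarray.dtype.slice(2), 10);
  return zarray.chunks.reduce((total, n) => total * n, itemSize);
}

// True when a chunk is small enough to fetch and render comfortably in the browser.
function isBrowserFriendly(zarray) {
  return uncompressedChunkBytes(zarray) < MAX_CHUNK_BYTES;
}

// Hypothetical pyramid tile, similar in spirit to a multiscales level.
const pyramidTile = { chunks: [1, 128, 128], dtype: '<f4' };

// Hypothetical NetCDF variable stored as one large chunk.
const wholeVariable = { chunks: [240, 20, 100, 100], dtype: '<f4' };
```

Under these assumptions `pyramidTile` comes to 65,536 bytes per chunk, while `wholeVariable` comes to 192,000,000 bytes. Because a Kerchunk reference only points at chunks as they already exist in the source file, it cannot make the second case any smaller. The compressed size on the wire would usually be lower than these figures, so this is an upper bound rather than an exact transfer cost.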
