What even IS eo3? Fork proposal #301

Open
SpacemanPaul opened this issue May 2, 2023 · 7 comments

@SpacemanPaul
Contributor

An EO3 document is a document that:

a) conforms with the (undocumented) metadata conventions established by eo-datasets; and
b) conforms to datacube-core's (undocumented) assumptions about the structure of eo3 dataset docs.

These are not always in agreement (e.g. datacube-core stores lineage internally in a different format from that output by eo-datasets).

I propose splitting eo-datasets into two repositories:

  1. A new opendatacube/eo3 repository which defines, documents, validates, serialises and deserialises the attributes and properties of an EO3 document that are assumed internally by core and therefore need formal and strict definition;
  2. Leaving eo-datasets to define, document, and validate the metadata catalog for various collections and packaging conventions, and handle normalising and writing out according to various packaging conventions. These collections and packaging conventions can vary and diverge as required.
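As a rough illustration of what "formal and strict definition" could mean in practice, here is a minimal sketch of a structural check on a dataset document. The list of required keys and the example document are assumptions based on commonly seen eo3 documents, not a definition taken from core or eo-datasets, and `missing_core_fields` is a hypothetical helper name.

```python
from typing import Any, List, Mapping

# Assumed top-level keys, for illustration only; no formal spec is implied.
REQUIRED_KEYS = ("id", "product", "crs", "grids", "measurements", "properties")


def missing_core_fields(doc: Mapping[str, Any]) -> List[str]:
    """Return the assumed-required top-level keys that are absent from doc."""
    return [key for key in REQUIRED_KEYS if key not in doc]


example_doc = {
    "id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "product": {"name": "example_product"},  # hypothetical product name
    "crs": "EPSG:32655",
    "grids": {
        "default": {
            "shape": [100, 100],
            "transform": [10.0, 0.0, 500000.0, 0.0, -10.0, 6000000.0, 0.0, 0.0, 1.0],
        }
    },
    "measurements": {"red": {"path": "red.tif"}},
    # "properties" is deliberately omitted so the check reports it
}

print(missing_core_fields(example_doc))  # prints ['properties']
```

A real eo3 package would presumably go further (types, value ranges, lineage structure), but even a check at this level makes core's assumptions explicit and testable.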

This split will:

a. Allow better sharing of code between (what is now) eo-datasets and core, e.g. as requested in #294.
b. Facilitate future extensions and updates to what core uses, e.g. CSIRO are looking into contributing ODC support for loading into multidimensional xarrays (for hyperspectral or climate modelling use cases).

@woodcockr
Member

Whilst we are doing this, I suggest we look at some aspects of consistency with STAC.

@SpacemanPaul
Contributor Author

> Whilst we are doing this, I suggest we look at some aspects of consistency with STAC.

Leaving eo-datasets to define, document, and validate the metadata catalog for various collections and packaging conventions, and handle normalising and writing out according to various packaging conventions.

@Kirill888
Member

About STAC: https://odc-stac.readthedocs.io/en/latest/stac-vs-odc.html

@SpacemanPaul
Contributor Author

There are also some metadata differences, which I believe Rob encountered recently, e.g. STAC allows a list of instruments, whereas ODC flattens this list into a single comma-separated instrument value.
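To make the instruments difference concrete, here is a minimal sketch of the flattening step. It assumes the STAC side uses the `instruments` list from STAC common metadata and that the ODC side stores the result under an `eo:instrument` property; the exact separator (a bare comma here) and the helper name `flatten_instruments` are assumptions for illustration, not the behaviour of any particular library.

```python
from typing import Any, Dict


def flatten_instruments(stac_properties: Dict[str, Any]) -> Dict[str, Any]:
    """Copy STAC properties, joining the instruments list into one string.

    The target key "eo:instrument" and the "," separator are assumptions.
    """
    props = dict(stac_properties)  # shallow copy so the input is not modified
    instruments = props.pop("instruments", None)
    if instruments:
        props["eo:instrument"] = ",".join(instruments)
    return props


stac_props = {"platform": "landsat-8", "instruments": ["oli", "tirs"]}
print(flatten_instruments(stac_props))
```

Note that the round trip is lossy if an instrument name ever contains the separator, which is one reason the "correct" mapping needs to be written down.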

@woodcockr
Member

Also, ODC needs the product id and metadata id to do its references internally. There are some other mostly minor but prohibitive tweaks. @Kirill888, I was looking at doing a PR into odc-stac eo3 but became uncertain after I found more minor differences and wasn't sure what "correct" was. I think the piece of work @SpacemanPaul is proposing in this issue will sort out my end, and I can then work on a PR for odc-stac eo3 for ODC conversion to tidy this up.
FYI, I used odc-stac in this context because it handled STAC extensions for projection nicely, which resolved my metadata issue, and because I think it's a good path forward in this space.

@Kirill888
Member

@woodcockr, my understanding is that eo-datasets is all about data generation, both rasters and the accompanying metadata in "eo3 convention". There is actually very little overlap with odc-stac; I just linked that piece of documentation in response to your comment about STAC vs ODC.

As for the "what eo3 is" question: it would be good to have that properly defined, as I'm sure it has changed a lot over time. For "historical" context, "eo3" was all about capturing the following information about the underlying rasters:

  1. Precise pixel shape and geo-referencing for all bands of a dataset
  2. Raster properties: dtype, nodata
  3. De-duplication of duplicated geo-referencing information that is present in eo

This was information that was missing in "eo" and that was required for more "automatic" data loading behaviours in dc.load.
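The geo-referencing and de-duplication points can be sketched as follows. This assumes the commonly seen eo3 layout in which grids are defined once under `grids`, and each measurement either names a grid or falls back to `default`; the document contents and the helper name `grid_for_band` are hypothetical.

```python
from typing import Any, Mapping, Tuple


def grid_for_band(doc: Mapping[str, Any], band: str) -> Tuple[tuple, tuple]:
    """Look up the (shape, transform) a band uses via its named grid.

    Assumes measurements without a "grid" key use the grid named "default".
    """
    grid_name = doc["measurements"][band].get("grid", "default")
    grid = doc["grids"][grid_name]
    return tuple(grid["shape"]), tuple(grid["transform"])


# Hypothetical document: two bands share the default grid, one uses a coarser grid.
doc = {
    "grids": {
        "default": {
            "shape": [200, 200],
            "transform": [10.0, 0.0, 500000.0, 0.0, -10.0, 6000000.0, 0.0, 0.0, 1.0],
        },
        "20m": {
            "shape": [100, 100],
            "transform": [20.0, 0.0, 500000.0, 0.0, -20.0, 6000000.0, 0.0, 0.0, 1.0],
        },
    },
    "measurements": {
        "red": {"path": "red.tif"},
        "green": {"path": "green.tif"},
        "swir": {"path": "swir.tif", "grid": "20m"},
    },
}

for band in doc["measurements"]:
    shape, transform = grid_for_band(doc, band)
    print(band, shape, transform[0])  # transform[0] is the pixel width here
```

Because each distinct grid is stored once and referenced by name, bands that share a footprint do not repeat their geo-referencing, while the precise pixel shape of every band is still recoverable without opening the files.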

The equivalent STAC extensions are Projection (proposed by GA based on eo3) and Raster.

@SpacemanPaul
Contributor Author

Work is underway: https://github.com/opendatacube/eo3

SpacemanPaul moved this to Done in Datacube 1.9 on Oct 5, 2023