nodata is inconsistent after loading without explicit nodata #1646
Comments
We store metadata for a reason. If the geotiff metadata doesn't match the indexed metadata, I guess it's reasonable to go with the geotiff at load time, but we should at least raise a warning that the indexed metadata doesn't match the file.
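The warning suggested in this comment could be sketched roughly as follows. This is a minimal illustration only: `check_metadata` and the plain dicts it compares are hypothetical stand-ins, not part of datacube, and how the file-side values are obtained is left out.

```python
import math
import warnings


def _same(a, b):
    """Compare two metadata values, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def check_metadata(band, indexed, from_file):
    """Warn for each of dtype/nodata where the index and the file disagree.

    `indexed` and `from_file` are plain dicts with "dtype" and "nodata"
    keys (hypothetical stand-ins, not datacube objects).
    Returns the list of mismatched field names.
    """
    mismatched = [
        key for key in ("dtype", "nodata")
        if not _same(indexed.get(key), from_file.get(key))
    ]
    for key in mismatched:
        warnings.warn(
            f"{band}: indexed {key}={indexed.get(key)!r} does not match "
            f"file {key}={from_file.get(key)!r}",
            UserWarning,
            stacklevel=2,
        )
    return mismatched
```

A caller could then decide which side wins, while the user still gets told that the two sources disagree.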
It'd be great if it could go with the geotiff at load time when there are conflicts. From what I read in the code, it requires more changes to populate
I guess another way is to "peek" at the geotiff before loading, and deal with conflicts then instead of when loading actually happens.
Yeah, it's not a quick/easy fix. It would also be hard to make it play nice with e.g. NetCDF reading and Zarr reading in 1.9. What geotiff are we talking about here? Why does it not match the metadata in the index?
Because it's not indexed. We need to retain some intermediate results in the process, but they don't have to be indexed. It's fairly easy to create a
Except the band type and nodata values are wrong. Presumably this metadata is NOT included in the STAC item? THAT would seem to be the real issue.
Nodata can only be populated from metadata, not from file data. File nodata is remapped to metadata nodata, as different files can have inconsistent nodata set. With Dask arrays read from S3 we might not even have credentials to peek at nodata, as data will be accessed on a different machine with machine credentials, and that machine might not even have been started yet!
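The remapping described in this comment can be illustrated with a small NumPy sketch. It is not the datacube implementation; `remap_nodata` and its parameters are hypothetical names chosen for the example.

```python
import numpy as np


def remap_nodata(data, file_nodata, target_dtype, target_nodata):
    """Cast `data` to `target_dtype`, replacing `file_nodata` with `target_nodata`.

    Hypothetical helper: the mask is computed on the original values so the
    cast cannot hide the file's nodata marker.
    """
    data = np.asarray(data)
    if file_nodata is None:
        # The file declares no nodata value, so nothing is masked.
        mask = np.zeros(data.shape, dtype=bool)
    elif isinstance(file_nodata, float) and np.isnan(file_nodata):
        # NaN never compares equal to itself, so it needs its own test.
        mask = np.isnan(data)
    else:
        mask = data == file_nodata
    out = data.astype(target_dtype)  # astype returns a copy by default
    out[mask] = target_nodata
    return out


# A uint8 band with nodata=255, loaded against float metadata with NaN nodata.
band = np.array([1, 255, 3], dtype="uint8")
loaded = remap_nodata(band, 255, "float32", np.nan)
```

Here `loaded` is a `float32` array in which the `255` pixel has become `nan`, which matches the conversion described in the issue body.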
The error on the linked line still exists though, as it should ONLY map
I was planning to have a go at splitting the loading code out of odc-stac into a separate repo later this month (after FOSS4G).
Expected behaviour
`nodata` information should be consistent between the loaded data and `nodata` in `xr.DataArray.attrs`.
Actual behaviour
`nodata` information is missing in `xr.DataArray.attrs` when `Measurement.nodata=None`. However, during loading, the `nodata` value is populated as `nan` by datacube-core/datacube/api/core.py (line 975 in a410345), which is fine when `Measurement.dtype=float` but emits an error when `Measurement.dtype=int`.
`nodata` and `dtype` from the geotiff are overwritten when `Measurement` doesn't match the metadata from the geotiff. E.g., with `dtype=int` and `nodata=255` in the geotiff but `dtype=float` and `nodata=None` in `Measurement`, the loaded data is cast to `float` and `255` is replaced with `nan`. It is reasonable to choose one over the other when there is a conflict, but the absence of `nodata` in `DataArray.attrs` causes confusion, since the other choice, geotiff metadata over `Measurement`, is also plausible.
Steps to reproduce the behaviour
Use a `Dataset` built from `stac` instead of one returned by an odc query, i.e. a `Dataset` whose measurement will have the default `dtype=float` and `nodata=None`.
Environment information
datacube == 1.8.19
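
One way to read the expected behaviour is that whatever fill value is used during loading should also be the value recorded in the attributes. The sketch below illustrates that idea under stated assumptions; `resolve_nodata` is a hypothetical helper, not a datacube function, and raising for integer types without an explicit value is a choice made for this example only.

```python
import numpy as np


def resolve_nodata(dtype, nodata=None):
    """Return the fill value to use for loading and to record in attrs.

    Hypothetical policy: an explicit nodata wins; floating-point types fall
    back to NaN; integer types cannot hold NaN, so they need an explicit value.
    """
    dtype = np.dtype(dtype)
    if nodata is not None:
        return nodata
    if np.issubdtype(dtype, np.floating):
        return float("nan")
    raise ValueError(f"no default nodata for dtype {dtype}; set one explicitly")


# The same value feeds both the fill step and the attributes,
# so the loaded data and its attrs cannot disagree.
fill = resolve_nodata(float)  # Measurement.dtype=float, nodata=None
attrs = {"nodata": fill}
```

With this shape, the `Measurement.dtype=int`, `nodata=None` case fails early with a clear message instead of failing inside the fill step.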