
Commit

Merge pull request #519 from czender/csz_qnt
Metadata to encode lossy compression (by quantization) with @cofinoa
JonathanGregory authored Sep 6, 2024
2 parents 938472c + 9d3c007 commit 2539e66
Showing 7 changed files with 194 additions and 7 deletions.
34 changes: 33 additions & 1 deletion appa.adoc
@@ -9,7 +9,7 @@ See <<appendix-grid-mappings>> for the grid mapping attributes, and <<appendix-m
The "Type" values are **S** for string, **N** for numeric, and **D** for the type of the data variable.
Each attribute may be used in any of the ways shown in its "Use" entry.
**G** indicates it can appear as a global attribute, and **Gr** as a group attribute.
For variable attributes, the possible values of "Use" are: **C** for variables containing coordinate data, **D** for data variables, **M** for geometry container variables, **Do** for domain variables, **BI** and **BO** for boundary variables (see <<cell-boundaries>> for the distinction between **BI** and **BO**), and **-** for variables with some other purpose.
For variable attributes, the possible values of "Use" are: **C** for variables containing coordinate data, **D** for data variables, **M** for geometry container variables, **Q** for quantization container variables, **Do** for domain variables, **BI** and **BO** for boundary variables (see <<cell-boundaries>> for the distinction between **BI** and **BO**), and **-** for variables with some other purpose.
CF does not prohibit any of these attributes from being attached to variables of different kinds from those listed as their "Use" in this table, but their meanings are not defined by CF if they are used in these other ways.
"Links" indicates the location of the attribute's original definition (first link) and sections where the attribute is discussed in this document (additional links as necessary).

@@ -38,6 +38,12 @@ Attribute
If both **`scale_factor`** and **`add_offset`** attributes are present, the data are first scaled before the offset is added.
In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general.

| **`algorithm`**
| S
| Q
| <<quantization-variables>>, and <<quantization-algorithms-description>>
| Name of the quantization algorithm employed.

| **`ancillary_variables`**
| S
| D
@@ -200,6 +206,12 @@ Use in conjunction with **`flag_meanings`**.
| link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]
| List of the applications that have modified the original data.

| **`implementation`**
| S
| Q
| <<quantization-variables>>, and <<quantization-algorithms-description>>
| The name and version of the library or client software that performed the quantization with **`algorithm`**.

| **`instance_dimension`**
| S
| -
@@ -300,6 +312,26 @@ Allowed for auxiliary coordinate variables but not allowed for coordinate variab
| <<COARDS>>
| Direction of increasing vertical coordinate value.

| **`quantization`**
| S
| D
| <<quantization-variables>>
| Identifies a variable that defines a quantization algorithm and its provenance.

| **`quantization_nsb`**
| N
| D
| <<per-variable-quantization-attributes>>, and <<quantization-algorithms-description>>
| Specifies the number of significant bits retained in the IEEE mantissa of data quantized with the BitRound algorithm.
Use in conjunction with **`quantization`**.

| **`quantization_nsd`**
| N
| D
| <<per-variable-quantization-attributes>>, and <<quantization-algorithms-description>>
| Specifies the number of significant base-10 digits retained in the IEEE mantissa of data quantized with base-10 quantization algorithms.
Use in conjunction with **`quantization`**.

| **`references`**
| S
| G, D
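As an illustration of what **`quantization_nsb`** describes in the table above, here is a minimal Python sketch of BitRound-style rounding for a single 32-bit float. It is not taken from this commit. It assumes that the number of significant bits counts explicit mantissa bits (1 to 23 for a 32-bit float) and that rounding is to nearest with ties to even; the function name `bitround_float32` is hypothetical.

```python
import math
import struct


def bitround_float32(value: float, nsb: int) -> float:
    """Round `value` to `nsb` explicit mantissa bits of an IEEE 754 binary32.

    Assumptions (not stated in the commit): nsb counts explicit mantissa
    bits, and rounding is to nearest with ties to even.
    """
    if not 1 <= nsb <= 23:
        raise ValueError("nsb must be between 1 and 23 for a 32-bit float")
    if not math.isfinite(value):
        return value  # leave NaN and infinities untouched

    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    # struct.pack raises OverflowError if value does not fit in a float32.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))

    drop = 23 - nsb  # number of trailing mantissa bits to discard
    if drop > 0:
        half = 1 << (drop - 1)
        keep_lsb = (bits >> drop) & 1  # lowest retained bit, used for ties
        bits = bits + half - 1 + keep_lsb  # round to nearest, ties to even
        bits &= 0xFFFFFFFF ^ ((1 << drop) - 1)  # zero the discarded bits

    (rounded,) = struct.unpack("<f", struct.pack("<I", bits))
    return rounded
```

Under these assumptions, `bitround_float32(math.pi, 8)` returns `3.140625`. The zeroed trailing bits are what make quantized data more compressible by a subsequent lossless codec.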
8 changes: 6 additions & 2 deletions bibliography.adoc
@@ -3,11 +3,15 @@
[bibliography]
=== References

- [[[CFDM]]] link:$$https://doi.org/10.5194/gmd-10-4619-2017$$[A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)]. Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., and Taylor, K. E.: _Geosci. Model Dev._, 10, 4619-4646, 2017.
- [[[COARDS]]] link:$$https://ferret.pmel.noaa.gov/Ferret/documentation/coards-netcdf-conventions$$[Conventions for the standardization of NetCDF Files].
Sponsored by the "Cooperative Ocean/Atmosphere Research Data Service," a NOAA/university cooperative for the sharing and distribution of global atmospheric and oceanographic research data sets. May 1995.
- [[[DCG19]]] link:$$https://doi.org/10.5194/gmd-12-4099-2019$$[Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files]. Delaunay, X., A. Courtois, and F. Gouillon: _Geosci. Model Dev._, 12, 4099-4113, 2019.
- [[[FGDC]]] link:$$https://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/v2_0698.pdf$$[Content Standard for Digital Geospatial Metadata].
Federal Geographic Data Committee, FGDC-STD-001-1998.
- [[[IEEE_754]]] link:$$https://doi.org/10.1109/IEEESTD.2019.8766229$$[IEEE Standard for Floating-Point Arithmetic], in _IEEE Std 754-2019 (Revision of IEEE 754-2008)_, 22 July 2019.
- [[[Kou21]]] link:$$https://doi.org/10.5194/gmd-14-377-2021$$[A note on precision-preserving compression of scientific data]. Kouznetsov, R.: _Geosci. Model Dev._, 14, 377-389, 2021.
- [[[KRD21]]] link:$$https://doi.org/10.1038/s43588-021-00156-2$$[Compressing atmospheric data into its real information content]. Klöwer, M., Razinger, M., Dominguez, J. J., Düben, P., and Palmer, T. N.: _Nat. Comput. Sci._, 1, 713-724, 2021.
- [[[NetCDF]]] link:$$https://doi.org/10.5065/D6H70CW6$$[NetCDF Software Package]. UNIDATA Program Center of the University Corporation for Atmospheric Research.
- [[[NUG]]] link:$$https://docs.unidata.ucar.edu/nug/current/index.html$$[The NetCDF User's Guide].
- [[[OGC_WKT-CRS]]] link:$$https://www.opengeospatial.org/standards/wkt-crs$$[OGC Well-known text representation of coordinate reference systems].
@@ -16,7 +20,7 @@ OGC document 12-063. 1st May 2015.
- [[[SCH02]]] link:$$https://doi.org/10.1175/1520-0493(2002)130<2459:ANTFVC>2.0.CO;2$$[A new terrain-following vertical coordinate formulation for atmospheric prediction models]. C Schaer, D Leuenberger, and O Fuhrer. 2002. _Monthly Weather Review_. 130. 2459-2480.
- [[[Snyder]]] link:$$https://doi.org/10.3133/pp1395$$[Map Projections: A Working Manual]. USGS Professional Paper 1395.
- [[[UDUNITS]]] link:$$https://doi.org/10.5065/D6KD1WN0$$[UDUNITS Software Package]. UNIDATA Program Center of the University Corporation for Atmospheric Research.
- [[[UGRID]]] link:$$https://ugrid-conventions.github.io/ugrid-conventions$$[UGRID Conventions for storing unstructured (or flexible mesh) data in netCDF files]
- [[[W3C]]] link:$$https://www.w3.org/$$[World Wide Web Consortium (W3C)].
- [[[XML]]] link:$$https://www.w3.org/TR/1998/REC-xml-19980210$$[Extensible Markup Language (XML) 1.0]. T. Bray, J. Paoli, and C.M. Sperberg-McQueen. 10 February 1998.
- [[[Zen16]]] link:$$https://doi.org/10.5194/gmd-9-3199-2016$$[Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+)]. Zender, C. S.: _Geosci. Model Dev._, 9, 3199-3211, 2016.
4 changes: 3 additions & 1 deletion ch01.adoc
@@ -104,6 +104,8 @@ out-of-group reference:: A reference to a variable or dimension that is not cont

path:: Paths must follow the UNIX style path convention and may begin with either a '/', '..', or a word.

quantization variable:: A variable used as a container for attributes that define a specific quantization algorithm. The type of the variable is arbitrary since it contains no data.

recommendation:: Recommendations in this convention are meant to provide advice that may be helpful for reducing common mistakes.
In some cases we have recommended rather than required particular attributes in order to maintain backwards compatibility with COARDS.
An application must not depend on a dataset's adherence to recommendations.
@@ -226,4 +228,4 @@ The UGRID conventions description is referenced from, rather than rewritten into
A summary indicating how UGRID relates to other parts of the CF conventions, and which features of UGRID are excluded from CF, can be found in <<mesh-topology-variables>>.
To reduce the chance of ambiguities arising from their accidental re-use, all of the UGRID standardized attributes are specified in <<appendix-mesh-topology-attributes>> and <<attribute-appendix>>.

The UGRID conventions have their own conformance document, which should be used in conjunction with the CF conformance document when checking the validity of datasets.
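To make the "quantization variable" glossary entry added in ch01.adoc more concrete, the following Python sketch prints a CDL fragment in which a data variable points to a container variable through the **`quantization`** attribute. It is only an illustration under stated assumptions: the helper `quantization_cdl`, the variable names, the dimensions, the algorithm string `"bitround"`, and the implementation string are all hypothetical and do not come from this commit.

```python
def quantization_cdl(container: str, algorithm: str, implementation: str,
                     data_var: str, nsb: int) -> str:
    """Render a CDL fragment linking a data variable to a quantization variable.

    All names and values passed in are illustrative assumptions.
    """
    lines = [
        "variables:",
        # Container variable: holds attributes only, so its type is arbitrary.
        f"  char {container} ;",
        f'    {container}:algorithm = "{algorithm}" ;',
        f'    {container}:implementation = "{implementation}" ;',
        # Data variable: names the container and records its own precision.
        f"  float {data_var}(time, lat, lon) ;",
        f'    {data_var}:quantization = "{container}" ;',
        f"    {data_var}:quantization_nsb = {nsb} ;",
    ]
    return "\n".join(lines)


print(quantization_cdl("quantization_info", "bitround",
                       "example_library version 1.0", "ps", 9))
```

Keeping **`algorithm`** and **`implementation`** on a shared container lets many data variables reference the same provenance, while each data variable carries its own **`quantization_nsb`** or **`quantization_nsd`** value.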