This repository has been archived by the owner on Apr 24, 2020. It is now read-only.

Use VLEN datatypes for Discrete Sampling Geometries #8

Open
MaartenSneepKNMI opened this issue Mar 3, 2015 · 5 comments

Comments

@MaartenSneepKNMI

The section on discrete sampling geometries uses various methods to (efficiently) store datasets of varying length in linear arrays. NetCDF-4 introduces a VLEN data type, which seems to be quite elegant for this type of data. I suggest that we look into this.
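For reference, a minimal sketch of one of the linear-array methods mentioned above, the contiguous ragged array layout: all series are concatenated into one flat array, and a per-series count records how long each one is. Plain Python lists stand in for netCDF variables here; the names `flat_values`, `row_size`, and `split_ragged` are illustrative assumptions, not taken from any library.

```python
from itertools import accumulate

def split_ragged(flat_values, row_size):
    """Recover the individual series from a flat array plus per-series counts."""
    if sum(row_size) != len(flat_values):
        raise ValueError("row_size does not add up to the number of values")
    # The start offset of each series is the running sum of the preceding counts.
    starts = [0] + list(accumulate(row_size))[:-1]
    return [flat_values[s:s + n] for s, n in zip(starts, row_size)]

# Three profiles with 3, 1, and 2 levels, stored back to back.
flat_values = [10.0, 11.0, 12.0, 20.0, 30.0, 31.0]
row_size = [3, 1, 2]

profiles = split_ragged(flat_values, row_size)
```

A VLEN variable would instead hold each of these sublists as a single element, so the separate count variable would no longer be needed.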

@JohnLCaron

Hi Maarten:

It's been high on my list to explore how the extended model can improve
point data storage strategies. The netCDF-3 DSG representations are
compromises to accommodate the limitations of the classic model.

One of the issues with VLEN at the HDF5 layer is that you have to read the
entire array (no subsetting or iteration). So it's a good solution for small
arrays, but not a good solution for everything.

OTOH, I agree that for things like profiles that might have some variations
in the number of levels, it could be quite elegant.

I think we need to explore the performance of the various options. I have
seen large speedups from using structures. It's possible that a VLEN of
structures, or structures with an unlimited dimension, might give really
good performance. But until I can measure it, I'm not sure what is compelling.

John

On Tue, Mar 3, 2015 at 9:31 AM, Maarten Sneep wrote:

The section on discrete sampling geometries uses various methods to
(efficiently) store datasets of varying length in linear arrays. NetCDF-4
introduces a VLEN data type, which seems to be quite elegant for this type
of data. I suggest that we look into this.



@MaartenSneepKNMI
Author

In Sentinel 5 precursor a VLEN datatype will be used for a variable where the number of data points per observation varies, but with a maximum of 32. This seems reasonable, especially since you'll need all of those for a particular observation anyway (subsetting is not meaningful there).
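To make the trade-off concrete, a rough sketch comparing a fixed-size layout padded to the maximum of 32 points with a variable-length layout. The counts in `points_per_observation` are made-up illustration values, not actual Sentinel 5 precursor data, and `FILL` and `pad_to_max` are hypothetical names.

```python
MAX_POINTS = 32          # upper bound stated above
FILL = float("nan")      # placeholder fill value (an assumption)

def pad_to_max(series, max_points=MAX_POINTS, fill=FILL):
    """Pad one observation's data points to a fixed length."""
    if len(series) > max_points:
        raise ValueError("series exceeds the fixed maximum")
    return list(series) + [fill] * (max_points - len(series))

# Invented counts of data points for four observations.
points_per_observation = [5, 32, 12, 1]
observations = [[float(i) for i in range(n)] for n in points_per_observation]

padded = [pad_to_max(obs) for obs in observations]

stored_padded = sum(len(row) for row in padded)      # every row has MAX_POINTS slots
stored_vlen = sum(len(obs) for obs in observations)  # only the real values
```

With these invented counts the padded layout stores 128 slots against 50 real values; how much that matters in practice depends on the actual distribution of counts and on compression.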

An option may be to use groups, where each DSG gets its own group. This probably reduces complexity, as a single group contains a single dataset, and subsetting is easily done.

@JohnLCaron

Hi Maarten:

Can you expound on "each DSG gets its own group. This probably reduces
complexity, as a single group contains a single dataset, and subsetting is
easily done"?

What do you mean by "each DSG" and "a single dataset" here?

John

On Tue, Mar 31, 2015 at 4:53 AM, Maarten Sneep wrote:

In Sentinel 5 precursor a VLEN datatype will be used for a variable where
the number of data points per observation varies, but with a maximum of 32.
This seems reasonable, especially since you'll need all of those for a
particular observation anyway (subsetting is not meaningful there).

An option may be to use groups, where each DSG gets its own group. This
probably reduces complexity, as a single group contains a single dataset,
and subsetting is easily done.



@MaartenSneepKNMI
Author

In discrete sampling geometries you have a sequence of series of observations. Each series in the sequence has a different length.

The sequence is "everything you want to store".
The series is a set of related observations.
The observation is a single point (which may consist of multiple parameters).

You can try to store these in a single variable, but that requires either some extra metadata recording where each series of observations starts, or the use of VLEN data (with the noted objections for large series).

Another option is to use groups, with a new group to collect each series in the sequence, and perhaps a higher level group to combine the whole sequence. The various parameters can then each receive a separate variable.
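A rough sketch of that group layout, using nested dictionaries as a stand-in for netCDF-4 groups and variables. The group names (`profiles`, `profile_0`, ...) and parameter names (`pressure`, `temperature`) are invented for illustration, and `build_sequence` is a hypothetical helper, not a library call.

```python
def build_sequence(series_list):
    """Place each series in its own group, with one variable per parameter."""
    sequence = {}
    for index, series in enumerate(series_list):
        # All parameters of one series must describe the same observations.
        lengths = {len(values) for values in series.values()}
        if len(lengths) > 1:
            raise ValueError(f"parameters in series {index} differ in length")
        sequence[f"profile_{index}"] = {
            name: list(values) for name, values in series.items()
        }
    # A higher-level group combines the whole sequence.
    return {"profiles": sequence}

series_list = [
    {"pressure": [1000.0, 850.0, 500.0], "temperature": [288.0, 280.0, 255.0]},
    {"pressure": [1000.0, 700.0], "temperature": [290.0, 272.0]},
]
root = build_sequence(series_list)

# Selecting one series is a lookup of its group; no start/count metadata is needed.
second = root["profiles"]["profile_1"]
first_two_levels = root["profiles"]["profile_0"]["temperature"][:2]
```

Each series keeps its own length, and subsetting within a series is ordinary slicing of that group's variables.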

@dblodgett-usgs
Contributor

@MaartenSneepKNMI -- should we move this issue to the cf-conventions main issue list or do you think it could be archived?
