Standardize ODF variable data type #89

Open
JessyBarrette opened this issue Apr 26, 2021 · 2 comments

JessyBarrette commented Apr 26, 2021

Issue

As of now, odf_parser converts each ODF variable to the data type given by that variable's TYPE ODF attribute.

The following mapping is applied:

odf_dtypes = {'DOUB': 'float64', 'SING': 'float32', 'DOUBLE': 'float64',
              'SYTM': str, 'INTE': 'int32', 'CHAR': str, 'QQQQ': 'int32'}

Once converted, those data types are then carried over to the NetCDF file. Since we're amalgamating many years of data there's a chance the data type associated with a variable may have changed over the years. This doesn't seem to create any issues with ERDDAP, but it may be more desirable in the future to standardize those data types for each variable.
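
To gauge how often this actually happens, one could scan the parsed headers for variables that map to more than one data type. The following is only a sketch: `find_inconsistent`, the file names, and the shape of the `files` dictionary are hypothetical and not part of odf_parser; only `odf_dtypes` comes from the mapping above.

```python
from collections import defaultdict

# Mapping copied from the issue text above.
odf_dtypes = {'DOUB': 'float64', 'SING': 'float32', 'DOUBLE': 'float64',
              'SYTM': str, 'INTE': 'int32', 'CHAR': str, 'QQQQ': 'int32'}


def find_inconsistent(files):
    """Return variables that map to more than one dtype across files.

    `files` is a hypothetical structure: {file name: {variable name: ODF TYPE}}.
    """
    seen = defaultdict(set)
    for var_types in files.values():
        for name, odf_type in var_types.items():
            seen[name].add(odf_dtypes[odf_type])
    return {name: dtypes for name, dtypes in seen.items() if len(dtypes) > 1}


# Made-up example: TEMP_01 was stored as SING in one file and DOUB in another.
files = {
    "cruise_1998.ODF": {"TEMP_01": "SING", "PSAL_01": "DOUB"},
    "cruise_2015.ODF": {"TEMP_01": "DOUB", "PSAL_01": "DOUB"},
}
inconsistent = find_inconsistent(files)
```

Here `inconsistent` contains only `TEMP_01`, associated with both `float32` and `float64`.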

Solution

Perhaps the best approach would be to split the variables into 5 categories and convert each variable to the corresponding data type:

  1. Data [double]: Any actual data (ex: TEMP_01, PSAL_01, ...)
  2. Flags [integer]: Most of the flag variables seem to use integer values between 0 and 9.
  3. General flags [integer]: (ex: QCFF, FFFF) These variables seem to have integer values, but some of the values are larger.
  4. Time variables [double]: Any time variable is encoded as seconds since 1970-01-01 00:00:00.
  5. Strings [string]
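
The categories above could be expressed as a small lookup, sketched below. The classification rules are assumptions made for illustration: time and string variables are recognized from the SYTM and CHAR types in the current mapping, general flags from the QCFF/FFFF names given as examples, and ordinary flags from a leading "Q" in the variable name. The integer width (`int32`) simply reuses the current mapping and is not a decision made in this issue.

```python
# Target dtype per category; integer widths are an assumption.
CATEGORY_DTYPES = {
    "data": "float64",
    "flag": "int32",
    "general_flag": "int32",
    "time": "float64",   # seconds since 1970-01-01 00:00:00
    "string": "str",
}

GENERAL_FLAG_PREFIXES = ("QCFF", "FFFF")  # examples named in the list above


def categorize(name, odf_type):
    """Assign a variable to one of the five categories (hypothetical rules)."""
    if odf_type == "SYTM":
        return "time"
    if odf_type == "CHAR":
        return "string"
    # Checked before the generic "Q" rule because QCFF also starts with "Q".
    if name.startswith(GENERAL_FLAG_PREFIXES):
        return "general_flag"
    if name.startswith("Q"):  # assumption: flag variables start with "Q"
        return "flag"
    return "data"


def standard_dtype(name, odf_type):
    """Return the standardized dtype name for a variable."""
    return CATEGORY_DTYPES[categorize(name, odf_type)]
```

With these rules, `standard_dtype("TEMP_01", "SING")` gives `"float64"` regardless of the TYPE recorded in the original file.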

Integer data types can't represent NaN values. Each flag convention generally has a value that denotes a missing value; this value should be used as the _FillValue.
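
A minimal sketch of that substitution follows. The helper `to_int_with_fill` is hypothetical, and the fill value 9 is only an example; the actual value depends on the flag convention in use.

```python
import math


def to_int_with_fill(values, fill_value):
    """Replace missing entries (None or NaN) with fill_value and cast to int."""
    out = []
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        out.append(fill_value if missing else int(v))
    return out


# Example: a flag column read as floats with gaps; 9 is an assumed "missing" code.
flags = to_int_with_fill([1.0, float("nan"), 4.0, None], fill_value=9)
```

The same `fill_value` would then be written as the variable's _FillValue attribute in the NetCDF file so that readers can mask it again.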

@guillotp

QCFF saves the state of the tests for the current record. Each test has a value of the form 2**n (like a bit in a bit mask). If a test fails, its value is added to the QCFF variable. That is why the value can be as low as 2 or much higher (more than 10000).
I do not know whether this variable is very relevant.
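
For illustration, a QCFF value built this way can be decoded back into the exponents n of the tests that failed. This is a sketch based only on the description above; `failed_tests` is a hypothetical helper, and which test corresponds to which n is not specified here.

```python
def failed_tests(qcff):
    """Return the exponents n for which 2**n is included in qcff."""
    value = int(qcff)
    return [n for n in range(value.bit_length()) if (value >> n) & 1]


# 10242 = 2**1 + 2**11 + 2**13, i.e. three failed tests in one record.
print(failed_tests(2))      # [1]
print(failed_tests(10242))  # [1, 11, 13]
```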

@guillotp

The variable FFFF is the SeaBird flag (from the cnv file).
