Updating CMORization episode #313

Open
wants to merge 10 commits into
base: main
from
80 changes: 56 additions & 24 deletions _episodes/09-cmorization.md
@@ -2,7 +2,7 @@
title: "CMORization: adding new datasets to ESMValTool"
teaching: 15
exercises: 45
compatibility: ESMValTool v2.6.0
compatibility: ESMValTool v2.11.0

questions:
- "CMORization: what is it and why do we need it?"
@@ -123,6 +123,12 @@ run the CMORizer scripts:
esmvaltool data format --config_file <path to config-user.yml> <dataset-name>
```

The options `--start` and `--end` can be added to the command above to restrict the
formatting of raw data to a time range. They will be ignored if a specific
dataset does not support them (e.g. because all the data is provided as a single file).
Valid formats are `YYYY`, `YYYYMM`, `YYYYMMDD`. The same options can also be used with
`esmvaltool data download`.
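
To make the three accepted formats concrete, here is a minimal sketch of how such a value could be validated. The helper `parse_time_option` is hypothetical and written only for illustration; it is not ESMValTool's own parsing code, and the choice to map a bare year to 1 January is an assumption.

```python
from datetime import datetime

# Hypothetical lookup (not part of ESMValTool): string length -> strptime pattern.
_FORMATS = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d"}


def parse_time_option(value: str) -> datetime:
    """Parse a YYYY, YYYYMM or YYYYMMDD string into a datetime.

    Missing month or day components default to 1 (an assumption made
    for this sketch).
    """
    if not (value.isascii() and value.isdigit()) or len(value) not in _FORMATS:
        raise ValueError(f"expected YYYY, YYYYMM or YYYYMMDD, got {value!r}")
    # strptime raises ValueError itself for impossible dates such as month 13.
    return datetime.strptime(value, _FORMATS[len(value)])


if __name__ == "__main__":
    for example in ("2000", "200003", "20000315"):
        print(example, "->", parse_time_option(example).date())
```

Anything that is not purely numeric, or that has a different length, is rejected before the date itself is checked.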

The ``config-user.yml`` is the file in which we define the different data
paths, see the episode on [Configuration]({{ page.root }}{% link _episodes/03-configuration.md %}).
In the ``rootpath`` of your ``config-user.yml``, make sure to add the right
@@ -141,38 +147,52 @@ name that was created to store the raw observation data files, i.e.
If everything is okay, the output should look something like this:

~~~
...
... Starting the CMORization Tool at time: 2022-07-26 14:02:16 UTC
... Writing program log files to:
/scratch/b/b309059/esmvaltool_output/data_formatting_20240527_132448/run/main_log.txt
Contributor

Can all references to b309059 be changed to username so it is less confusing and consistent with other episodes?

/scratch/b/b309059/esmvaltool_output/data_formatting_20240527_132448/run/main_log_debug.txt
... Starting the CMORization Tool at time: 2024-05-27 13:24:48 UTC
... ----------------------------------------------------------------------
... input_dir = /home/peter/data/RAWOBS
... output_dir = /home/peter/esmvaltool_output/data_formatting_20220726_140216
... input_dir = /work/bd0854/DATA/ESMValTool2/RAWOBS
... output_dir = /scratch/b/username/esmvaltool_output/data_formatting_20240527_132448
... ----------------------------------------------------------------------
... Running the CMORization scripts.
... Processing datasets ['FLUXCOM']
... Input data from: /home/peter/data/RAWOBS/Tier3/FLUXCOM
... Output will be written to: /home/peter/esmvaltool_output/
data_formatting_20220726_140216/Tier3/FLUXCOM
... Reformat script: /home/peter/mambaforge/envs/esmvaltool/lib/python3.9/
site-packages/esmvaltool/cmorizers/data/formatters/datasets/fluxcom
... CMORizing dataset FLUXCOM using Python script /home/peter/mambaforge/envs/
esmvaltool/lib/python3.9/site-packages/esmvaltool/cmorizers/data/formatters/
datasets/fluxcom.py
... Found input file '/home/peter/data/RAWOBS/Tier3/FLUXCOM/GPP.ANN.CRUNCEPv6.monthly.*.nc'
... Input data from: /work/bd0854/DATA/ESMValTool2/RAWOBS/Tier3/FLUXCOM
... Output will be written to: /scratch/b/b309059/esmvaltool_output/data_formatting_20240527_132448
/Tier3/FLUXCOM
... Reformat script: /home/b/b309059/ESMValTool/ESMValTool/esmvaltool/cmorizers/data/formatters/
datasets/fluxcom
... CMORizing dataset FLUXCOM using Python script /home/b/b309059/ESMValTool/ESMValTool/esmvaltool/
cmorizers/data/formatters/datasets/fluxcom.py
... Found input file '/work/bd0854/DATA/ESMValTool2/RAWOBS/Tier3/FLUXCOM/GPP.ANN.CRUNCEPv6.monthly.
*.nc'
... CMORizing variable 'gpp'
... Lmon
... Var is gpp
... ... UserWarning: Ignoring netCDF variable 'GPP' invalid units 'gC m-2 day-1'
... WARNING /work/bd0854/b309059/utils/mambaforge/envs/esmvaltool/lib/python3.11/site-packages/
iris/fileformats/_nc_load_rules/helpers.py:913: _WarnComboIgnoringCfLoad: Ignoring invalid
units 'gC m-2 day-1' on netCDF variable 'GPP'.
warnings.warn(

... Fixing time...
... Fixing latitude...
... Fixing longitude...
... Flipping dimensional coordinate latitude...
... Saving file
... Saving: /home/peter/esmvaltool_output/data_formatting_20220726_140216/Tier3/
FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
... Saving: /scratch/b/b309059/esmvaltool_output/data_formatting_20240527_132448/Tier3/FLUXCOM/
OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_198001-198012.nc
... Cube has lazy data [lazy is preferred]
... WARNING /work/bd0854/b309059/utils/mambaforge/envs/esmvaltool/lib/python3.11/site-packages/
iris/fileformats/netcdf/saver.py:2670: IrisDeprecation: Saving to netcdf with legacy-style
attribute handling for backwards compatibility.
This mode is deprecated since Iris 3.8, and will eventually be removed.
Please consider enabling the new split-attributes handling mode, by setting 'iris.FUTURE.
save_split_attrs = True'.
warn_deprecated(message)

... CMORization of dataset FLUXCOM finished!
... Formatting successful for dataset FLUXCOM

~~~
{: .output}
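
The name of the saved file in the output above appears to be built from seven underscore-separated parts. The sketch below splits such a name into those parts. The field labels (`project`, `dataset`, `data_type`, `version`, `mip`, `short_name`) and the helper `parse_obs_filename` are assumptions read off this one example, not an ESMValTool API.

```python
from typing import NamedTuple


class ObsFileName(NamedTuple):
    """Parts of a CMORized file name (labels are assumptions for this sketch)."""
    project: str
    dataset: str
    data_type: str
    version: str
    mip: str
    short_name: str
    start: str
    end: str


def parse_obs_filename(filename: str) -> ObsFileName:
    """Split a name like OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_198001-198012.nc."""
    stem = filename.removesuffix(".nc")  # requires Python 3.9 or newer
    parts = stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated parts, got {len(parts)}")
    # The last part holds the time range as <start>-<end>.
    start, _, end = parts[6].partition("-")
    return ObsFileName(*parts[:6], start, end)


if __name__ == "__main__":
    print(parse_obs_filename("OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_198001-198012.nc"))
```

Reading the name this way makes it easy to check that the MIP table (`Lmon`) and variable (`gpp`) match what the log reported while CMORizing.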

@@ -193,6 +213,12 @@ You can also see the path where ESMValTool stores the reformatting script:
have a look at this file if you want. The script also uses a configuration file:
`~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml`.

To get help on CMORizer commands, run the tool with:

```bash
esmvaltool data --help
```

## Make a test recipe

To verify that the data is correctly CMORized, we will make a simple test
@@ -617,17 +643,23 @@ If we now run the test recipe on our newly 'CMORized' data,
esmvaltool run recipe_check_fluxcom.yml --config_file <path to config-user.yml> --log_level debug
```

it should be able to find the correct file, but it does not succeed yet. The first
thing that the ESMValTool CMOR checker brings up is:
it should be able to find the correct file, but it does not succeed yet. The ESMValTool CMOR checker
brings up:

~~~
iris.exceptions.UnitConversionError: Cannot convert from unknown units. The
"units" attribute may be set directly.
esmvalcore.cmor.check.CMORCheckError: There were errors in variable GPP:
GPP: units should be kg m-2 s-1, not unknown
lon: standard_name should be longitude, not None
lat: standard_name should be latitude, not None
lon: units should be degrees_east, not unknown
lon: has values < valid_min = 0.0
lat: units should be degrees_north, not unknown
GPP: does not match coordinate rank
~~~
{: .error}

If you look closely at the error messages, you can see that this error concerns
the units of the coordinates. ESMValTool tries to fix them automatically,
If you look closely at the error messages, you can see the reasons for these errors,
e.g. the units of the coordinates. ESMValTool tries to fix them automatically,
but since no units are defined on the coordinates, this fails.

The cmorizer utilities also include a function called `fix_coords`, but before
@@ -684,7 +716,7 @@ The next error is:

~~~
esmvalcore.cmor.check.CMORCheckError: There were errors in variable GPP:
Variable GPP units unknown can not be converted to kg m-2 s-1 in cube:
GPP: units should be kg m-2 s-1, not unknown
~~~
{: .error}
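
The raw file declares `gC m-2 day-1`, which the loader rejected in the log shown earlier, so the units end up unknown while the checker expects `kg m-2 s-1`. As a minimal sketch of the arithmetic involved, assuming that `gC` means grams of carbon mass and that a day has 86400 seconds, the conversion is a single scale factor. The function name is hypothetical, and this is not necessarily how the cmorizer script in this episode handles it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400
KG_PER_G = 1.0e-3


def gc_per_day_to_kg_per_second(value: float) -> float:
    """Convert a flux from g m-2 day-1 to kg m-2 s-1.

    Hypothetical helper for illustration: grams become kilograms and
    "per day" becomes "per second"; the per-area part is unchanged.
    """
    return value * KG_PER_G / SECONDS_PER_DAY


if __name__ == "__main__":
    # A flux of 1 g m-2 day-1 expressed in kg m-2 s-1.
    print(gc_per_day_to_kg_per_second(1.0))
```

Besides scaling the data, the units metadata itself has to be set to `kg m-2 s-1`, because that is what the checker compares against.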
