Variable derivation for ERA5 on-the-fly CMORizer #1806

Open
axel-lauer opened this issue Nov 18, 2022 · 17 comments · May be fixed by #2551
Labels
fix for dataset Related to dataset-specific fix files observations question Further information is requested variable derivation Related to variable derivation functions

Comments

@axel-lauer
Contributor

I could not figure out how to derive variables that need more than one input variable with the ERA5 on-the-fly CMORizer. In this concrete case, I would like to calculate the CMOR variable rsut (TOA Outgoing Shortwave Radiation), which is not readily available from ERA5. rsut needs to be calculated from the available ERA5 variables as

rsut = era5_toa_incident_solar_radiation - era5_top_net_solar_radiation

The problem seems to be that two different variables are never available at the same time when the functions in cmor/_fixes/native6/era5.py are called.

I guess one possible solution could be to have all variables needed for the variable derivation in the ERA5 "raw" files and then follow a similar approach as @schlunma implemented for the on-the-fly CMORizer for EMAC. But I am not sure this is supported by era5cli, which we use to download ERA5 data, and it would probably make obtaining the ERA5 data more complicated.

@ESMValGroup/esmvaltool-coreteam would you have an idea how to address the variable derivation problem for ERA5 data?

@axel-lauer axel-lauer added question Further information is requested fix for dataset Related to dataset-specific fix files variable derivation Related to variable derivation functions observations labels Nov 18, 2022
@bouweandela
Member

Are era5_toa_incident_solar_radiation and era5_top_net_solar_radiation available in a CMOR table? If they are, the usual derivation can be used, because the derive preprocessor function is applied after the fixes.
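For illustration, such a derivation might look roughly like the sketch below, modelled loosely on the derive scripts in ESMValCore, where a class declares its inputs in `required(project)` and combines them in `calculate(cubes)`. This is a minimal stand-in, not ESMValCore code: plain lists of values replace iris cubes, a dict keyed by short name replaces the `CubeList`, and the class name `DerivedRsut` and the input short names `tisr` and `mtnswrf` are hypothetical.

```python
"""Minimal stand-in for a two-input derivation: rsut = tisr - mtnswrf."""

from typing import Dict, List


class DerivedRsut:
    """Hypothetical derivation class mimicking the required/calculate pattern."""

    @staticmethod
    def required(project: str) -> List[Dict[str, str]]:
        # Both inputs must be loadable (i.e. defined in a CMOR table) so that
        # they have already been fixed when the derivation runs.
        return [{"short_name": "tisr"}, {"short_name": "mtnswrf"}]

    @staticmethod
    def calculate(cubes: Dict[str, List[float]]) -> List[float]:
        # Stand-in for extracting each input cube by name.
        tisr = cubes["tisr"]
        mtnswrf = cubes["mtnswrf"]
        if len(tisr) != len(mtnswrf):
            raise ValueError("inputs must cover the same time steps")
        # Outgoing SW at TOA = incident SW - net downward SW.
        return [incident - net for incident, net in zip(tisr, mtnswrf)]


if __name__ == "__main__":
    inputs = {"tisr": [340.0, 345.0], "mtnswrf": [240.0, 243.0]}
    print(DerivedRsut.calculate(inputs))  # [100.0, 102.0]
```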

@axel-lauer
Contributor Author

@bouweandela Excellent idea! I believe the two variables in question are in a CMOR table. I'll give that a try and then report back.

@remi-kazeroni
Contributor

If the solution proposed by @bouweandela does not work, you may be hitting #1388. There is still an issue regarding the derivation of some custom ERA5 variables, such as rsus and rlus (see ESMValGroup/ESMValTool#2396).

@axel-lauer
Contributor Author

I tried the method proposed by @bouweandela, unfortunately with limited success. I succeeded in implementing derivation scripts for rsut and rsutcs that calculate these variables from the existing ERA5 variables. I did not, however, succeed in calculating a variable derived from rsut, rsutcs (in this case: swcre). When trying to do so, the preprocessor fails due to missing data. As deriving rsut and rsutcs individually works, I guess the problem might be this kind of “double-derivation”.

I think the cleanest way would be to add support for variable derivation with more than one input variable directly to the on-the-fly CMORizer for ERA5. That would also make it unnecessary to define new CMOR tables tailored specifically to the ERA5 variables needed for the derivation. Any thoughts on this?

@bouweandela
Member

bouweandela commented Nov 22, 2022

Indeed, "double-derivation" is not supported, but I also suspect it is not needed. If you can do it in two steps, it should also be possible to do it in a single step. Which input variables have data available, and what are the derivation formulas?

@axel-lauer
Contributor Author

I am not sure I can picture how to combine those two derivation steps without hard-coding the special ERA5 case in the combined derivation script. I would find such hard-coding somewhat undesirable. If you have any ideas, I would be very happy to hear more about this.

@bouweandela
Member

For my understanding, could you please provide:

  • which input variables are provided by ERA5 (ERA5 name and CMIP6 name if available)
  • the names of the variables that you want to derive and what formula can be used to derive those from the input?

@axel-lauer
Contributor Author

ERA5 input variables

ERA5 variable | CMOR variable                | Description
mtnswrf       | n/a (custom variable needed) | TOA net downward solar radiation
mtnswrfcs     | n/a (custom variable needed) | TOA net downward solar radiation, clear sky
tisr          | rsdt                         | TOA incident solar radiation

Step 1: CMOR variables to be derived from ERA5 data for the calculation of swcre

CMOR variable | Formula          | Description
rsut          | tisr - mtnswrf   | TOA outgoing shortwave flux
rsutcs        | tisr - mtnswrfcs | TOA outgoing clear-sky shortwave flux

Step 2: target variable to be derived

Target variable | Formula       | Description
swcre           | rsutcs - rsut | TOA shortwave cloud radiative effect

In case of ERA5 this could also be calculated directly as:

swcre = mtnswrf - mtnswrfcs
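Substituting the step-1 formulas into the step-2 formula shows why the direct calculation holds: the incident solar radiation tisr appears in both terms and cancels, so it is not needed for swcre at all.

```latex
\begin{aligned}
\mathrm{swcre} &= \mathrm{rsutcs} - \mathrm{rsut} \\
&= (\mathrm{tisr} - \mathrm{mtnswrfcs}) - (\mathrm{tisr} - \mathrm{mtnswrf}) \\
&= \mathrm{mtnswrf} - \mathrm{mtnswrfcs}
\end{aligned}
```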

_derive/swcre.py already exists, which calculates swcre as rsutcs - rsut. I am therefore not sure how to implement the direct calculation of swcre from the ERA5 variables without hard-coding an "ERA5 case" in swcre.py.

@bouweandela
Member

Thanks for explaining, I think I get it now.

> I guess one possible solution could be to have all variables needed for the variable derivation in the ERA5 "raw" files and then follow a similar approach as @schlunma implemented for the on-the-fly CMORizer for EMAC.

This looks like the most convenient solution: computing swcre as rsutcs - rsut would require downloading the additional variable tisr, which is not needed at all for the computation.

> But I am not sure this is supported by era5cli, which we use to download ERA5 data, and it would probably make obtaining the ERA5 data more complicated.

It's not supported by era5cli, but you could download the data for mtnswrf and mtnswrfcs using the command:

era5cli monthly --variables mean_top_net_short_wave_radiation_flux mean_top_net_short_wave_radiation_flux_clear_sky

and put the files in a directory called Tier3/ERA5/1/mon/swcre when using the default DRS Tier{tier}/{dataset}/{version}/{frequency}/{short_name}. That is indeed a bit more complicated, but not very.
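The target directory is simply the default DRS template above with the facets filled in. A minimal sketch, assuming the template string `Tier{tier}/{dataset}/{version}/{frequency}/{short_name}` and plain `str.format`; the helper `raw_dir` and its `rootpath` argument are hypothetical and not part of ESMValCore.

```python
from pathlib import PurePosixPath

# Default input DRS template, as quoted in the comment above.
DRS_TEMPLATE = "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}"


def raw_dir(rootpath: str, **facets: object) -> PurePosixPath:
    """Hypothetical helper: directory in which the raw ERA5 files are expected."""
    return PurePosixPath(rootpath) / DRS_TEMPLATE.format(**facets)


if __name__ == "__main__":
    path = raw_dir("RAWOBS", tier=3, dataset="ERA5", version="1",
                   frequency="mon", short_name="swcre")
    print(path)  # RAWOBS/Tier3/ERA5/1/mon/swcre
```

Both downloaded files (mtnswrf and mtnswrfcs) would then go into that single directory.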

@axel-lauer
Contributor Author

I am afraid I am not sure how this approach could work without additional processing steps outside of the tool. As far as I understand, the two ERA5 variables mtnswrf and mtnswrfcs need to be in the same file. I did this using ncks:

ncks -A download/era5_mean_top_net_short_wave_radiation_flux_2001_monthly.nc -o merged_2001.nc
ncks -A download/era5_mean_top_net_short_wave_radiation_flux_clear_sky_2001_monthly.nc -o merged_2001.nc

I then placed these files into RAWOBS/Tier3/ERA5/v1/mon/swcre.

This procedure works with the swcre class defined in this modified ERA5 fix: https://github.com/ESMValGroup/ESMValCore/blob/extend_era5_fix/esmvalcore/cmor/_fixes/native6/era5.py

Here is the test recipe I used: recipe_test_era5.yml.txt

But even with this method, I do not know how to avoid this kind of processing of the ERA5 files downloaded with era5cli. As this processing happens outside of the tool, it seems very cumbersome and quite confusing from a user's point of view. Do you have any ideas or suggestions? @schlunma, would you know what to do or how to improve this?
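The core of such a swcre fix is that `fix_metadata` receives several cubes and combines two of them. A minimal stand-in, not the actual class from the linked branch: `(var_name, values)` tuples replace iris cubes, and the hypothetical helper `extract_one` mimics `CubeList.extract_cube`, which raises an error unless exactly one cube matches.

```python
from typing import List, Sequence, Tuple

# A "cube" stand-in: (var_name, data values).
Cube = Tuple[str, List[float]]


class ConstraintMismatchError(Exception):
    """Stand-in for iris.exceptions.ConstraintMismatchError."""


def extract_one(cubes: Sequence[Cube], var_name: str) -> List[float]:
    """Hypothetical helper: return the data of exactly one matching cube."""
    matches = [data for name, data in cubes if name == var_name]
    if len(matches) != 1:
        raise ConstraintMismatchError(
            f"Got {len(matches)} cubes for var_name={var_name!r}, expecting 1."
        )
    return matches[0]


def fix_swcre(cubes: Sequence[Cube]) -> Cube:
    """Sketch of a swcre fix: swcre = mtnswrf - mtnswrfcs (tisr cancels)."""
    mtnswrf = extract_one(cubes, "mtnswrf")
    mtnswrfcs = extract_one(cubes, "mtnswrfcs")
    return ("swcre", [allsky - clear for allsky, clear in zip(mtnswrf, mtnswrfcs)])


if __name__ == "__main__":
    print(fix_swcre([("mtnswrf", [240.0]), ("mtnswrfcs", [290.0])]))
    # ('swcre', [-50.0])
```

Calling `fix_swcre` with only one of the two variables raises the stand-in error, which is the situation the merged file avoids.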

@schlunma
Contributor

Given that the input file DRS for native6 is just *.nc, the tool will find and use any .nc file in the Tier3/ERA5/{version}/{frequency}/swcre directory. Because of the way iris.load works (combine the loaded cubes as far as possible; if multiple cubes remain, return a CubeList instead of a single Cube), all these files (and thus all the variables) should end up in the cubes argument that is passed to the fix.

Long story short, I think this preprocessing with ncks is not necessary; just put all downloaded files in the correct directory. Iris should figure out the rest (in theory) 😄

@axel-lauer
Contributor Author

Thanks for your quick reply, @schlunma. I tried this before, and this approach failed with this error message:

Traceback (most recent call last):
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_main.py", line 486, in run
    fire.Fire(ESMValTool())
  File "/work/bd0854/b380103/mambaforge/envs/esmvaltool27/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/work/bd0854/b380103/mambaforge/envs/esmvaltool27/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/work/bd0854/b380103/mambaforge/envs/esmvaltool27/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_main.py", line 393, in run
    self._run(recipe, session)
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_main.py", line 433, in _run
    process_recipe(recipe_file=recipe, session=session)
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_main.py", line 127, in process_recipe
    recipe.run()
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_recipe/recipe.py", line 1900, in run
    self.tasks.run(max_parallel_tasks=self._cfg['max_parallel_tasks'])
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_task.py", line 722, in run
    self._run_sequential()
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_task.py", line 733, in _run_sequential
    task.run()
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_task.py", line 258, in run
    input_files.extend(task.run())
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/_task.py", line 262, in run
    self.output_files = self._run(input_files)
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/preprocessor/__init__.py", line 643, in _run
    product.apply(step, self.debug)
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/preprocessor/__init__.py", line 430, in apply
    self.cubes = preprocess(self.cubes, step,
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/preprocessor/__init__.py", line 362, in preprocess
    result.append(_run_preproc_function(function, items, settings,
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/preprocessor/__init__.py", line 322, in _run_preproc_function
    return function(items, **kwargs)
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/cmor/fix.py", line 114, in fix_metadata
    cube_list = fix.fix_metadata(cube_list)
  File "/work/bd0854/b380103/ESMValCore/esmvalcore/cmor/_fixes/native6/era5.py", line 412, in fix_metadata
    cubes.extract_cube(NameConstraint(var_name='mtnswrfcs'))
  File "/work/bd0854/b380103/mambaforge/envs/esmvaltool27/lib/python3.10/site-packages/iris/cube.py", line 267, in extract_cube
    return self._extract_and_merge(
  File "/work/bd0854/b380103/mambaforge/envs/esmvaltool27/lib/python3.10/site-packages/iris/cube.py", line 314, in _extract_and_merge
    raise iris.exceptions.ConstraintMismatchError(msg)
iris.exceptions.ConstraintMismatchError: Got 0 cubes for constraint NameConstraint(var_name='mtnswrfcs'), expecting 1.

Am I missing something, or did I implement the class "Swcre" in era5.py incorrectly?

@schlunma
Contributor

Hmm...then apparently the theory is wrong 🤦 Could you send the full debug log?

@axel-lauer
Contributor Author

Here is the full debug log: main_log_debug.txt

@schlunma
Contributor

Found the problem. The tool indeed only runs fix_metadata on single files:

https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/cmor/fix.py#L108-L114

The input argument is in fact a CubeList, but it is split up by file beforehand, so fix_metadata only ever sees the cubes from one file at a time. I'm not sure how we could change that: if fix_metadata were always run on all input files at once, we would get into huge trouble with datasets that are spread over multiple input files...
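The failure can be reproduced in miniature. A sketch, assuming each downloaded file contains exactly one variable and representing the loaded cubes only by their var_name; `fix_can_run`, `per_file` and `all_files` are hypothetical names, not ESMValCore functions.

```python
from typing import Dict, List

# Each input file mapped to the variables it contains (stand-in for loaded cubes).
FILES: Dict[str, List[str]] = {
    "era5_mean_top_net_short_wave_radiation_flux_2001_monthly.nc": ["mtnswrf"],
    "era5_mean_top_net_short_wave_radiation_flux_clear_sky_2001_monthly.nc": ["mtnswrfcs"],
}

# Inputs the swcre fix needs to see in a single call.
REQUIRED = {"mtnswrf", "mtnswrfcs"}


def fix_can_run(var_names: List[str]) -> bool:
    """Hypothetical check: does one fix_metadata call see every required input?"""
    return REQUIRED.issubset(var_names)


def per_file(files: Dict[str, List[str]]) -> List[bool]:
    # Behaviour described above: one fix_metadata call per file.
    return [fix_can_run(names) for names in files.values()]


def all_files(files: Dict[str, List[str]]) -> bool:
    # Alternative: a single call on the cubes from all files together.
    combined = [name for names in files.values() for name in names]
    return fix_can_run(combined)


if __name__ == "__main__":
    print(per_file(FILES))   # [False, False]
    print(all_files(FILES))  # True
```

The trade-off raised above is that a combined call would also receive, for example, one file per year of a single variable, which the per-file grouping currently keeps separate.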

@schlunma
Contributor

schlunma commented Oct 9, 2024

I'll try to work on this in the next couple of days 👍

@schlunma
Contributor

Draft PR open here: #2551, will move any discussion there.


4 participants