
CRTM versions used in the global-workflow #1453

Closed
emilyhcliu opened this issue Apr 12, 2023 · 28 comments

emilyhcliu commented Apr 12, 2023

Description

I spotted the following in the gfs.v16.3.5 global-workflow related to CRTM.

When building the global-workflow, I checked the GSI build log (build_gsi.log) and found that crtm/2.4.0 is loaded:

Currently Loaded Modules:
  1) hpc/1.2.0              5) hpc-impi/2018.0.4   9) netcdf/4.7.4  13) ip/3.3.3     17) w3nco/2.4.1   21) crtm/2.4.0
  2) intel/18.0.5.274       6) cmake/3.20.1       10) bufr/11.7.0   14) sigio/2.3.2  18) nemsio/2.5.2  22) gsi_common
  3) hpc-intel/18.0.5.274   7) anaconda/2.3.0     11) w3emc/2.9.2   15) sfcio/1.4.1  19) wrf_io/1.2.0  23) prod_util/1.2.2
  4) impi/2018.0.4          8) hdf5/1.10.6        12) sp/2.3.3      16) bacio/2.4.1  20) ncio/1.0.0    24) gsi_hera.intel

The build uses crtm from the following hpc-stack:

-- Found crtm: /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/crtm/2.4.0/lib/libcrtm.a (found version "2.4.0")

The compiler is using the Fortran module files under the following directory:

/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/crtm/2.4.0/include

However, in the global-workflow modulefiles directory, module_base.hera.lua points to the following hpc-stack:

module_base.hera.lua:prepend_path("MODULEPATH", "/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/stack")

The GSI log file (gdasanal.log) from the parallel experiment output has the following information about CRTM:

crtm_ROOT=/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/intel-18.0.5.274/crtm/2.4.0
/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/compiler/intel/18.0.5.274/crtm/2.4.0.lua
CRTM_LIB=/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/intel-18.0.5.274/crtm/2.4.0/lib/libcrtm.a
CRTM_INC=/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/intel-18.0.5.274/crtm/2.4.0/include
RTMFIX=/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/intel-18.0.5.274/crtm/2.4.0/fix
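
A quick way to see the mismatch side by side is to grep both logs for the CRTM paths they report (the log names are the ones quoted above; adjust the paths to your own build and experiment directories):

# Build time: which libcrtm.a did CMake find?
grep -i "Found crtm" build_gsi.log

# Run time: which stack did the analysis job actually load?
grep -i "crtm_ROOT" gdasanal.log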

It seems that we are using two different stacks within the same global-workflow: one is hpc-stack and the other is hpc-stack-gfsv16.

Checking the CRTM time stamps of the two stacks:
hpc-stack-gfsv16:

ls -l /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/intel-18.0.5.274/crtm/2.4.0
drwxr-sr-x 2 Hang.Lei nwprod 131072 Apr 10 04:58 fix
drwxr-xr-x 2 Hang.Lei nwprod  20480 Apr 10 04:58 include
drwxr-sr-x 3 Hang.Lei nwprod   4096 Apr 10 04:58 lib

hpc-stack:

ls -l /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-18.0.5.274/crtm/2.4.0/
drwxr-sr-x 2 Hang.Lei nwprod 122880 Jun 22  2022 fix
drwxr-xr-x 2 Hang.Lei nwprod  20480 Jun 14  2022 include
drwxr-sr-x 3 Hang.Lei nwprod   4096 Jun 14  2022 lib

I checked with EIB and confirmed that the CRTM source code for hpc-stack-gfsv16 is the following:

/scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/crtm-v2.4.0/

The CRTM_LifeCycle.f90 from /scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/crtm-v2.4.0 has the following print statements commented out:

604     ! Load the cloud coefficients
605     IF ( Local_Load_CloudCoeff ) THEN
606 !!$      WRITE(*, '("Load the cloud coefficients: ") ')
607 !!$      WRITE(*, '("...Cloud model: ", a) ') TRIM(Default_Cloud_Model)
608 !!$      WRITE(*, '("...CloudCoeff file: ", a) ') TRIM(Default_CloudCoeff_File)
609       err_stat = CRTM_CloudCoeff_Load( &
610                    Default_Cloud_Model                  , &
611                    Default_CloudCoeff_Format            , &
612                    Default_CloudCoeff_File              , &
613                    Quiet             = Quiet            , &
614                    Process_ID        = Process_ID       , &
615                    Output_Process_ID = Output_Process_ID  )
616       IF ( err_stat /= SUCCESS ) THEN
617         msg = 'Error loading CloudCoeff data from '//TRIM(Default_CloudCoeff_File)
618         CALL Display_Message( ROUTINE_NAME,TRIM(msg)//TRIM(pid_msg),err_stat )
619         RETURN
620       END IF
621     END IF

Please notice that the repeated print statements we saw in the GSI output log file come from lines 606-608, and these print statements were commented out for hpc-stack-gfsv16.

However, we still see the repeated print statements from lines 606-608 in our GSI log file after we recompiled the global-workflow with the updated hpc-stack-gfsv16:

Load the cloud coefficients:
...Cloud model: CRTM
...CloudCoeff file: ./crtm_coeffs/CloudCoeff.bin
Load the cloud coefficients:
...Cloud model: CRTM
...CloudCoeff file: ./crtm_coeffs/CloudCoeff.bin

This is from the GSI log file after we re-compiled the global-workflow with the updated hpc-stack-gfsv16 on April 10, 2023.
We should not see these print statements in the log file, since they were commented out in the source code.

So, I am wondering if this is because the GSI in the global-workflow is pointing to hpc-stack (updated in June 2022), not hpc-stack-gfsv16 (updated on April 10, 2023).
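
One way to check this directly is to look for the format string inside the executable itself: libcrtm.a is linked statically, so the WRITE statements can only show up at run time if they were present in the library that was linked. A minimal sketch (the executable path is an assumption based on the usual global-workflow layout):

# If this matches, gsi.x was linked against a libcrtm.a that still contains
# the uncommented print statements (i.e., the old hpc-stack build).
strings $HOMEgfs/exec/gsi.x | grep "Load the cloud coefficients"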

Question: In the global-workflow, when the modules defined in a component (e.g., GSI, fv3gfs, post, etc.) are different from the ones defined in the global-workflow modulefiles, the latter will be used to build the components. Am I correct?

Requirements

Acceptance Criteria (Definition of Done)

Dependencies

@RussTreadon-NOAA

Yes, @emilyhcliu , you are right.

When we clone and build gsi.x and enkf.x from within g-w, the build script $HOMEgfs/sorc/build_gsi_enkf.sh invokes ./gsi_enkf.fd/ush/build.sh. The GSI-EnKF build script build.sh loads modules via

# Load modules
set +x
source $DIR_ROOT/ush/module-setup.sh
module use $DIR_ROOT/modulefiles
module load gsi_$MACHINE_ID
module list
set -x

where $DIR_ROOT=$HOMEgfs/sorc/gsi_enkf.fd.

We need to update modulefiles in $HOMEgfs/sorc/gsi_enkf.fd/modulefiles if we want to build gsi.x and enkf.x with hpc-stack-gfsv16.
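
As a rough sketch of what that update could look like on Hera (the modulefile name and the path strings below are assumptions; check the actual file in your checkout before editing):

cd $HOMEgfs/sorc/gsi_enkf.fd/modulefiles
# Point the stack MODULEPATH at hpc-stack-gfsv16 instead of hpc-stack
sed -i 's|hpc-stack/modulefiles/stack|hpc-stack-gfsv16/modulefiles/stack|' gsi_hera.intel.lua

# Sanity check before rebuilding
module purge
module use $PWD
module load gsi_hera.intel
module show crtm/2.4.0 2>&1 | head -n 3   # should now point under hpc-stack-gfsv16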


KateFriedman-NOAA commented Apr 12, 2023

@emilyhcliu The workflow level defines/forces the module/library versions (via the versions/*.ver files for GFSv16 ops in the dev/gfs.v16 branch) but the workflow doesn't, and to my knowledge won't, define/force the stack location MODULEUSE for the components (in any GFS version). So, as @RussTreadon-NOAA said, the GSI repo will need to update the MODULEUSE paths to use the special hpc-stack-gfsv16 install to match the workflow level. This is just for the GFSv16 GSI branch. The GSI develop should still be using the regular hpc-stack install and looking to move to the EPIC hpc-stack installs when able....and down the road spack-stack.

It would be good to get the MODULEUSE paths updated to hpc-stack-gfsv16 in the release branch that @ADCollard is prepping for an upcoming GSI update. Make sure to also update the Orion MODULEUSE if it's not already pointing to the hpc-stack-gfsv16 install.
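
To find every modulefile in the release branch that still points at the old install (Hera, Orion, or otherwise), a quick grep from the top of the release/gfsda.v16 checkout should be enough (a sketch; the directory name is an assumption):

grep -rn "hpc-stack" modulefiles/ | grep -v "hpc-stack-gfsv16"
# Any remaining hits are prepend_path/MODULEUSE lines still pointing at the old hpc-stack.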

The global-workflow issue #1356 will be documenting the updates for the upcoming ops GSI and obsproc upgrade and will be looking for a new tag from the GSI for that. Other than that, it seems like this should be a GSI issue and not a global-workflow issue. If you agree, please close this and open an issue in the GSI repo to resolve it. Thanks!


emilyhcliu commented Apr 17, 2023

@RussTreadon-NOAA I am going to change the modulefiles in release/gfsda.v16 from hpc-stack to hpc-stack-gfsv16, since hpc-stack-gfsv16 has the most up-to-date CRTM-2.4.0_emc (from the official CRTM site).


emilyhcliu commented Apr 17, 2023

@Hang-Lei-NOAA @KateFriedman-NOAA @RussTreadon-NOAA
Questions about the hpc stacks on HERA.
There are two hpc stacks:
/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/
/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/
Both of these hpc stacks provide the following compiler versions (a quick way to check is shown after the list):
intel-18.0.5.274
intel-2022.1.2
gnu-9.2.0 (only available for hpc-stack)
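
(Listing the top-level install directories confirms which compiler builds each stack provides; the paths are the two stack roots above:)

ls /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/
ls /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/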

Q: @Hang-Lei-NOAA updated CRTM-2.4.0_emc (synced with the JCSDA official site) on April 10 for hpc-stack-gfsv16, so I am going to update the release/gfsda.v16 modulefiles to use hpc-stack-gfsv16. However, I am not sure whether the other packages installed in hpc-stack-gfsv16 are good. Any comments?

@Hang-Lei-NOAA

@emilyhcliu As I mentioned previously by email, /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/ was handed off to EPIC for continued installation to support the community. That install itself is frozen and no longer serviced by EMC, so any old usage of it should transfer to the EPIC installations. All library information is posted on the wiki: https://github.com/NOAA-EMC/hpc-stack/wiki.

/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/ was originally installed for GFSv16 only. It used the same installation procedure as the one above, but only with the versions required by GFSv16. It is continuously maintained by EMC, so it should be fine.

@emilyhcliu

@aerorahul

I just had a discussion with @RussTreadon-NOAA about hpc stacks on the HPC machines.
There are two hpc stacks on HERA and ORION, hpc-stack and hpc-stack-gfsv16, and they are not the same: hpc-stack is outdated, while hpc-stack-gfsv16 is up to date.

The GSI develop version points to hpc-stack. The global-workflow/modulefiles points to hpc-stack-gfsv16.
Is it possible to sync hpc-stack with the hpc-stack-gfsv16?

@aerorahul

I am not familiar with the stack management.
I am aware that EPIC has started maintaining these stacks and we are about to switch to them imminently. @KateFriedman-NOAA could provide some info.

Is your question related to running the develop branch of the global-workflow? I am quite sure it is using a consistent stack.
If your question is related to running the dev/gfsv16.x branch, I am not sure which stack it is pointing to.

@KateFriedman-NOAA

@emilyhcliu The current hpc-stack installs you're referring to are being moved away from. We will be using the following stacks moving forward:

  1. The hpc-stack-gfsv16 for the ops GFSv16 system (dev/gfs.v16 branch). If an update is needed to this hpc-stack install to support the GFSv16 then it should be done in these installs. I believe requests for that still go to @Hang-Lei-NOAA and the https://github.com/NOAA-EMC/hpc-stack repo.
  2. EPIC-maintained hpc-stacks for the GFSv17 development system (develop branch), and then eventually spack-stack in the near future. See more info on the move to the EPIC hpc-stacks in "Test new EPIC hpc-stack installs on R&Ds" #1311 and related component issues (e.g. "Updating hpc-stack modules and miniconda locations for Hera, Gaea, Cheyenne, Orion, Jet" ufs-community/ufs-weather-model#1465). Requests for changes to these EPIC hpc-stack installs also go through https://github.com/NOAA-EMC/hpc-stack.

The current hpc-stack installs will soon no longer be used, and we shouldn't spend time updating them. The GSI needs to update to use the EPIC hpc-stack intel 2022 installs on supported platforms for now. When spack-stack is ready (it is being tested now), we will all move to those stacks.

Note: the above is just for the R&D platforms; WCOSS2 still has only the one production hpc-stack installation and is not yet on EPIC hpc-stack or spack-stack.
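
For reference, pointing a component at one of the EPIC installs is ultimately just a MODULEPATH change; roughly (the path and module versions below are placeholders; the actual per-machine install locations are listed in ufs-community/ufs-weather-model#1465):

module use /path/to/EPIC/hpc-stack/modulefiles/stack   # placeholder path
module load hpc hpc-intel hpc-impi                     # versions are illustrative
module avail crtm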

Hope the above info helps!

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Apr 21, 2023 via email


emilyhcliu commented Apr 21, 2023


@KateFriedman-NOAA Thanks. This is helpful.
Tagging @RussTreadon-NOAA and @CatherineThomas-NOAA for awareness.

@emilyhcliu

@KateFriedman-NOAA @Hang-Lei-NOAA @aerorahul
Thank you all for your input.

@KateFriedman-NOAA Your explanation is helpful, clear and thorough. I know how to modify GSI develop and the current release/gfsda.v16 to be consistent with the hpc stack maintenance.

@Hang-Lei-NOAA Thanks for helping me updating the module files on HERA.

@emilyhcliu

@RussTreadon-NOAA and @CatherineThomas-NOAA
Based on input from @KateFriedman-NOAA, I think we need to do the following:
Update the hpc stack to hpc-stack-gfsv16 for GSI develop and release/gfsda.v16 for HERA and ORION.

What do you think?

@RussTreadon-NOAA

Kate states

  • The hpc-stack-gfsv16 for the ops GFSv16 system.
  • EPIC-maintained hpc-stacks for GFSv17 development system (develop branch) and then eventually spack-stack in the near future.

GSI branch release/gfsda.v16 is where we maintain a snapshot of the operational GFSv16 GSI and EnKF. GFS v17 will take the GSI and EnKF from GSI develop. If this is correct, it seems we need:

  • GSI develop uses EPIC-maintained hpc-stacks and, later, spack-stack
  • GSI release/gfsda.v16 uses hpc-stack-gfsv16

This aligns with EIB & EPIC stack management, right?

@emilyhcliu


@KateFriedman-NOAA One question: Are the EPIC-maintained hpc-stacks still under development or ready for users?


KateFriedman-NOAA commented Apr 21, 2023

One question: Are the EPIC-maintained hpc-stacks still under development or ready for users?

@emilyhcliu The EPIC-maintained hpc-stacks are ready for use on the R&Ds. If you find any issues in them or need to request changes/additions to the stacks then you will open an issue in the https://github.com/NOAA-EMC/hpc-stack repo. The EPIC folks will take care of fulfilling the request. Thus far they have been very helpful and responsive!

The spack-stack installs are still under development for use by the GFS/global-workflow.


emilyhcliu commented Apr 21, 2023


@KateFriedman-NOAA Got it! Thanks for your confirmation.

@emilyhcliu

@RussTreadon-NOAA @CatherineThomas-NOAA
For release/gfsda.v16, we should point to hpc-stack-gfsv16.
Shall we go ahead and test the GSI develop with the EPIC-maintained hpc-stacks?

@RussTreadon-NOAA

I thought we already ran some preliminary GFS v17 DA tests. Did these tests use EPIC stacks? I don't know.

@KateFriedman-NOAA

@RussTreadon-NOAA Your proposed split (GSI develop on the EPIC-maintained hpc-stacks and, later, spack-stack; GSI release/gfsda.v16 on hpc-stack-gfsv16) seems correct to me! :)

@RussTreadon-NOAA @emilyhcliu Related to all of this... one snag you are likely to hit is that there aren't any intel 2018 versions of the EPIC hpc-stacks on Hera or Orion. I believe there is an intel 2018 install on Jet that @DavidHuber-NOAA used for the Jet port we just wrapped up (see his GSI Jet port work for how to move to the EPIC hpc-stack). I'm not sure where the GSI is with moving to intel 2022, or if it's possible to get EPIC-maintained intel 2018 hpc-stacks on Hera/Orion. You could ask them.

I'm currently testing the full GFS with EPIC hpc-stacks but have left the GSI (gsi_enkf.fd) as-is while loading intel 2022 at runtime. It's working thus far and looks like we could move global-workflow to intel 2022 regardless of the GSI status on intel version. I will be putting an update in issue #1311 soon. I am hoping to wrap up this work before I go on leave. Wanted to let you know of this parallel effort.

FYI, this ufs-weather-model issue ufs-community/ufs-weather-model#1465 has one of the best lists of available EPIC hpc-stacks and how to load them in your system. Wanted to point that out again.

@emilyhcliu

I thought we already ran some preliminary GFS v17 DA tests. Did these tests use EPIC stacks? I don't know.

@RussTreadon-NOAA I am not aware of tests done with the EPIC stacks on your side. @CatherineThomas-NOAA Do you know of any?

@CatherineThomas-NOAA

@emilyhcliu @RussTreadon-NOAA
The v17-looking DA tests that have been completed thus far (for SDL) were using a v16 workflow and had the same problem that this issue outlines. We're currently planning our next suite of tests that would use v17-based workflows but are waiting until we get a few issues sorted.

I don't know of any DA tests intentionally testing the EPIC maintained stacks.

@RussTreadon-NOAA

Thanks @CatherineThomas-NOAA and @emilyhcliu for sharing where we are and where we are headed.

@emilyhcliu

@RussTreadon-NOAA and @KateFriedman-NOAA
I apologize for my ignorance about the intel 2018 vs. intel 2022 issue for the GSI.
The GSI will encounter problems if it is compiled with intel 2022, so for now we need to use intel 2018?
Is my understanding correct?

@KateFriedman-NOAA

GSI will encounter problems if it is compiled with intel 2022 version. For now, we need to use intel 2018?

@emilyhcliu That's my understanding. The GSI will successfully compile with intel 2022, but there are issues at runtime after it has been built with intel 2022. Other GSI folks should reconfirm that issue; I only understand it from the workflow side of things. This issue is a current blocker for at least one major global-workflow task: adding version files into the v17 system (as was done for the v16 system) and aligning all components to the same library versions via the version files at the workflow level.

@DavidHuber-NOAA

@emilyhcliu Yes, this is still the case. Regression tests all fail with Intel 2021+ when compiled with -O1+. I made some headway in resolving issues, but came to a dead end and the task has now been handed off to EPIC. The issue is tracked in NOAA-EMC/GSI#447.

@emilyhcliu

@KateFriedman-NOAA and @DavidHuber-NOAA Thanks for your explanations!

@KateFriedman-NOAA

@emilyhcliu can this be closed or is there still an issue to resolve? Thanks!

@WalterKolczynski-NOAA

OBE (overtaken by events).
