
Conference call notes 20231011

Kenneth Hoste edited this page Oct 11, 2023 · 7 revisions

Notes on the 231st EasyBuild conference call, Wednesday 11 Oct 2023 (08:00 UTC)

Attendees

List of attendees (9):

  • Massimiliano Culpo (Spack core developer)
  • Jasper Grimm (University of York, UK)
  • Kenneth Hoste (HPC-UGent, Belgium)
  • Adam Huffman (Big Data Institute, Oxford, UK)
  • Kurt Lust (UAntwerpen, Belgium + LUMI User Support Team)
  • Sebastien Moretti (SIB, Switzerland)
  • Mikael Öhman (Chalmers University of Technology, Sweden)
  • Åke Sandgren (Umeå University, Sweden)
  • Jörg Saßmannshausen (Imperial College London, UK)

Agenda

  • overview of recent developments
  • Q&A

Recent developments

  • latest EasyBuild release: 4.8.1 (11 Sept 2023)
    • ETA for next EasyBuild release: end of Oct'23
    • ETA for EasyBuild 5.0 release: by the end of 2023 (?)
      • started doing short sprint meetings, each Monday at 10:00 CEST, to set the next 5 goals to tackle that week
  • easyconfigs merge sprint
    • planned for Mon 23 Oct'23
  • recent changes
    • docs (merged PRs)
      • ...
    • framework (merged PRs)
      • bug fixes
        • ...
      • enhancements
        • ...
      • changes
        • reduce number of CI jobs by testing for Lua and Tcl module syntax in a single CI job (PR #4192)
      • EasyBuild 5.0 (to 5.0.x branch)
        • ...
    • easyblocks (merged PRs)
      • bug fixes
        • only use -DCMAKE_SKIP_RPATH=ON for CMake < 3.5.0 (PR #3012)
      • enhancements
        • use more test programs in sanity check step of OpenMPI easyblock (PR #3016)
      • updates
        • ...
      • changes
        • ...
      • new easyblocks
        • ...
      • EasyBuild 5.0 (to 5.0.x branch)
        • update version of config.guess used by ConfigureMake (PR #3013)
        • fix test suite: stop testing with Python 3.5 and Lmod 6.x, stop using toolchain.DUMMY (PR #3014)
    • easyconfigs (merged PRs)
      • over 100 easyconfig PRs were merged since last conf call
      • bug fixes
        • add missing required PyPy dependency for Clair3, also copy preprocess and shared subdirectories, and enhance sanity check for provided libclair3 Python package (PR #18847)
        • fix source URL for segemehl 0.3.4 (PR #18878)
        • avoid use of hardcoded paths for Pillow by using --disable-platform-guessing option (PR #18881)
          • was motivated by problems with installing Pillow in EESSI, see also EESSI support issue #9
          • more problems remain because Pillow's setup.py wants to locate zlib.h and libz.* in creative ways...
        • add patch to disable flaky DDRGES3 LAPACK test in OpenBLAS 0.3.23 + 0.3.24 (PR #18887)
        • add alternate checksum for NCCL v2.18.3 (PR #18906)
        • add missing dependencies for MONAI to support extras required by MONAI-Label (PR #18921)
      • enhancements
        • ParaView 5.11.1 fat build compatible with hardware rendering, software rendering, headless server mode, as well as interactive mode (PR #18631)
        • also build Python bindings for ITK 5.2.1 with foss/2022a (PR #18922)
      • (noteworthy) new software
        • ...
      • noteworthy software updates
        • ...
      • changes
      • EasyBuild 5.0 (to 5.0.x branch)
        • archive 2016a generation of easyconfigs (PR #18958)
        • archive easyconfigs using a pre-2016a compiler as toolchain (PR #18968)
        • archive 2016b generation of easyconfigs (PR #18976)
        • archive easyconfigs using compiler from 2016b generation (+ older GCC 4.x and 5.x) (PR #18978)
  • work-in-progress
    • docs (open PRs + issues)
    • framework (open PRs + issues)
      • reported bugs / bug fixes
        • add optimal optimization flags for Intel compilers on AMD CPUs (issue #3793)
          • for AMD Genoa, we don't want to use -mavx2 since then we won't get AVX-512 instructions
      • enhancements
        • ...
      • changes
        • ...
      • EasyBuild 5.0 (to 5.0.x branch)
        • TODO:
          • improve error reporting for failing shell commands (and EasyBuild crashes) (PR #4351)
          • should shell option for run_shell_cmd function be renamed to use_bash?
          • see also EasyBuild 5.0 sync meeting notes
    • easyblocks (open PRs + issues)
      • bug reports/fixes
        • fix extension filter for Perl packages (PR #2699)
        • fix --sanity-check-only and --module-only for UCX plugins (PR #3007)
          • nice example of how to make easyblocks compatible with --sanity-check-only and --module-only
        • enhance TensorFlow easyblock to avoid use of -mcpu=native for XNNPACK component when building on aarch64 (PR #3011)
      • enhancements
        • don't blindly overwrite -Dccflags + honour preconfigopts in Perl easyblock (PR #3010)
      • updates
      • new easyblocks
        • new custom easyblocks for Spparks and Stitch (PR #2948)
        • add generic CargoPythonBundle easyblock (PR #2964)
        • add new easyblock for HPCC and adapt HPL easyblock (PR #3009)
      • changes
        • install only SuiteSparse libraries with make install (PR #3004)
      • EasyBuild 5.0 (to 5.0.x branch)
        • stop importing from easybuild.tools.py2vs3 (+ minor cleanup in init easyblocks test) (PR #3015)
    • easyconfigs (open PRs + issues)
      • bug fixes/reports
        • failing build of recent TensorFlow easyconfigs on AWS Graviton3 (aarch64/neoverse_v1) (issue #18899)
        • MPI hanging if MPI_Init_thread is used with foss/2023a (issue #18925)
          • due to bug in libfabric, various workarounds possible (like setting $PSM3_DEVICES to self,shm)
        • make sure Python dependency included for ESPResSo is actually used by specifying -DPYTHON_EXECUTABLE (PR #18963)
          • YIL that -DPython3_EXECUTABLE, -DPython_EXECUTABLE, and -DPYTHON_EXECUTABLE are three very different options :man-facepalming:
          • CMakeMake should be setting -DPython_EXECUTABLE & co (in EasyBuild 5.0)?
      • enhancements
        • also run easyconfigs test suite with Python 3.11 (PR #18009)
          • we will probably need to rename CVS to something else to dance around recent setuptools filtering out directories named CVS...
        • add patch to improve CUDA 11 compatibility of GCCcore/12.2.0 + GCCcore/12.3.0 (PR #18854)
      • new software
        • ...
      • noteworthy software updates
        • PETSc 3.19.4 w/ foss/2023a (PR #18608)
        • PyTorch v1.13.1 w/ foss/2022b + CUDA 11.7.0 (PR #18853)
        • PyTorch v1.13.1 w/ foss/2022b + CUDA 12.0.0 (PR #18806)
      • changes
        • ...
      • EasyBuild 5.0 (to 5.0.x branch)
        • scripts to archive easyconfigs (PR #18934)
        • remove old archived easyconfigs (EasyBuild 4.x archive) (PR #18982)
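The ESPResSo item above (PR #18963) notes that -DPython3_EXECUTABLE, -DPython_EXECUTABLE, and -DPYTHON_EXECUTABLE are three different CMake options. They are read by CMake's FindPython3, FindPython, and legacy FindPythonInterp modules respectively. As a minimal sketch of what "setting -DPython_EXECUTABLE & co" could look like, the helper below builds all three options for one interpreter path. The function name python_cmake_opts is hypothetical and this is not what the CMakeMake easyblock currently does; $EBROOTPYTHON is assumed to be set by a loaded Python module.

```python
def python_cmake_opts(python_exe):
    """Build CMake options that point all three Python variables at one interpreter.

    Hypothetical helper for illustration only, not part of EasyBuild.
    """
    var_names = [
        "Python3_EXECUTABLE",  # read by FindPython3
        "Python_EXECUTABLE",   # read by FindPython
        "PYTHON_EXECUTABLE",   # read by the legacy FindPythonInterp
    ]
    return " ".join("-D%s=%s" % (name, python_exe) for name in var_names)


# Assumed usage in an easyconfig, relying on $EBROOTPYTHON from the Python module
configopts = python_cmake_opts("$EBROOTPYTHON/bin/python")
```

Passing all three is harmless for projects that only read one of them, since CMake at most warns about variables that were set on the command line but never used.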

2023b update of common toolchains

  • 2023b toolchains should be included in EasyBuild 4.9.0 release
    • probably not yet next release (4.8.2, ETA end Oct'23)
    • candidate toolchains are merged, ready for more extensive testing of "big" apps
  • most significant change is jump to GCC 13.x
  • foss/2023.09 (PR #18886) - candidate for foss/2023b
    • GCC 13.2.0 + binutils 2.40
    • OpenMPI 4.1.6 (+ UCX 1.15.0, PMIx 4.2.6, libfabric 1.19.0)
    • FlexiBLAS 3.3.1 (+ OpenBLAS 0.3.24)
    • FFTW 3.3.10
    • ScaLAPACK 2.1.0
  • intel/2023.07 (PR #18439) - candidate for intel/2023b
    • GCC 13.2.0 + binutils 2.40
    • intel-compilers 2023.2.1
    • impi 2021.10.0
    • imkl 2023.2.1
  • testing
    • OSU-Micro-Benchmarks (already done?)
    • SciPy-bundle (numpy, scipy)
    • GROMACS (C++)
    • OpenFOAM (C++)
      • requires ParaView, Qt5, etc.
      • should we keep building on top of ParaView (only needed for paraFoam utility)?
        • installing paraFoam stand-alone is a PITA
        • paraFoam isn't actually used when running OpenFOAM simulations
    • CP2K (Fortran)
    • check if Qt6 can be used

Q&A

  • discussion about problem with VTK that Jörg was hitting
  • is there any testing being done on IPv6-only clusters?
    • internal network in HPC cluster at Imperial College London is IPv6
    • not currently, seems like a pretty exotic setup?
  • (Jörg) anyone working on PyTorch 2.x?
    • see Simon's closed PR #18269
    • maybe Flamefire is looking into it?
    • there's a PyTorch 2.1 release now
    • we're held back a bit by the relatively aggressive testing here...
      • what would be a good minimal requirement?
      • RHEL8, A100, Intel+AMD CPUs
      • we should put a policy in place for this
  • (Jörg) fluidity which requires Zoltan (provided by Trilinos)
    • why is Fortran90 interface not enabled in Trilinos?
      • -DZoltan_ENABLE_F90INTERFACE=ON -DZoltan_ENABLE_ParMETIS=ON -DZoltan_ENABLE_Scotch=ON
      • can adjust "minimal" Trilinos that was added in PR #17448
  • (Jörg) question on Bazel
    • download problem when building Bazel, may be a red herring
    • there may be an actual other problem higher up
  • (Åke) how are people dealing with the changes in Slurm 22.05.x w.r.t. srun and $SLURM_CPUS_PER_TASK?
    • srun now only honours $SRUN_CPUS_PER_TASK (or srun -c), not $SLURM_CPUS_PER_TASK
    • setting $SRUN_CPUS_PER_TASK causes trouble with mpirun (because it uses srun and affinity is then wrong)
    • using patch for mpirun that unsets $SRUN_CPUS_PER_TASK
    • similar patching was done at LUMI for a while, but it causes problems
    • CSCS may have more info on how they dealt with this (cfr. their bug report to Slurm: https://bugs.schedmd.com/show_bug.cgi?id=15632#c43)
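The mpirun workaround discussed in the Slurm item above can be sketched as a small wrapper that removes $SRUN_CPUS_PER_TASK from the environment before mpirun is started, so that the srun launched by mpirun does not pick it up. This is only an illustration of the idea, not the actual patch used at any site. The function names env_for_mpirun and run_mpirun are hypothetical, and the notes above mention that similar patching caused problems at LUMI.

```python
import os
import subprocess


def env_for_mpirun(env=None):
    """Return a copy of the environment without SRUN_CPUS_PER_TASK.

    Hypothetical helper; the original mapping is left untouched.
    """
    env = dict(os.environ if env is None else env)
    # Drop the variable if present, do nothing otherwise
    env.pop("SRUN_CPUS_PER_TASK", None)
    return env


def run_mpirun(args):
    """Start mpirun with the cleaned environment and return its exit code."""
    return subprocess.call(["mpirun"] + list(args), env=env_for_mpirun())
```

Only the variable read by srun is removed; $SLURM_CPUS_PER_TASK stays available to the application itself.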