Tpetra: Sycl test failures on Ponte Vecchio #12295

Open
ndellingwood opened this issue Sep 20, 2023 · 13 comments
Labels
pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests

Comments

@ndellingwood
Contributor

ndellingwood commented Sep 20, 2023

Bug Report

@trilinos/tpetra

Description

I tested a SYCL configuration on the new Ponte Vecchio GPUs on Blake. With the updates from Daniel's PR #12294, the following tests failed with segfaults:

The following tests FAILED:
  127 - TpetraCore_TpetraUtils_WrappedDualView (SEGFAULT)
  139 - TpetraCore_getEntryOnHost (SEGFAULT)
  157 - TpetraCore_BlockCrsPerfTest (SEGFAULT)

Steps to Reproduce

Apply the changes from #12294.

Configuration (New) Blake PV queue:

# Interactive node
salloc -N 1 -p PV

# Environment
module load cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 git intel-oneapi-mkl/2023.1.0

# Configuration
cmake \
  -D CMAKE_CXX_COMPILER="/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++" \
  -D CMAKE_C_COMPILER="/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang" \
  -D CMAKE_Fortran_COMPILER="`which gfortran`" \
  -D CMAKE_CXX_FLAGS="-g -fp-model=precise" \
  -D CMAKE_C_FLAGS="-g" \
  -D BUILD_SHARED_LIBS=ON \
  -DTPL_ENABLE_MPI=OFF \
  -DTPL_ENABLE_BLAS:BOOL=ON \
   -DBLAS_LIBRARY_DIRS=$MKLROOT/lib/intel64 \
   -DBLAS_LIBRARY_NAMES=mkl_rt \
  -DTPL_ENABLE_LAPACK:BOOL=ON \
   -DLAPACK_LIBRARY_DIRS=$MKLROOT/lib/intel64 \
   -DLAPACK_LIBRARY_NAMES=mkl_rt \
  -DTPL_ENABLE_MKL:BOOL=ON \
   -DMKL_INCLUDE_DIRS=$MKLROOT/include \
   -DMKL_LIBRARY_DIRS=$MKLROOT/lib/intel64 \
   -DMKL_LIBRARY_NAMES=mkl_rt \
  -DTrilinos_ENABLE_ALL_PACKAGES=OFF \
  -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES=OFF \
  -DTrilinos_ENABLE_TESTS=ON \
  -DTrilinos_MUST_FIND_ALL_TPL_LIBS=TRUE \
  -DTrilinos_ENABLE_OpenMP=OFF \
  -DTrilinos_ENABLE_Kokkos=ON \
  -D Kokkos_ENABLE_SYCL=ON \
   -D Kokkos_ENABLE_TESTS=OFF \
   -D Kokkos_ENABLE_ONEDPL=OFF \
  -D Kokkos_ARCH_INTEL_PVC=ON \
  -DTrilinos_ENABLE_KokkosKernels=ON \
   -D KokkosKernels_ENABLE_TESTS=OFF \
  -DTrilinos_ENABLE_Tpetra=ON \
  -D Tpetra_INST_SYCL=ON \
  -D Tpetra_INST_SERIAL=ON \
   -D Tpetra_ENABLE_TESTS=ON \
\
  -DTPL_ENABLE_Matio=OFF \
\
$TRILINOS_DIR

@ndellingwood added the type: bug and pkg: Tpetra labels Sep 20, 2023
@csiefer2
Member

csiefer2 commented Sep 20, 2023

@ndellingwood Blake compilers won't build squat. Both 2023.1 and 2023.2 are missing ocloc and LevelZero. @fryeguy52 is fixing the compilers. Will try to reproduce once he's done.
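
A quick way to check whether a loaded oneAPI environment has the pieces mentioned above might look like the sketch below. It assumes `ocloc` (the offline compiler) and `sycl-ls` (the device lister shipped with oneAPI) would be on `PATH` after `module load`. The `have_tool` helper is a hypothetical name, and the backend string printed by `sycl-ls` differs between releases, so the `grep` pattern is only a guess.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named executable is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report the tools a SYCL build for Intel GPUs typically relies on.
for tool in ocloc sycl-ls; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done

# If sycl-ls is available, look for a Level Zero device in its listing.
# The pattern is an assumption; inspect the raw output if it does not match.
if have_tool sycl-ls; then
  sycl-ls | grep -i 'level_zero' || echo "no Level Zero device listed"
fi
```

Running this on the compute node (after `salloc`) rather than the login node matters, since the GPU is only visible there.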

@ndellingwood
Contributor Author

Thanks for the info @csiefer2, and sorry for any added noise with this issue.

@masterleinad
Contributor

@ndellingwood Feel free to add me to SYCL issues in Trilinos.

@ndellingwood
Contributor Author

@masterleinad sure thing. I'm putting in a printf fix shortly (just in case you're standing up a build and run into it).
Regarding this issue, I should also point out a mistaken assumption in my configuration: I assumed Tpetra would enable SYCL based on Kokkos_ENABLE_SYCL=ON, but the configure output shows that I needed to enable SYCL for Tpetra explicitly. Rebuilding for a retest.

@ndellingwood
Contributor Author

With local changes in PR #12471 and setting Tpetra_INST_SYCL=ON, this is the set of test failures:

The following tests FAILED:
	 19 - TpetraCore_BlockCrsMatrix (Failed)
	 82 - TpetraCore_ImportExport2_UnitTests_Send (Failed)
	 83 - TpetraCore_ImportExport2_UnitTests_ISend (Failed)
	 84 - TpetraCore_ImportExport2_UnitTests_Alltoall (Failed)
	140 - TpetraCore_getEntryOnHost (Failed)
Errors while running CTest
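
To iterate on just these failures, one option is to build a `ctest -R` regular expression from the names in the list above. This is a minimal sketch: the variable names are illustrative, and it assumes the command is run from the build directory with the test names exactly as CTest printed them.

```shell
#!/bin/sh
# Names copied from the failure list above.
failed_tests="TpetraCore_BlockCrsMatrix
TpetraCore_ImportExport2_UnitTests_Send
TpetraCore_ImportExport2_UnitTests_ISend
TpetraCore_ImportExport2_UnitTests_Alltoall
TpetraCore_getEntryOnHost"

# Join the names into an anchored alternation for `ctest -R`.
regex="^($(printf '%s\n' "$failed_tests" | paste -sd '|' -))\$"

# Print the command rather than running it, so it can be inspected first.
echo "ctest -R '$regex' --output-on-failure"
```

If the failures come from the most recent CTest run in the same build directory, `ctest --rerun-failed --output-on-failure` is a shorter alternative.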

@masterleinad
Contributor

masterleinad commented Nov 2, 2023

All of these tests are passing for me on the Intel testbeds.

@ndellingwood
Contributor Author

@masterleinad which version of intel/oneapi and which architecture did you test?

@masterleinad
Contributor

> @masterleinad which version of intel/oneapi and which architecture did you test?

oneapi/eng-compiler/2023.10.15.002 with Kokkos_ENABLE_SERIAL=ON, Kokkos_ENABLE_SYCL=ON and Kokkos_ARCH_INTEL_PVC=ON.

@masterleinad
Contributor

That compiler is tagged as 2024.0.0.

@ndellingwood
Contributor Author

@masterleinad did you add Tpetra_INST_SYCL=ON explicitly? If not, can you look over the configure output to confirm that SYCL was enabled for Tpetra?

For reference, I initially had not set that and had this warning in the configure output:

-- NOTE: Kokkos::SYCL is ON (the CMake option Kokkos_ENABLE_SYCL is ON), but the corresponding Tpetra Node type is disabled.  If you want to enable instantiation and use of Kokkos::SYCL in Tpetra, please also set the CMake option Tpetra_INST_SYCL:BOOL=ON.  If you use the Kokkos::SYCL version of Tpetra without doing this, you will get link errors!
-- Determine whether Tpetra will assume that MPI is GPU aware:
--   - Tpetra_INST_CUDA, Tpetra_INST_HIP and Tpetra_INST_SYCL atre OFF, so Tpetra will assume that MPI is not GPU aware.
-- Tpetra execution space availability (ON means available): 
--   - Serial:  ON 
--   - Threads: OFF
--   - OpenMP:  OFF
--   - Cuda:    OFF
--   - HIP:     OFF
--   - SYCL:    OFF
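
The availability table above can also be checked mechanically. The sketch below assumes the configure output was saved to a file (`configure.log` is a hypothetical name) and that the lines have the same format as the excerpt above. `tpetra_space_status` is an illustrative helper, not part of Trilinos.

```shell
#!/bin/sh
# Hypothetical helper: print ON or OFF for one execution space,
# reading the "--   - <Space>: <ON|OFF>" lines from a saved configure log.
tpetra_space_status() {
  # $1 = execution space name (e.g. SYCL), $2 = path to the saved log
  sed -n "s/^-- *- $1: *\([A-Z][A-Z]*\).*/\1/p" "$2" | head -n 1
}

# Usage, assuming the log was captured with: cmake ... 2>&1 | tee configure.log
log=${1:-configure.log}
if [ -f "$log" ]; then
  echo "Tpetra SYCL: $(tpetra_space_status SYCL "$log")"
fi
```

Grepping the build directory's `CMakeCache.txt` for `Tpetra_INST_SYCL` is another quick check of what was actually requested.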

@masterleinad
Contributor

> @masterleinad did you add Tpetra_INST_SYCL=ON explicitly? If not, can you look over the configure output to confirm that SYCL was enabled for Tpetra?

Yes, it was set and I am seeing

[...]
-- Tpetra: Using internal Kokkos
-- Tpetra: Enabling deprecated code
-- Determine whether Tpetra will assume that MPI is GPU aware:
--   - TPL_ENABLE_MPI is OFF, so we assume that (nonexistent) MPI is not GPU aware.
-- Tpetra execution space availability (ON means available): 
--   - Serial:  ON
--   - Threads: OFF
--   - OpenMP:  OFF
--   - Cuda:    OFF
--   - HIP:     OFF
--   - SYCL:    ON
-- Tpetra: Tpetra_INST_INT_LONG_LONG is enabled by default.
-- Tpetra: Tpetra_INST_INT_UNSIGNED is disabled by default.
-- Tpetra: Tpetra_INST_INT_UNSIGNED_LONG is disabled by default.
-- Tpetra: Tpetra_INST_INT_INT is disabled by default.
-- Tpetra: Tpetra_INST_INT_LONG is disabled by default.
-- 
-- Tpetra: Validate global ordinal setting ...
-- Tpetra: global ordinal setting is OK
[...]

@ndellingwood
Contributor Author

@masterleinad thanks! Can you post your configuration as well? I'd like to compare, to see whether I misconfigured something but still happened to get a complete build.

@masterleinad
Contributor

masterleinad commented Nov 9, 2023

I tried again with the configuration posted in the issue description (#12295 (comment)) and see

TpetraCore_TpetraUtils_WrappedDualView (Failed)
TpetraCore_getEntryOnHost (Failed)

with MKL and see

TpetraCore_CrsMatrix_2DRandomDist

timing out. Previously (#12295 (comment)) when I saw all tests passing, I was also pulling in Kokkos develop.

@jhux2 jhux2 added this to Tpetra Aug 12, 2024
@jhux2 jhux2 moved this to Backlog in Tpetra Aug 12, 2024