BLD: compilation failure at comm_nccl.cu #959

Open
tylerjereddy opened this issue Oct 11, 2024 · 11 comments

Comments

@tylerjereddy

On the LANL Venado machine (Linux, ARM/Grace-Hopper architecture), the same compilation error arises with both the clang 18 (Cray clang version 18.0.0) and gcc-13 (13.2.1) compiler toolchains, each paired with nvcc from CUDA 12.5, when building a recently provided legate release. We only received a tarball, and the only version info I can find is CMakeLists.txt:set(legate_version 24.09.00), but this may be a dev version rather than a tagged release. If you direct me to the appropriate location to grep out an embedded git hash, I'll go ahead and do that for you, but as far as I can tell I have only a preview release tarball, not a git bundle.

Here are the steps I follow on Venado:

Set up of environment and compilation commands
cd /lustre/vescratch1/treddy/custom_nvidia/legate
rm -rf arch-linux-cuda-release
eval "$(/lustre/vescratch1/treddy/tyler_conda/conda_scratch/bin/conda shell.bash hook)"
conda activate legate_custom
set +o errexit
set +e 
module load PrgEnv-gnu/8.5.0
export CC=gcc-13
export CXX=g++-13
export CPATH=/opt/cray/libfabric/1.20.1/include:$CPATH
export LIBRARY_PATH=/opt/cray/libfabric/1.20.1/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/cray/libfabric/1.20.1/lib64:$LD_LIBRARY_PATH
module load cudatoolkit/24.7_12.5 
module load cray-hdf5-parallel/1.14.3.1
export LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.30/ofi/crayclang/17.0/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=/opt/cray/pe/mpich/8.1.30/ofi/crayclang/17.0/lib:$LIBRARY_PATH
export PATH=$PATH:/opt/cray/pe/cce/18.0.0/bin
export PATH=/opt/cray/libfabric/1.20.1/bin:$PATH
./configure --with-cuda --with-hdf5 --with-gasnet
export LEGATE_ARCH='arch-linux-cuda-release'
export LEGATE_DIR='/lustre/vescratch1/treddy/custom_nvidia/legate'
make -j 64

And here is the compilation failure (snipped at the end because the C++ compilation spam after the error is a bit much):

[212/308] Building CXX object _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_analysis.cc.o
In file included from /usr/include/c++/13/bits/specfun.h:43,
                 from /usr/include/c++/13/cmath:3699,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:16:
In static member function ‘static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp*, _Tp*, _Up*) [with _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Up = Legion::Internal::CopyFillAggregator::CopyUpdate*; bool _IsMove = false]’,
    inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = Legion::Internal::CopyFillAggregator::CopyUpdate**; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:506:30,
    inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = Legion::Internal::CopyFillAggregator::CopyUpdate**; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:533:42,
    inlined from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:540:31,
    inlined from ‘_OI std::copy(_II, _II, _OI) [with _II = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:633:7,
    inlined from ‘static _ForwardIterator std::__uninitialized_copy<true>::__uninit_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _ForwardIterator = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_uninitialized.h:147:27,
    inlined from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _ForwardIterator = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_uninitialized.h:185:15,
    inlined from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, allocator<_Tp>&) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _ForwardIterator = Legion::Internal::CopyFillAggregator::CopyUpdate**; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*]’ at /usr/include/c++/13/bits/stl_uninitialized.h:373:37,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_insert(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/vector.tcc:814:38,
    inlined from ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::insert(const_iterator, _InputIterator, _InputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; <template-parameter-2-2> = void; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/stl_vector.h:1483:19,
    inlined from ‘void Legion::Internal::CopyFillAggregator::issue_copies(Legion::Internal::InstanceView*, std::map<Legion::Internal::InstanceView*, std::vector<CopyUpdate*> >&, std::set<Legion::Internal::RtEvent>&, Legion::Internal::ApEvent, const Legion::Internal::FieldMask&, const Legion::Internal::PhysicalTraceInfo&, bool, bool, std::vector<Legion::Internal::ApEvent>*)’ at /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:7339:28:
/usr/include/c++/13/bits/stl_algobase.h:437:30: warning: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ writing between 9 and 9223372036854775800 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
  437 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/13/aarch64-suse-linux/bits/c++allocator.h:33,
                 from /usr/include/c++/13/bits/allocator.h:46,
                 from /usr/include/c++/13/bits/stl_tree.h:64,
                 from /usr/include/c++/13/map:62,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_types.h:30,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion.h:56,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:17:
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = Legion::Internal::InstanceView*]’,
    inlined from ‘static _Tp* std::allocator_traits<std::allocator<_Tp1> >::allocate(allocator_type&, size_type) [with _Tp = Legion::Internal::InstanceView*]’ at /usr/include/c++/13/bits/alloc_traits.h:482:28,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = Legion::Internal::InstanceView*; _Alloc = std::allocator<Legion::Internal::InstanceView*>]’ at /usr/include/c++/13/bits/stl_vector.h:378:33,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/stl_vector.h:375:7,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_insert(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/vector.tcc:805:40,
    inlined from ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::insert(const_iterator, _InputIterator, _InputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; <template-parameter-2-2> = void; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/stl_vector.h:1483:19,
    inlined from ‘void Legion::Internal::CopyFillAggregator::issue_copies(Legion::Internal::InstanceView*, std::map<Legion::Internal::InstanceView*, std::vector<CopyUpdate*> >&, std::set<Legion::Internal::RtEvent>&, Legion::Internal::ApEvent, const Legion::Internal::FieldMask&, const Legion::Internal::PhysicalTraceInfo&, bool, bool, std::vector<Legion::Internal::ApEvent>*)’ at /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:7339:28:
/usr/include/c++/13/bits/new_allocator.h:151:55: note: at offset [-9223372036854775808, -1] into destination object of size [8, 9223372036854775800] allocated by ‘operator new’
  151 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^
[296/308] Building CUDA object src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o
FAILED: src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o 
/opt/nvidia/hpc_sdk/Linux_aarch64/24.7/cuda/12.5/bin/nvcc -forward-unknown-to-host-compiler -DFMT_SHARED -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_CUDA -DUSE_HDF -Dlegate_EXPORTS -I/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp -I/lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/src/cpp/include/legate -I/lustre/vescratch1/treddy/custom_nvidia/legate/share/legate/mpi_wrapper/src -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/cccl-src/thrust/thrust/cmake/../.. -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/cccl-src/cub/cub/cmake/../.. -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/mappers -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-build/runtime -isystem /opt/nvidia/hpc_sdk/Linux_aarch64/24.7/cuda/12.5/targets/sbsa-linux/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/mdspan-src/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/span-src/include -isystem /lustre/vescratch1/treddy/tyler_conda/conda_scratch/envs/legate_custom/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/fmt-src/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/argparse-src/include --compiler-options=-O3 -O2 -std=c++17 -arch=all-major -Xcompiler=-fPIC -Xfatbin=-compress-all 
--expt-extended-lambda --expt-relaxed-constexpr -Wno-deprecated-gpu-targets -MD -MT src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o -MF src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o.d -x cu -c /lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/comm/detail/comm_nccl.cu -o src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/variant_helper.h: In instantiation of ‘static void legate::detail::VariantHelper<T, SELECTOR, true>::record(const legate::Library&, legate::TaskInfo*, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId; SELECTOR = legate::detail::GPUVariant]’:
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/task.inl:55:64:   required from ‘static std::unique_ptr<legate::TaskInfo> legate::LegateTask<T>::create_task_info_(const legate::Library&, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId]’
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/task.inl:44:37:   required from ‘static void legate::LegateTask<T>::register_variants(legate::Library, legate::LocalTaskID, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId]’
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/task.inl:37:18:   required from ‘static void legate::LegateTask<T>::register_variants(legate::Library, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId]’
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/comm/detail/comm_nccl.cu:277:56:   required from here
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/variant_helper.h:133:16: error: unable to deduce ‘const auto’ from ‘task_wrapper_<std::invoke_result_t<ncclUniqueId (* const)(const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*), const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*>, variant_impl, variant_kind>’
       constexpr auto entry = T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;
@lightsighter

You can ignore the warning for the legion_analysis.cc translation unit. It is a bug with the -Wstringop-overflow static analysis which is present in many compilers. You can read more about it here.

The real problem is this:

/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/variant_helper.h:133:16: error: unable to deduce ‘const auto’ from ‘task_wrapper_<std::invoke_result_t<ncclUniqueId (* const)(const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*), const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*>, variant_impl, variant_kind>’
       constexpr auto entry = T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;
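
For readers unfamiliar with this diagnostic: one common way to get "unable to deduce auto" is to initialize an auto variable from a name that refers to more than one function, because auto supplies no target type to pick against. The thread does not establish that this is the cause here, so the following is only a minimal sketch of that general mechanism. The names scale, int_entry and double_entry are hypothetical and do not come from legate.

```cpp
#include <cassert>

// Hypothetical overload set (not legate code): two functions share one name.
int scale(int x) { return x * 2; }
double scale(double x) { return x * 0.5; }

// With `auto` there is no target type, so the compiler cannot choose between
// the two overloads and rejects the declaration:
//
//   constexpr auto entry = scale;   // error: cannot deduce the type
//
// Spelling out the pointer type gives overload resolution a target.
constexpr int (*int_entry)(int) = scale;
constexpr double (*double_entry)(double) = scale;
```

Calling int_entry(21) then dispatches to the int overload and double_entry(4.0) to the double overload.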

@manopapad
Contributor

@tylerjereddy could you please try replacing constexpr auto entry with constexpr Processor::TaskFuncPtr entry?

@tylerjereddy
Author

Will do. Venado is down for another day or two, I think (this time for a dedicated activity time/reservation).

@marcinz
Collaborator

marcinz commented Oct 17, 2024

@tylerjereddy Does the compiler provide any notes after the error?

@tylerjereddy
Author

A few thousand lines of C++ spam follow the error IIRC (sorry C++ devs..), but I can share the full log once Venado comes back up if you want.

@tylerjereddy
Author

@marcinz I was able to access a Venado frontend this morning, reproduce the problem, and then place the full compile error output in a repo at: https://github.com/tylerjereddy/error_messages/blob/main/compile_failures/legate_issue_959_oct_28_2024.txt

tylerjereddy added a commit to tylerjereddy/error_messages that referenced this issue Oct 28, 2024
* Add the compile error output after applying the
legate patch from:
nv-legate/legate#959 (comment)
@tylerjereddy
Author

@manopapad I applied your patch--the compilation still failed at roughly the same spot, but the reason for the failure did change. I've placed the full error output in a git repo at: https://github.com/tylerjereddy/error_messages/blob/main/compile_failures/legate_issue_959_oct_28_2024_after_patch.txt

@manopapad
Contributor

@tylerjereddy maybe this patch will make the compiler happy?

diff --git a/src/cpp/legate/task/variant_helper.h b/src/cpp/legate/task/variant_helper.h
index f07a6068..ccca8e35 100644
--- a/src/cpp/legate/task/variant_helper.h
+++ b/src/cpp/legate/task/variant_helper.h
@@ -120,17 +120,21 @@ class VariantHelper<T, SELECTOR, true> {
     constexpr auto* options     = SELECTOR<T>::options;

     if constexpr (std::is_convertible_v<decltype(variant_impl), VariantImpl>) {
-      constexpr auto entry = T::BASE::template task_wrapper_<variant_impl, variant_kind>;
+      constexpr void (*entry)(
+        const void*, std::size_t, const void*, std::size_t, Legion::Processor) =
+        T::BASE::template task_wrapper_<variant_impl, variant_kind>;

       task_info->add_variant_(
         TaskInfo::AddVariantKey{}, lib, variant_kind, variant_impl, entry, options, all_options);
     } else {
-      using RET            = std::invoke_result_t<decltype(variant_impl),
-                                                  const Legion::Task*,
-                                                  const std::vector<Legion::PhysicalRegion>&,
-                                                  Legion::Context,
-                                                  Legion::Runtime*>;
-      constexpr auto entry = T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;
+      using RET = std::invoke_result_t<decltype(variant_impl),
+                                       const Legion::Task*,
+                                       const std::vector<Legion::PhysicalRegion>&,
+                                       Legion::Context,
+                                       Legion::Runtime*>;
+      constexpr void (*entry)(
+        const void*, std::size_t, const void*, std::size_t, Legion::Processor) =
+        T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;

       task_info->add_variant_(
         TaskInfo::AddVariantKey{}, lib, variant_kind, variant_impl, entry, options, all_options);
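
To make the intent of the patch easier to follow, here is a self-contained sketch of the same shape: two task_wrapper_ templates that share a name, selected through a dependent T::BASE, with the entry pointer given an explicit function-pointer type instead of auto. Everything in it is a stand-in and an assumption for illustration only: Processor is a dummy struct rather than Legion::Processor, and Base, MyTask, void_variant, int_variant, void_entry and int_entry are hypothetical names. It does not claim to reproduce the failure seen on Venado.

```cpp
#include <cassert>
#include <cstddef>

// Dummy stand-in for Legion::Processor (assumption, not the real type).
struct Processor { int id; };

// Same parameter list as the pointer type spelled out in the patch.
using TaskFuncPtr =
  void (*)(const void*, std::size_t, const void*, std::size_t, Processor);

static int g_last = 0;  // records what the wrapped variant computed

void void_variant(int x) { g_last = x; }  // hypothetical "no return value" variant
int int_variant(int x) { return x * 2; }  // hypothetical "returns a value" variant

struct Base {
  // Wrapper for variants that return nothing.
  template <void (*IMPL)(int), int KIND>
  static void task_wrapper_(const void*, std::size_t, const void*, std::size_t, Processor p)
  {
    IMPL(KIND + p.id);
  }

  // Wrapper for variants that return RET; same name, different template head.
  template <typename RET, RET (*IMPL)(int), int KIND>
  static void task_wrapper_(const void*, std::size_t, const void*, std::size_t, Processor p)
  {
    g_last = static_cast<int>(IMPL(KIND + p.id));
  }
};

struct MyTask { using BASE = Base; };  // hypothetical task type

template <typename T>
TaskFuncPtr void_entry()
{
  // Explicit pointer type: the compiler has a target signature to match.
  constexpr TaskFuncPtr entry = T::BASE::template task_wrapper_<void_variant, 1>;
  return entry;
}

template <typename T>
TaskFuncPtr int_entry()
{
  constexpr TaskFuncPtr entry = T::BASE::template task_wrapper_<int, int_variant, 2>;
  return entry;
}
```

With these definitions, void_entry<MyTask>() and int_entry<MyTask>() each return a plain function pointer that can be invoked with the five arguments of the entry-point signature.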

tylerjereddy added a commit to tylerjereddy/error_messages that referenced this issue Oct 29, 2024
* Add compile failure after applying latest patch
from:
nv-legate/legate#959 (comment)
@tylerjereddy
Author

I tried switching to GNU compiler toolchain 12.3.0 (instead of 13.2.1), with and without the latest patch, and the compile errors were the same as those previously reported.

Likewise for using an older CUDA toolchain--same compiler errors with and without the latest patch when using /opt/nvidia/hpc_sdk/Linux_aarch64/24.7/cuda/11.8/bin/nvcc instead of /opt/nvidia/hpc_sdk/Linux_aarch64/24.7/cuda/12.5/bin/nvcc.

Maybe I'll check if my old legate build still works on Venado to check other things for now.

@marcinz
Collaborator

marcinz commented Nov 1, 2024

@tylerjereddy This might not make any difference, but I wonder if we could try to simplify the build. I am thinking:

module load conda
mamba create -n legate-build python cmake elfutils
mamba activate legate-build
module load PrgEnv-gnu 
module load cray-hdf5-parallel nccl
./configure --with-cuda --with-hdf5 --with-gasnet
export LEGATE_ARCH='arch-linux-cuda-release'
export LEGATE_DIR='/lustre/vescratch1/treddy/custom_nvidia/legate'
make -j 64

The main idea is to modify as little as possible in the Cray env. The above worked for me on Perlmutter, but I started with some modules preloaded. Just in case, here is the module environment:

module list
> module list

Currently Loaded Modules:
  1) craype-x86-milan   3) craype-network-ofi                      5) PrgEnv-gnu/8.5.0   7) cray-libsci/23.12.5   9) craype/2.7.30    11) perftools-base/23.12.0  13) cudatoolkit/12.2       15) gpu/1.0        17) conda/Miniconda3-py311_23.11.0-2  19) sqs/2.0
  2) libfabric/1.20.1   4) xpmem/2.6.2-2.5_2.40__gd067c3f.shasta   6) cray-dsmml/0.2.2   8) cray-mpich/8.1.28    10) gcc-native/12.3  12) cpe/23.12               14) craype-accel-nvidia80  16) darshan/3.4.4  18) cray-hdf5-parallel/1.12.2.9       20) nccl/2.21.5 (g)

  Where:
   g:  built for GPU

and here is the resulting legate-build conda env:

mamba list
> mamba list
# packages in environment at /global/homes/m/mzalewsk/.conda/envs/legate-build:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.34.2               heb4867d_0    conda-forge
ca-certificates           2024.8.30            hbcca054_0    conda-forge
cmake                     3.30.5               hf9cb763_0    conda-forge
elfutils                  0.192                h1fa0c75_0    conda-forge
gettext                   0.22.5               he02047a_3    conda-forge
gettext-tools             0.22.5               he02047a_3    conda-forge
gnutls                    3.8.7                h32866dd_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
ld_impl_linux-64          2.43                 h712a8e2_2    conda-forge
libarchive                3.7.4                hfca40fe_0    conda-forge
libasprintf               0.22.5               he8f35ee_3    conda-forge
libasprintf-devel         0.22.5               he8f35ee_3    conda-forge
libcurl                   8.10.1               hbbe4b11_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.3                h5888daf_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgettextpo              0.22.5               he02047a_3    conda-forge
libgettextpo-devel        0.22.5               he02047a_3    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libidn2                   2.3.7                hd590300_0    conda-forge
libmicrohttpd             1.0.1                hbc5bc17_1    conda-forge
libmpdec                  4.0.0                h4bc722e_0    conda-forge
libnghttp2                1.64.0               h161d5f1_0    conda-forge
libsqlite                 3.47.0               hadc24fc_1    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libtasn1                  4.19.0               h166bdaf_0    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.49.2               hb9d3cd8_0    conda-forge
libxml2                   2.13.4               h064dc61_2    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              hd590300_1001    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
nettle                    3.9.1                h7ab15ed_0    conda-forge
openssl                   3.3.2                hb9d3cd8_0    conda-forge
p11-kit                   0.24.1               hc5aa10d_0    conda-forge
pip                       24.3.1             pyh145f28c_0    conda-forge
python                    3.13.0          h9ebbce0_100_cp313    conda-forge
python_abi                3.13                    5_cp313    conda-forge
readline                  8.2                  h8228510_1    conda-forge
rhash                     1.4.5                hb9d3cd8_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

It would be nice to know if using the compiler wrappers with minimal changes to the environment makes any difference.

Here is the resulting configuration:

configuration
> ./configure --with-cuda --with-hdf5 --with-gasnet
================================================================================
               Configuring Legate.Core to compile on your system
================================================================================
RUNNING: config.aedifix.cmake.cmaker.CMaker.finalize() (aedifix/cmake/cmaker.py:211)
================================================================================
                            Configuring Legate.Core
--------------------------------------------------------------------------------
                          This may take a few minutes
================================================================================
RUNNING: config.legate_core_internal.main_package.LegateCore.summarize() (legate_core_internal/main_package.py:476)
================================================================================
Core Project:
  Legate.Core Dir:   /global/u2/m/mzalewsk/repos/legate.core.internal
  Legate.Core Arch:  arch-linux-cuda-release
  Build Generator:   /usr/bin/ninja
  Build type:        Release
  Num Build Threads: 255
  Install prefix:    /usr/local
C Compiler:
  Executable:     /opt/cray/pe/craype/2.7.30/bin/cc
  Global C Flags: -O3                                                                                                                                                                                                                                                                                                                                                                   
C++ Compiler:                                                                                                                                                                                                                                                                                                                                                                           
  Executable:       /opt/cray/pe/craype/2.7.30/bin/CC                                                                                                                                                                                                                                                                                                                                   
  Global C++ Flags: -O3                                                                                                                                                                                                                                                                                                                                                                 
Legate.Core:                                                                                                                                                                                                                                                                                                                                                                            
  C++ Flags:       -Wall -Wextra -Werror -Walloca -Wdeprecated -Wimplicit-fallthrough -fdiagnostics-show-template-tree -Wignored-qualifiers -Wmissing-field-initializers -Wshadow -pedantic -Wsign-compare -Wshadow -O3 -fstack-protector-strong                                                                                                                                        
  Linker Flags:                                                                                                                                                                                                                                                                                                                                                                         
  CUDA Flags:                                                                                                                                                                                                                                                                                                                                                                           
  With Tests:      False                                                                                                                                                                                                                                                                                                                                                                
  With Docs:       False                                                                                                                                                                                                                                                                                                                                                                
  With Examples:   False                                                                                                                                                                                                                                                                                                                                                                
  Python bindings: False                                                                                                                                                                                                                                                                                                                                                                
CMake:                                                                                                                                                                                                                                                                                                                                                                                  
  Executable: /global/u2/m/mzalewsk/.conda/envs/legate-build/bin/cmake                                                                                                                                                                                                                                                                                                                  
  Version:    3.30.5                                                                                                                                                                                                                                                                                                                                                                    
  Generator:  Ninja                                                                                                                                                                                                                                                                                                                                                                     
CUDA:                                                                                                                                                                                                                                                                                                                                                                                   
  Architectures: all-major                                                                                                                                                                                                                                                                                                                                                              
  Executable:    /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc                                                                                                                                                                                                                                                                                                               
  Flags:         --compiler-options=-O3                                                                                                                                                                                                                                                                                                                                                 
Python:                                                                                                                                                                                                                                                                                                                                                                                 
  Executable: /global/homes/m/mzalewsk/.conda/envs/legate-build/bin/python3                                                                                                                                                                                                                                                                                                             
  Version:    3.13.0 | packaged by conda-forge | (main, Oct  8 2024, 20:04:32) [GCC 13.3.0]                                                                                                                                                                                                                                                                                             
Legion:                                                                                                                                                                                                                                                                                                                                                                                 
  Downloaded:          False                                                                                                                                                                                                                                                                                                                                                            
    Root dir:          /global/homes/m/mzalewsk/repos/legate.core.internal/arch-linux-cuda-release/cmake_build/_deps/legion-src                                                                                                                                                                                                                                                         
  With CUDA:           ON                                                                                                                                                                                                                                                                                                                                                               
  CUDA arch:           ['all-major']                                                                                                                                                                                                                                                                                                                                                    
  Networks:            gasnetex                                                                                                                                                                                                                                                                                                                                                         
  Bounds checks:       False                                                                                                                                                                                                                                                                                                                                                            
  Max dim:             4                                                                                                                                                                                                                                                                                                                                                                
  Max fields:          256                                                                                                                                                                                                                                                                                                                                                              
  Build Spy:           False                                                                                                                                                                                                                                                                                                                                                            
  Build Rust profiler: False                                                                                                                                                                                                                                                                                                                                                            
======================================================================================================================================================================================================================================================================================================================================================================================  
                                                                                                                                                                                Configuration Complete                                                                                                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
  Please set the following:                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                        
  export LEGATE_CORE_ARCH='arch-linux-cuda-release'                                                                                                                                                                                                                                                                                                                                     
  export LEGATE_CORE_DIR='/global/u2/m/mzalewsk/repos/legate.core.internal'                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                        
  Then build libraries:                                                                                                                                                                                                                                                                                                                                                                 
  $ make                                                                           
