Fix sccache for CTK 11.1 and properly track compilations in stats #2285

Open: trxcllnt wants to merge 9 commits into main

trxcllnt
Contributor

@trxcllnt trxcllnt commented Nov 8, 2024

This PR has some fixes I neglected to add to #2247.

  1. nvcc in CUDA toolkit v11.1 didn't add the -D__CUDA_ARCH_LIST__= definition, so 5271494 expands the list of defines that indicate an nvcc host compiler invocation.
  2. f160a0a and 1591089 report compilation type (local or dist) and duration for forced-no-cache, forced-recache, and compilation failures. They also count and report total compilations performed, not just compilations due to cache misses.
  3. ccfc60b ensures compilations with --verbose are never dist-compiled, since the verbose output is parsed by tools like CMake and must reflect the local toolchain.
  4. bdaf35e adds more clang flags so that using clang as a CUDA compiler with -Xclang doesn't fail.
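
For readers unfamiliar with the checks in items 1 and 3, here is a rough sketch of the general idea. It is not the code from these commits. The function names and the `NVCC_HOST_DEFINES` list are hypothetical: only `__CUDA_ARCH_LIST__` comes from the description above, while `__CUDACC__` and `__NVCC__` are macros nvcc is known to define and are included purely as assumptions about what an expanded list might look like.

```rust
/// Hypothetical list of defines suggesting that nvcc is driving the host
/// compiler. Only `__CUDA_ARCH_LIST__` is named in the PR description; the
/// other two entries are assumptions added for illustration.
const NVCC_HOST_DEFINES: &[&str] = &["__CUDA_ARCH_LIST__", "__CUDACC__", "__NVCC__"];

/// Returns true if any argument has the form `-D<NAME>` or `-D<NAME>=<value>`
/// where `<NAME>` is one of `NVCC_HOST_DEFINES`.
/// The split form (`-D <NAME>` as two arguments) is not handled in this sketch.
fn looks_like_nvcc_host_invocation(args: &[&str]) -> bool {
    args.iter().any(|arg| {
        arg.strip_prefix("-D")
            // Keep only the macro name, dropping any `=<value>` suffix.
            .map(|def| def.split('=').next().unwrap_or(def))
            .map_or(false, |name| NVCC_HOST_DEFINES.contains(&name))
    })
}

/// Returns false when `--verbose` is present, so that such an invocation
/// would be compiled locally and its output reflects the local toolchain.
fn may_dist_compile(args: &[&str]) -> bool {
    !args.iter().any(|arg| *arg == "--verbose")
}
```

With these helpers, an argument list containing `-D__CUDA_ARCH_LIST__=800` would be classified as an nvcc host compiler invocation, and one containing `--verbose` would be kept off the distributed path. The actual commits may use different names, different defines, and a different structure.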

Question for @sylvestre related to the last point -- do you know which bits of the clang toolchain (or CTK?) sccache should package when using clang as a device compiler? I am seeing errors like the following when attempting to dist-compile with ClangCUDA, but I'm not sure which files define the __nvvm_* symbols:

In file included from build/libcudacxx/test/internal_headers/headers/__barrier_async_contract_fulfillment.h.cu:1:
In file included from <built-in>:1:
In file included from /usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_runtime_wrapper.h:73:
/usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_builtin_vars.h:53:180: error: use of undeclared identifier '__nvvm_read_ptx_sreg_tid_x'
   53 |   __declspec(property(get = __fetch_builtin_x)) unsigned int x; static inline __attribute__((always_inline)) __attribute__((device)) unsigned int __fetch_builtin_x(void) { return __nvvm_read_ptx_sreg_tid_x(); };
...

@sylvestre
Collaborator

Sorry, I don't know.

@trxcllnt
Contributor Author

trxcllnt commented Nov 14, 2024

This doesn't seem to be an issue with sccache. It appears clang can't compile its own preprocessor output:

#!/usr/bin/env bash

# Basic CUDA example from https://godbolt.org/
cat <<EOF >/tmp/test.cu
__global__ void square(int* array, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n)
        array[tid] = array[tid] * array[tid];
}
EOF

# Preprocess
clang++ -x cuda -E --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version /tmp/test.cu > /tmp/test.cui

# Compile (fails)
clang++ -x cuda-cpp-output --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version -o /tmp/test.cu.o /tmp/test.cui

@trxcllnt
Contributor Author

cc: @robertmaynard for review
