Fix sccache for CTK 11.1 and properly track compilations in stats #2285

trxcllnt · 2024-11-08T23:36:49Z

This PR has some fixes I neglected to add to #2247.

nvcc in CUDA toolkit v11.1 didn't add the -D__CUDA_ARCH_LIST__= definition, so 5271494 expands the list of defines that indicate an nvcc host compiler invocation.
f160a0a and 1591089 report the compilation type (local or dist) and duration for forced-no-cache compilations, forced-recache compilations, and compilation failures. They also count and report the total number of compilations performed, not just compilations due to cache misses.
ccfc60b ensures compilations with --verbose are never dist-compiled, since the verbose output is parsed by tools like CMake and must reflect the local toolchain.
bdaf35e adds more clang flags so that using clang as a CUDA compiler with -Xclang doesn't fail.
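
As a rough illustration of the --verbose rule described above (ccfc60b), here is a minimal bash sketch. It is not sccache's actual implementation, which lives in the Rust sources; the function name `may_dist_compile` is hypothetical and introduced only for this example. It assumes the check amounts to scanning the argument list for `-v` or `--verbose`:

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of sccache): decide whether a compiler
# invocation is eligible for distributed compilation.
# Returns 0 if eligible, 1 if the invocation must stay local because
# -v or --verbose is present (its output would otherwise show
# dist-server paths instead of client paths).
may_dist_compile() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -v|--verbose) return 1 ;;
        esac
    done
    return 0
}
```

With this sketch, `may_dist_compile nvcc -c foo.cu -o foo.o` succeeds, while `may_dist_compile nvcc --verbose -c foo.cu` returns a non-zero status, so a caller would fall back to a local compile.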

Question for @sylvestre related to the last point -- do you know which bits of the clang toolchain (or CTK?) sccache should package when using clang as a device compiler? I am seeing errors like the following when attempting to dist-compile with ClangCUDA, but I'm not sure which files define the __nvvm_* symbols:

In file included from build/libcudacxx/test/internal_headers/headers/__barrier_async_contract_fulfillment.h.cu:1:
In file included from <built-in>:1:
In file included from /usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_runtime_wrapper.h:73:
/usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_builtin_vars.h:53:180: error: use of undeclared identifier '__nvvm_read_ptx_sreg_tid_x'
   53 |   __declspec(property(get = __fetch_builtin_x)) unsigned int x; static inline __attribute__((always_inline)) __attribute__((device)) unsigned int __fetch_builtin_x(void) { return __nvvm_read_ptx_sreg_tid_x(); };
...

sylvestre · 2024-11-12T13:52:07Z

Sorry, I don't know.

trxcllnt · 2024-11-14T20:37:22Z

This doesn't seem to be an issue with sccache. It appears clang can't compile its own preprocessor output:

#!/usr/bin/env bash

# Basic CUDA example from https://godbolt.org/
cat <<EOF >/tmp/test.cu
__global__ void square(int* array, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n)
        array[tid] = array[tid] * array[tid];
}
EOF

# Preprocess
clang++ -x cuda -E --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version /tmp/test.cu > /tmp/test.cui

# Compile (fails)
clang++ -x cuda-cpp-output --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version -o /tmp/test.cu.o /tmp/test.cui

trxcllnt · 2024-11-19T17:43:06Z

cc: @robertmaynard for review

trxcllnt added 5 commits November 8, 2024 10:48

check for more host-compiler nvcc defines to accommodate older nvcc versions

5271494

ensure dist_type is reported for failed and uncached compilations

f160a0a

report total compilation count and compile times for uncached and failed compilations

1591089

compiler invocations with -v or --verbose must not be dist-compiled, since tools like CMake parse the output and expect to see client paths not dist-server paths

ccfc60b

add more clang flags

bdaf35e

trxcllnt added 2 commits November 14, 2024 00:11

hash --gen_module_id_file and --module_id_file_name arguments

e93f775

Normalize nvcc subcommand order for CTK <12.0, ensuring the DAG is parsed by inputs/outputs even if the preprocessor, cicc, and ptxas commands are out of order.

b1acd9d

trxcllnt force-pushed the fix/cuda11.1-and-stats branch from 30a1a7f to b1acd9d on November 14, 2024 01:27

always add --gen_module_id_file if --module_id_file_name is specified

2d058b5

robertmaynard approved these changes Nov 19, 2024

add --default-stream arg, fix parsing concatenated form of nvcc -t1

5469482

trxcllnt force-pushed the fix/cuda11.1-and-stats branch from 3b0ae6c to 5469482 on November 22, 2024 21:40
