CUDA not available or Failed to Launch Kernels (error code invalid argument) #283
Comments
Hey, saw on HigherOrderCO/Bend/issues/320 that you've run into the same error. Tested on a Ryzen 3600XT & 2070 Super. This appears to be related to the following:

// Local Net
const u32 L_NODE_LEN = 0x2000;
const u32 L_VARS_LEN = 0x2000;
struct LNet {
Pair node_buf[L_NODE_LEN];
Port vars_buf[L_VARS_LEN];
};

It was suggested this might be related to shared memory, so I added a test to debug:

// Check max shared memory size
int maxSharedMem;
cudaDeviceGetAttribute(&maxSharedMem, cudaDevAttrMaxSharedMemoryPerBlock, 0);
printf("Max shared memory per block: %d bytes\n", maxSharedMem);
// Configures Shared Memory Size
if (sizeof(LNet) <= maxSharedMem)
{
cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, sizeof(LNet));
}
else
{
fprintf(stderr, "Error: LNet size (%zu bytes) exceeds max shared memory per block (%d bytes)\n", sizeof(LNet), maxSharedMem);
exit(EXIT_FAILURE);
}
// Configures Shared Memory Size
// cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, sizeof(LNet));
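// Note (per the CUDA docs): cudaDevAttrMaxSharedMemoryPerBlock reports the default
// 48 KiB per-block limit on most recent GPUs; the larger opt-in limit that pairs with
// cudaFuncAttributeMaxDynamicSharedMemorySize is cudaDevAttrMaxSharedMemoryPerBlockOptin
// (64 KiB on Turing, roughly 99 KiB on Ada).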
Changed the constants to:

// Local Net
const u32 L_NODE_LEN = 0x1000;
const u32 L_VARS_LEN = 0x1000;
struct LNet {
Pair node_buf[L_NODE_LEN];
Port vars_buf[L_VARS_LEN];
};

Recompiled, and running on the GPU now works well!
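For context, a rough size check (assuming Pair is 8 bytes and Port is 4 bytes, as in HVM2's CUDA runtime) shows why halving the constants helps:

// Back-of-the-envelope sizes for the local net (assumed element sizes).
// Original:  0x2000 * (8 + 4) = 8192 * 12 = 98304 bytes  (96 KiB)
// Halved:    0x1000 * (8 + 4) = 4096 * 12 = 49152 bytes  (48 KiB)
// 96 KiB exceeds the opt-in per-block limit on Turing cards like the 2070 Super
// (64 KiB), while 48 KiB fits comfortably.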
Perhaps these sizes could be picked dynamically based on the device's shared memory? |
oh nice, thank you, we will be working on making the HVM adaptable to multiple GPUs; the current iteration was only developed taking the 4090 into account. |
I'll leave this open as a reminder. |
Hi, I've seen that HigherOrderCO/Bend#342 has been marked as a duplicate of this, so I'll comment here. I'm using WSL on Windows 11, with the CUDA toolkit installed and verified working with other WSL and Docker programs, but I'm getting the same error. My GPU is an RTX 4090. I don't necessarily see how this issue is a duplicate of the other, given that this issue was originally closed with the explanation that it was only developed with a 4090 in mind, but that is the same GPU I'm running (unless it's vendor specific somehow?). In any case, an exciting project. Keep up the good work! |
Might as well add that I'm also having the issue from HigherOrderCO/Bend#342. I'm using WSL (Ubuntu 22.04.03 LTS) on Windows 11 23H2. I've already verified that nvcc works by building a simple CUDA test program. Edit: the error appears when trying to run a .bend file. |
Afaik, CUDA not available on WSL can be resolved by following https://docs.nvidia.com/cuda/wsl-user-guide/index.html. Make sure the CUDA paths are sourced in your shell of choice's config as well, and adding the CUDA libs to LD_LIBRARY_PATH is probably good for good measure (not sure how important that is here; I'm on mobile currently, so I can't reproduce right now). If CUDA not found persists, try force reinstalling Bend and HVM after making sure CUDA is available on its own, because I believe CUDA availability is checked as part of the build script. |
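For anyone who wants to confirm that CUDA itself is usable from WSL independently of Bend, a minimal standalone sanity check along these lines can help (an illustrative sketch, not part of HVM or Bend; compile with nvcc):

// cuda_check.cu - hypothetical standalone sanity check (not part of HVM/Bend).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int driver = 0, runtime = 0, count = 0;
  cudaDriverGetVersion(&driver);
  cudaRuntimeGetVersion(&runtime);
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess || count == 0) {
    fprintf(stderr, "No usable CUDA device: %s\n", cudaGetErrorString(err));
    return 1;
  }
  printf("Driver %d, runtime %d, %d device(s) found\n", driver, runtime, count);
  for (int i = 0; i < count; ++i) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("  [%d] %s, sm_%d%d, %zu bytes opt-in shared memory per block\n",
           i, prop.name, prop.major, prop.minor, prop.sharedMemPerBlockOptin);
  }
  return 0;
}

If this program fails or reports no devices, the problem is with the CUDA/WSL setup rather than with Bend or HVM.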
This worked for me after trying everything else. |
Can also confirm that forcing a re-install fixed the issue for me, although I did not do anything else; CUDA was already added to the path. |
Since HVM-CUDA has been hardcoded for the RTX 4090, older GPUs (which have 1/2 the shared memory size) will not work. That was an oversight. I'll refactor that hardcoded number to be dynamic instead, and properly query the available L1 cache size. |
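A minimal sketch of what that dynamic sizing might look like (an illustration only, not the actual HVM code: it assumes the evaluator kernel takes the local-net capacity as a parameter and carves its buffers out of one extern __shared__ region, and num_blocks, threads_per_block, and net are placeholders):

// Hypothetical: size the per-block local net from the device's opt-in
// shared memory limit instead of a hardcoded constant.
int dev = 0, smem_optin = 0;
cudaGetDevice(&dev);
cudaDeviceGetAttribute(&smem_optin, cudaDevAttrMaxSharedMemoryPerBlockOptin, dev);

// Each local-net slot needs one Pair (node) plus one Port (var); keep the
// capacity a power of two so existing masking logic still works.
u32 lnet_len = 1;
while ((u64)(lnet_len * 2) * (sizeof(Pair) + sizeof(Port)) <= (u64)smem_optin) {
  lnet_len *= 2;
}
size_t smem_bytes = (size_t)lnet_len * (sizeof(Pair) + sizeof(Port));

// Opt in to the larger dynamic shared memory carve-out, then launch with it.
cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_bytes);
evaluator<<<num_blocks, threads_per_block, smem_bytes>>>(net, lnet_len);

On a 2070 Super (64 KiB opt-in) this would land on 0x1000 slots, and on a 4090 (roughly 99 KiB opt-in) on the original 0x2000, which matches the numbers reported above.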
I guess that clears things up - I was in the previous issue, facing the "invalid arguments" error. Using a 2070 Super. |
Facing the same "invalid arguments" issue in WSL2 Ubuntu. Using a 2080. Can run Python with Numba, but not bend run-cu or gen-cu. |
So, as a summary of the problems: when it comes to CUDA not available, the cause can vary, but for most people it is the CUDA paths not being set correctly. When it comes to Failed to launch kernels, that stems from the fact that the shared memory size is currently hardcoded to fit GPUs with 96KB of per-block shared memory; we plan on releasing a dynamic version of this soon. |
This would be great. I'm running an older NVIDIA GPU and getting the same error after reinstalling. Using WSL.

riley@Virtual-Desktop-1:~/programming/bend$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

riley@Virtual-Desktop-1:~/programming/bend$ nvidia-smi
Tue May 21 12:47:00 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.103 Driver Version: 537.13 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 ... On | 00000000:01:00.0 Off | N/A |
| 40% 33C P8 14W / 250W | 1101MiB / 8192MiB | 3% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 439 G /Xwayland N/A |
+---------------------------------------------------------------------------------------+ |
Same issue running on a 2080ti. Commenting here to follow up once dynamic shared memory support is added. |
Getting the same issue. Here is my nvidia-smi output:
|
References: HigherOrderCO/Bend#320 by rubenjr0 with contents:
"
Hello! I think I've encountered a bug. When running this example from the readme:
The output is 0. I've tried bend run, bend run-c, and bend gen-cu (bend run-cu says CUDA is not available, so I manually compile it with nvcc). The output on my machine when running sum(24, 0) is 8388608, but the equivalent Haskell and Python programs return 140737479966720. The results start to diverge when depth >= 13. I was wondering what could be causing these issues, both the incorrect result when depth >= 13, and the result = 0 when depth >= 25.
My computer specs:
OS: Pop_OS 22.04
CPU: AMD Ryzen 5 2600x (12 cores)
GPU: NVIDIA RTX 4060 ti (16GB)
"