CUDA not available or Failed to Launch Kernels (error code invalid argument) #283
Comments
Hey, saw on HigherOrderCO/Bend/issues/320 that you've run into the same error. Tested on a Ryzen 3600XT & 2070 Super. This appears to be related to the following:

// Local Net
const u32 L_NODE_LEN = 0x2000;
const u32 L_VARS_LEN = 0x2000;
struct LNet {
Pair node_buf[L_NODE_LEN];
Port vars_buf[L_VARS_LEN];
};

It was suggested this might be related to shared memory, so I added a test to debug:

// Check max shared memory size
int maxSharedMem;
cudaDeviceGetAttribute(&maxSharedMem, cudaDevAttrMaxSharedMemoryPerBlock, 0);
printf("Max shared memory per block: %d bytes\n", maxSharedMem);
// Configures Shared Memory Size
if (sizeof(LNet) <= maxSharedMem)
{
cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, sizeof(LNet));
}
else
{
fprintf(stderr, "Error: LNet size (%zu bytes) exceeds max shared memory per block (%d bytes)\n", sizeof(LNet), maxSharedMem);
exit(EXIT_FAILURE);
}
// Configures Shared Memory Size
// cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, sizeof(LNet));
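// Note (per the CUDA docs): cudaDevAttrMaxSharedMemoryPerBlock reports the default
// 48 KiB per-block limit on most recent GPUs; the larger opt-in limit that pairs with
// cudaFuncAttributeMaxDynamicSharedMemorySize is cudaDevAttrMaxSharedMemoryPerBlockOptin
// (64 KiB on Turing, roughly 99 KiB on Ada).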
Changed the constants to:

// Local Net
const u32 L_NODE_LEN = 0x1000;
const u32 L_VARS_LEN = 0x1000;
struct LNet {
Pair node_buf[L_NODE_LEN];
Port vars_buf[L_VARS_LEN];
};

Recompiled, and running on the GPU now works well!
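For context, a rough size check (assuming Pair is 8 bytes and Port is 4 bytes, as in HVM2's CUDA runtime) shows why halving the constants helps:

// Back-of-the-envelope sizes for the local net (assumed element sizes).
// Original:  0x2000 * (8 + 4) = 8192 * 12 = 98304 bytes  (96 KiB)
// Halved:    0x1000 * (8 + 4) = 4096 * 12 = 49152 bytes  (48 KiB)
// 96 KiB exceeds the opt-in per-block limit on Turing cards like the 2070 Super
// (64 KiB), while 48 KiB fits comfortably.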
Perhaps these sizes could be picked dynamically based on the device's shared memory? |
oh nice, thank you, we will be working on making the HVM adaptable to multiple GPUs; the current iteration was only developed taking the 4090 into account. |
I'll leave this open as a reminder. |
Hi, I've seen that HigherOrderCO/Bend#342 has been marked as a duplicate of this, so I'll comment here. I'm using WSL on Windows 11, with the CUDA toolkit installed and verified working with other WSL and Docker programs, but I'm getting the same error. My GPU is an RTX 4090. I don't necessarily see how this issue is a duplicate of the other, given that this issue was originally closed with the explanation that it was only developed with a 4090 in mind, but that is the same GPU I'm running (unless it's vendor specific somehow?). In any case, an exciting project. Keep up the good work! |
Might as well add that I'm also having the issue from HigherOrderCO/Bend#342. I'm using WSL (Ubuntu 22.04.03 LTS) on Windows 11 23H2. I've already verified that nvcc works by building a simple CUDA test program. Edit: the error appears when trying to run a .bend file. |
Afaik, CUDA not available on WSL can be resolved by following https://docs.nvidia.com/cuda/wsl-user-guide/index.html. Make sure the CUDA paths are sourced in your shell of choice's config as well, and adding the CUDA libs to LD_LIBRARY_PATH is probably good for good measure (not sure how important that is here; I'm on mobile currently, so I can't reproduce right now). If CUDA not found persists, try force reinstalling Bend and HVM after making sure CUDA is available on its own, because I believe CUDA availability is checked as part of the build script. |
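For anyone who wants to confirm that CUDA itself is usable from WSL independently of Bend, a minimal standalone sanity check along these lines can help (an illustrative sketch, not part of HVM or Bend; compile with nvcc):

// cuda_check.cu - hypothetical standalone sanity check (not part of HVM/Bend).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int driver = 0, runtime = 0, count = 0;
  cudaDriverGetVersion(&driver);
  cudaRuntimeGetVersion(&runtime);
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess || count == 0) {
    fprintf(stderr, "No usable CUDA device: %s\n", cudaGetErrorString(err));
    return 1;
  }
  printf("Driver %d, runtime %d, %d device(s) found\n", driver, runtime, count);
  for (int i = 0; i < count; ++i) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("  [%d] %s, sm_%d%d, %zu bytes opt-in shared memory per block\n",
           i, prop.name, prop.major, prop.minor, prop.sharedMemPerBlockOptin);
  }
  return 0;
}

If this program fails or reports no devices, the problem is with the CUDA/WSL setup rather than with Bend or HVM.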
This worked for me after trying everything else. |
Can also confirm that forcing a re-install fixed the issue for me, although I did not do anything else; CUDA was already added to the path. |
Since HVM-CUDA has been hardcoded for the RTX 4090, older GPUs (which have 1/2 the shared memory size) will not work. That was an oversight. I'll refactor that hardcoded number to be dynamic instead, and properly query the available L1 cache size. |
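A minimal sketch of what that dynamic sizing might look like (an illustration only, not the actual HVM code: it assumes the evaluator kernel takes the local-net capacity as a parameter and carves its buffers out of one extern __shared__ region, and num_blocks, threads_per_block, and net are placeholders):

// Hypothetical: size the per-block local net from the device's opt-in
// shared memory limit instead of a hardcoded constant.
int dev = 0, smem_optin = 0;
cudaGetDevice(&dev);
cudaDeviceGetAttribute(&smem_optin, cudaDevAttrMaxSharedMemoryPerBlockOptin, dev);

// Each local-net slot needs one Pair (node) plus one Port (var); keep the
// capacity a power of two so existing masking logic still works.
u32 lnet_len = 1;
while ((u64)(lnet_len * 2) * (sizeof(Pair) + sizeof(Port)) <= (u64)smem_optin) {
  lnet_len *= 2;
}
size_t smem_bytes = (size_t)lnet_len * (sizeof(Pair) + sizeof(Port));

// Opt in to the larger dynamic shared memory carve-out, then launch with it.
cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_bytes);
evaluator<<<num_blocks, threads_per_block, smem_bytes>>>(net, lnet_len);

On a 2070 Super (64 KiB opt-in) this would land on 0x1000 slots, and on a 4090 (roughly 99 KiB opt-in) on the original 0x2000, which matches the numbers reported above.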
I guess that clears things up - I was in the previous issue, facing the "invalid arguments" error. Using a 2070 Super. |
Facing the same "invalid arguments" issue in WSL2 Ubuntu. Using a 2080. Can run Python with Numba, but not bend run-cu or gen-cu. |
So, as a summary of the problems: when it comes to CUDA not available, the cause can vary, but for most people it is the CUDA paths not being set correctly. When it comes to Failed to launch kernels, that stems from the fact that the shared memory size is currently hardcoded to fit GPUs with 96KB of per-block shared memory; we plan on releasing a dynamic version of this soon. |
This would be great. I'm running an older NVIDIA GPU and getting the same error after reinstalling. Using WSL.

riley@Virtual-Desktop-1:~/programming/bend$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

riley@Virtual-Desktop-1:~/programming/bend$ nvidia-smi
Tue May 21 12:47:00 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.103 Driver Version: 537.13 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 ... On | 00000000:01:00.0 Off | N/A |
| 40% 33C P8 14W / 250W | 1101MiB / 8192MiB | 3% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 439 G /Xwayland N/A |
+---------------------------------------------------------------------------------------+ |
Same issue running on a 2080ti. Commenting here to follow up once dynamic shared memory support is added. |
Getting the same issue. Here is my nvidia-smi output:
|
References: HigherOrderCO/Bend#320 by rubenjr0 with contents:
"
Hello! I think I've encountered a bug. When running this example from the readme:
The output is 0. I've tried bend run, bend run-c, and bend gen-cu (bend run-cu says CUDA is not available, so I manually compile it with nvcc). The output on my machine when running sum(24, 0) is 8388608, but the equivalent Haskell and Python programs return 140737479966720. The results start to diverge when depth >= 13. I was wondering what could be causing these issues, both the incorrect result when depth >= 13, and the result = 0 when depth >= 25.
My computer specs:
OS: Pop_OS 22.04
CPU: AMD Ryzen 5 2600x (12 cores)
GPU: NVIDIA RTX 4060 ti (16GB)
"