
Failed to launch kernels (error code invalid argument) #293

Closed

MehmetMHY opened this issue May 18, 2024 · 3 comments

Comments

@MehmetMHY

Describe the bug
I ran into a bug while trying to use Bend. When running Bend in "cuda mode" I get the following error:

Error reading result from hvm. Output :
Failed to launch kernels (error code invalid argument)!
exit status: 1

I was going to open an issue for this in the Bend repo, but based on the error message I think it's an HVM bug.

To Reproduce
Steps to reproduce the behavior:

  1. Install Bend. This is how I installed it.

  2. Grab the fib.bend example from the Bend repo HERE. Here is the code:

add = λa λb (+ a b)
fib = λx switch x {
  0: 1
  _: let p = x-1; switch p {
    0: 1
    _: (+ (fib p) (fib p-1))
  }
}
main = (fib 30)
  3. Run the following Bend command:
bend run-cu fib.bend

Expected behavior
If this error/bug did not happen, the output would be the following (and ideally it would be fast):

Result: 1346269

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.6 LTS x86_64
  • CPU: AMD Ryzen 5 3600 (12) @ 3.600GHz
  • GPU: NVIDIA GeForce RTX 2070 SUPER
  • CUDA Version: 12.3, V12.3.52

Additional context
n/a

@janschiefer

Just added a little debug helper function and manually compiled with:

bend gen-cu test.bend > test.cu 
nvcc -I/usr/local/cuda/include -L/usr/local/cuda/lib -O0 -g test.cu -o test_cuda
./test_cuda

#include <iostream>

#define CHECK_CUDA_ERROR(val) check((val), #val, __FILE__, __LINE__)
void check(cudaError_t err, const char* const func, const char* const file,
           const int line)
{
    if (err != cudaSuccess)
    {
        std::cerr << "CUDA Runtime Error at: " << file << ":" << line
                  << std::endl;
        std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
        // We don't exit when we encounter CUDA errors in this example.
        // std::exit(EXIT_FAILURE);
    }
}
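
In the generated test.cu, the failing call was presumably wrapped with this helper, roughly like so (a sketch inferred from the output below; the exact surrounding code in test.cu is not reproduced here):

CHECK_CUDA_ERROR(cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, sizeof(LNet)));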

Output:

CUDA Runtime Error at: test.cu:2597
invalid argument cudaFuncSetAttribute(evaluator, cudaFuncAttributeMaxDynamicSharedMemorySize, sizeof(LNet))
Failed to launch kernels (error code invalid argument)!

cudaFuncSetAttribute seems to choke on evaluator or sizeof(LNet)

@FreezePhoenix

On some devices, it seems that sizeof(LNet) exceeds the value of cudaDevAttrMaxSharedMemoryPerBlockOptin.

For my device, that maximum is 65536 bytes, versus the 98304 bytes required for LNet.
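
To check the limit on your own GPU, a minimal standalone query sketch (not part of the generated test.cu; compile with nvcc) could look like this:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;
    int max_optin = 0;
    // Maximum shared memory per block a kernel can opt in to, in bytes.
    cudaDeviceGetAttribute(&max_optin, cudaDevAttrMaxSharedMemoryPerBlockOptin, device);
    printf("cudaDevAttrMaxSharedMemoryPerBlockOptin: %d bytes\n", max_optin);
    return 0;
}

If this prints 65536 (or less) while sizeof(LNet) is 98304, the cudaFuncSetAttribute call above will fail with invalid argument.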

@kings177
Member

This problem stems from the fact that your GPU does not support >=96KB of shared memory per block, which is what is currently hardcoded in HVM. We plan to release dynamic memory allocation soon; for now, to fix this, please reduce the shared memory from 96KB:

const u32 L_NODE_LEN = 0x2000;
const u32 L_VARS_LEN = 0x2000;

to 48KB:

const u32 L_NODE_LEN = 0x1000;
const u32 L_VARS_LEN = 0x1000;

For your GPU (the 2070), which has Compute Capability 7.5, the max shared memory is 64KB, so I guess you could use 0x1500.
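
As a rough sanity check on those numbers, assuming LNet is essentially L_NODE_LEN 8-byte node pairs plus L_VARS_LEN 4-byte variable ports (an assumption about HVM's layout, but it matches the 98304 bytes reported above for the 0x2000 defaults):

#include <cstdio>

int main() {
    // Approximate sizeof(LNet) for a few candidate lengths, assuming
    // 8 bytes per node pair and 4 bytes per variable port.
    const unsigned lens[] = {0x2000, 0x1000, 0x1500};
    for (unsigned len : lens) {
        unsigned bytes = len * 8 + len * 4;
        printf("L_NODE_LEN = L_VARS_LEN = 0x%X -> %u bytes (%.1f KB)\n",
               len, bytes, bytes / 1024.0);
    }
    return 0;
}

This prints 98304 bytes (96 KB) for 0x2000, 49152 bytes (48 KB) for 0x1000, and 64512 bytes (63 KB) for 0x1500, which stays just under the 64KB opt-in limit.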

Closing this since it's a duplicate of #283.

kings177 closed this as not planned (duplicate) on May 24, 2024