DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered") #2131
Comments
I have encountered similar problems with quantized models. Running `compute-sanitizer` gives the following (the real backtrace is much longer):
This backtrace repeats because successive threads write to successively higher addresses:
...though, since these are threads, not always in linear order:
The output keeps going until we eventually reach this:
This is the last backtrace printed (and a different one! See coreylowman/cudarc#277 for more on that) before we reach the "end":
I am relatively new, so I hope I am not just doing something very stupid :)
I am trying to adapt the quantized example for my use case; the inference code is pretty much the same as in the example. In general, the code works: I am prompting 2 models on 2 separate GPUs in a loop. However, after N iterations (N differs on every run but is always below 100), I encounter the error below.
I am running quantized llama-3-8b-instruct from a `.gguf` file.
If the error is on my side, I would appreciate any tips. Here is access to the code.
NOTE: I'm running two A6000 GPUs. This is the nvcc version: