DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered") #2131

Open
VakeDomen opened this issue Apr 27, 2024 · 1 comment

VakeDomen commented Apr 27, 2024

I am relatively new to this, so I hope I am not just doing something very stupid :)
I am trying to adapt the quantized example for my use case. The inference code is pretty much the same as the example. In general the code works: I am prompting 2 models on 2 separate GPUs in a loop. After N iterations (N is different on every run, but always below 100) I encounter the error below.
I am running quantized llama-3-8b-instruct from a .gguf file.

I would appreciate any tips if the error is on my side. Here is the code.

NOTE: I'm running two A6000 GPUs. This is the nvcc version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")

stack backtrace:
   0:     0x58c00bd19556 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h410d4c66be4e37f9
   1:     0x58c00bd43550 - core::fmt::write::he40921d4802ce2ac
   2:     0x58c00bd16d4f - std::io::Write::write_fmt::h5de5a4e7037c9b20
   3:     0x58c00bd19334 - std::sys_common::backtrace::print::h11c067a88e3bdb22
   4:     0x58c00bd1abb7 - std::panicking::default_hook::{{closure}}::h8c832ecb03fde8ea
   5:     0x58c00bd1a919 - std::panicking::default_hook::h1633e272b4150cf3
   6:     0x58c00bd1b048 - std::panicking::rust_panic_with_hook::hb164d19c0c1e71d4
   7:     0x58c00bd1af22 - std::panicking::begin_panic_handler::{{closure}}::h0369088c533c20e9
   8:     0x58c00bd19a56 - std::sys_common::backtrace::__rust_end_short_backtrace::hc11d910daf35ac2e
   9:     0x58c00bd1ac74 - rust_begin_unwind
  10:     0x58c00b9113d5 - core::panicking::panic_fmt::ha6effc2775a0749c
  11:     0x58c00b911923 - core::result::unwrap_failed::ha188096f98826595
  12:     0x58c00ba2b6c4 - <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop::h4c289e05ebd51ae6
  13:     0x58c00ba2aafc - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<f32>>::hcbf6a15615cee068
  14:     0x58c00ba2b1ca - alloc::sync::Arc<T,A>::drop_slow::h994a5bb01f1fc442
  15:     0x58c00ba2af50 - alloc::sync::Arc<T,A>::drop_slow::h4a65dc7109aa30f1
  16:     0x58c00ba1802a - candle_transformers::models::quantized_llama::ModelWeights::forward::had1312fe871968d8
  17:     0x58c00b94121d - llm_bitcoin_inscription_analysis::llm::prompt::prompt_model::hbe917d2214140c60
  18:     0x58c00b96e876 - core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut::h5f9d812f749ee289
  19:     0x58c00b96b756 - rayon::iter::plumbing::Folder::consume_iter::h2c8efde69e0f7383
  20:     0x58c00b971bfc - rayon::iter::plumbing::bridge_producer_consumer::helper::h814a881abff08b3e
  21:     0x58c00b973006 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h8fb2eedfc5ec12fd
  22:     0x58c00b90ce9f - rayon_core::registry::WorkerThread::wait_until_cold::hc0ea83de9f250620
  23:     0x58c00bceaa32 - rayon_core::registry::ThreadBuilder::run::hedc5a5eddbc123f1
  24:     0x58c00bcedbca - std::sys_common::backtrace::__rust_begin_short_backtrace::h14baabb9af848a11
  25:     0x58c00bceeaef - core::ops::function::FnOnce::call_once{{vtable.shim}}::h49599ea7439698c3
  26:     0x58c00bd1fb95 - std::sys::pal::unix::thread::Thread::new::thread_start::h3631815ad38387d6
  27:     0x7b8d4de94ac3 - start_thread
                               at ./nptl/pthread_create.c:442:8
  28:     0x7b8d4df26850 - __GI___clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  29:                0x0 - <unknown>
stack backtrace:
   0:     0x58c00bd19556 - <std::sys_common::backtrace
thread '<unnamed>' panicked at library/core/src/panicking.rs:163:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted (core dumped)

workingjubilee commented Jul 16, 2024

I have encountered similar problems with quantized models. Running the program under compute-sanitizer gives the output below; the real backtrace is much longer than what is shown here.

========= COMPUTE-SANITIZER
Got device: Cuda(CudaDevice(DeviceId(1)))
Loading model contents..
Creating Model Weights
Got context length: 4096
Getting tokenizer
Device: Cuda(CudaDevice(DeviceId(1)))
Starting inferencing...
Getting tokens
Got 36644 tokens
parsing prompt tokens
Getting logits processor
getting first next_token
Device: Cuda(CudaDevice(DeviceId(1)))
inner: got input
========= Invalid __global__ write of size 1 bytes
=========     at quantize_q8_1+0x560
=========     by thread (192,0,0) in block (4,8195,0)
=========     Address 0x3392b3015c is out of bounds
=========     and is 29,021 bytes after the nearest allocation at 0x338e329000 of size 75,497,472 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x334660]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudarc::driver::sys::sys_12030::Lib::cuLaunchKernel::hfe263ae91520c126 in src/driver/sys/sys_12030.rs:15843 [0xedff84]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:cudarc::driver::result::launch_kernel::h7dc2536fdc655909 in "${HOME}"/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/result.rs:983 [0xb38613]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_core::quantized::cuda::quantize_q8_1::h9c8bbb2250b40525 in src/quantized/cuda.rs:59 [0xcc7b0d]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_core::quantized::cuda::mul_mat_via_q8_1::h7a6eaa861817963c in src/quantized/cuda.rs:320 [0xccdff1]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_core::quantized::cuda::QCudaStorage::dequantize_matmul::hece58c0f1fba08e1 in src/quantized/cuda.rs:551 [0xcd1d80]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_core::quantized::cuda::QCudaStorage::fwd::hb9af44186a442957 in src/quantized/cuda.rs:475 [0xcd09bf]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:_$LT$candle_core..quantized..QTensor$u20$as$u20$candle_core..custom_op..CustomOp1$GT$::cuda_fwd::h3dae5127f4b64a02 in src/quantized/mod.rs:522 [0xb3b32c]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_core::storage::Storage::apply_op1::hb1ae45780a7f5aac in src/storage.rs:203 [0xb0985d]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_core::custom_op::_$LT$impl$u20$candle_core..tensor..Tensor$GT$::apply_op1_no_bwd::hd843f14a3d52ec41 in src/custom_op.rs:157 [0xaf403c]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:_$LT$candle_core..quantized..QMatMul$u20$as$u20$candle_core..Module$GT$::forward::h733f98c2fe1f3d36 in src/quantized/mod.rs:529 [0xb3b3f5]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_transformers::models::quantized_llama::QMatMul::forward::h551d57b56deeba80 in src/models/quantized_llama.rs:27 [0x7149da]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_transformers::models::quantized_llama::LayerWeights::forward_attn::h56c8c3f144a68fc7 in src/models/quantized_llama.rs:173 [0x717e69]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:candle_transformers::models::quantized_llama::ModelWeights::forward::h0dfb5d857c758ce3 in src/models/quantized_llama.rs:476 [0x71c732]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:_$LT$ai_lib..models..llama_3_70b_instruct_32k_gguf..Llama3_70bInstruct32kGGUF$u20$as$u20$ai_lib..models..model_wrapper..ModelWrapper$GT$::inference::h279f36a3b9cc9fbf in ai-lib/src/models/llama_3_70b_instruct_32k_gguf.rs:189 [0x198cd2]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:cli::main::h8ee7a7979e775a0e in cli/src/main.rs:103 [0x19587a]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:core::ops::function::FnOnce::call_once::h772466b7bf645693 in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250 [0x194fcb]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:std::sys_common::backtrace::__rust_begin_short_backtrace::h8af7e217acbaa4da in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:161 [0x19540e]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h67443931d40186ff in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166 [0x194f71]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:std::rt::lang_start_internal::h103c42a9c4e95084 in library/std/src/rt.rs:148 [0x1b41a03]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:std::rt::lang_start::he3400f8001dc9f83 in /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:165 [0x194f4a]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:main [0x195c7e]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
=========     Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:58 [0x29d90]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main in ../csu/libc-start.c:379 [0x29e40]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0x194e45]
=========                in "${HOME}"/"${CARGO_PROJECT}"/./target/debug/cli
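As a sanity check on the report above, its numbers can be reproduced with a little arithmetic. The sketch below assumes that the "bytes after" figure counts from the last valid byte of the allocation; that reading is inferred from the figures themselves, not taken from the compute-sanitizer documentation, and the function name is made up for illustration.

```rust
/// Distance from the last valid byte of an allocation to `addr`,
/// or `None` if `addr` is not past the end of the allocation.
/// Assumption: this is how the "N bytes after" figure is counted.
fn bytes_past_last_valid(addr: u64, base: u64, size: u64) -> Option<u64> {
    let end = base.checked_add(size)?; // first byte after the allocation
    if addr >= end {
        Some(addr - end + 1)
    } else {
        None
    }
}

fn main() {
    // Values copied from the first report.
    let base = 0x338e329000_u64;
    let size = 75_497_472_u64;
    let addr = 0x3392b3015c_u64;

    println!("allocation size in MiB: {}", size / (1024 * 1024));
    println!("offset of the write from the base: {}", addr - base);
    println!(
        "bytes past the last valid byte: {:?}",
        bytes_past_last_valid(addr, base, size)
    );
}
```

The allocation is exactly 72 MiB, and the write lands roughly 28 KiB past its end, so this is an overrun of the buffer rather than a wild pointer.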

This backtrace then repeats, as successive threads write to successively higher addresses:

========= Invalid __global__ write of size 1 bytes
=========     at quantize_q8_1+0x560
=========     by thread (223,0,0) in block (4,8195,0)
=========     Address 0x3392b3017b is out of bounds
=========     and is 29,052 bytes after the nearest allocation at 0x338e329000 of size 75,497,472 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x334660]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1

...though, given we're talking about threads here, not always in a linear order:

========= Invalid __global__ write of size 1 bytes
=========     at quantize_q8_1+0x560
=========     by thread (0,0,0) in block (10,8194,0)
=========     Address 0x3392b2e344 is out of bounds
=========     and is 21,317 bytes after the nearest allocation at 0x338e329000 of size 75,497,472 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x334660]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
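The three faulting addresses above happen to fit a simple index model, sketched here. Every constant is an assumption and none of it is taken from the candle source: 256 threads per block, a q8_1 block of 36 bytes (a 4-byte header followed by 32 one-byte values), and a padded row width of 8192 values. The function names are hypothetical.

```rust
// All constants are assumptions chosen to match the reports, not verified facts.
const THREADS_PER_BLOCK: u64 = 256; // assumed launch block size in x
const VALUES_PER_QBLOCK: u64 = 32; // assumed values per q8_1 block
const QBLOCK_HEADER_BYTES: u64 = 4; // assumed scale/sum header per block
const QBLOCK_BYTES: u64 = QBLOCK_HEADER_BYTES + VALUES_PER_QBLOCK; // 36
const ROW_WIDTH: u64 = 8192; // assumed padded row width in values

/// Byte offset (from the start of the output buffer) of the 1-byte write
/// made by thread `thread_x` in block (`block_x`, `block_y`).
fn write_offset(thread_x: u64, block_x: u64, block_y: u64) -> u64 {
    let col = block_x * THREADS_PER_BLOCK + thread_x;
    let elem = block_y * ROW_WIDTH + col;
    (elem / VALUES_PER_QBLOCK) * QBLOCK_BYTES + QBLOCK_HEADER_BYTES + elem % VALUES_PER_QBLOCK
}

/// Bytes occupied by one quantized row under the same assumptions.
fn row_bytes() -> u64 {
    ROW_WIDTH / VALUES_PER_QBLOCK * QBLOCK_BYTES
}

fn main() {
    let base = 0x338e329000_u64;
    let size = 75_497_472_u64;
    // (thread.x, block.x, block.y, reported address), copied from the reports.
    let reports = [
        (192, 4, 8195, 0x3392b3015c_u64),
        (223, 4, 8195, 0x3392b3017b_u64),
        (0, 10, 8194, 0x3392b2e344_u64),
    ];
    for (tx, bx, by, addr) in reports {
        let predicted = base + write_offset(tx, bx, by);
        println!("predicted {predicted:#x}, reported {addr:#x}");
    }
    println!("rows that fit in the allocation: {}", size / row_bytes());
}
```

Under these assumptions the allocation holds exactly 8192 rows, so rows 8194 and 8195 would land just past its end, and the row-major layout would explain why block (10,8194) reports a lower address than block (4,8195). This is an observation that matches the numbers, not a confirmed diagnosis.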

And so it continues, until we eventually reach this:

thread 'main' panicked at "${HOME}"/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at "${HOME}"/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
stack backtrace:
   0:     0x55b30644ed72 - std::backtrace_rs::backtrace::libunwind::trace::he4ee80166a02c846
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5

Which is the last backtrace printed (and a different one! see coreylowman/cudarc#277 for more on that) before we reach the "end":

thread 'main' panicked at library/core/src/panicking.rs:164:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 4835 errors
========= ERROR SUMMARY: 4735 errors were not printed. Use --print-limit option to adjust the number of printed errors
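Both logs in this thread end with "panic in a destructor during cleanup", and the backtrace in the first post shows `unwrap` failing inside the `Drop` impl of `CudaSlice`. When a second panic happens while the thread is already unwinding, Rust aborts the process. Below is a minimal sketch of the general pattern for avoiding that abort, using a hypothetical `Buffer` type with a fallible `release`. It is not cudarc's code and not a proposed patch; it only illustrates checking `std::thread::panicking()` before panicking in `drop`.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts failed releases so the behaviour can be observed.
static FAILED_RELEASES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a device buffer whose release can fail.
struct Buffer {
    release_ok: bool,
}

impl Buffer {
    fn release(&mut self) -> Result<(), String> {
        if self.release_ok {
            Ok(())
        } else {
            Err("an illegal memory access was encountered".to_string())
        }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        if let Err(err) = self.release() {
            FAILED_RELEASES.fetch_add(1, Ordering::SeqCst);
            if thread::panicking() {
                // Already unwinding: a second panic would abort, so only log.
                eprintln!("release failed during unwinding: {err}");
            } else {
                panic!("release failed: {err}");
            }
        }
    }
}

/// Drops two failing buffers; returns true if the resulting panic was caught.
fn drop_two_failing_buffers() -> bool {
    panic::catch_unwind(|| {
        let _first = Buffer { release_ok: false };
        let _second = Buffer { release_ok: false };
        // `_second` is dropped first and panics; `_first` is then dropped
        // during unwinding and takes the logging branch.
    })
    .is_err()
}

fn main() {
    let caught = drop_two_failing_buffers();
    println!("panic caught instead of abort: {caught}");
    println!("failed releases: {}", FAILED_RELEASES.load(Ordering::SeqCst));
}
```

With an unconditional `unwrap` in `drop`, the second failure would abort the process the way both logs show; the guard keeps the original panic as the only one that propagates.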
