
Segfault with 65B model #84

Closed
matthew-mcallister opened this issue Mar 13, 2023 · 6 comments
Labels
need more info The OP should provide more details about the issue

Comments

@matthew-mcallister
This is the output when the binary is built with -fsanitize=address:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==167666==ERROR: AddressSanitizer: SEGV on unknown address 0x558c0562c438 (pc 0x558a27cc9807 bp 0x000000000000 sp 0x7ffeb2f57310 T0)
==167666==The signal is caused by a READ memory access.
    #0 0x558a27cc9807 in ggml_element_size (/home/mattmcal/repos/llama.cpp/main+0x49807)
    #1 0x558a27c9c03c in llama_eval(llama_model const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<float, std::allocator<float> >&, unsigned long&) (/home/mattmcal/repos/llama.cpp/main+0x1c03c)
    #2 0x558a27c960fb in main (/home/mattmcal/repos/llama.cpp/main+0x160fb)
    #3 0x7fe45e046189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #4 0x7fe45e046244 in __libc_start_main_impl ../csu/libc-start.c:381
    #5 0x558a27c9b1a0 in _start (/home/mattmcal/repos/llama.cpp/main+0x1b1a0)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/mattmcal/repos/llama.cpp/main+0x49807) in ggml_element_size

I had to increase ctx_size; otherwise I got this error:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 33373704448, available 33292002560)

Is GGML trying to use more RAM than it malloc'd?

@ggerganov ggerganov added the need more info The OP should provide more details about the issue label Mar 13, 2023
@ggerganov
Owner

ggml buffers are preallocated with a fixed memory size. If you run out of buffer space during inference, you get this error.
It's quite possible that the estimated size is too small for some parameter combinations. This will be improved over time.

Can you provide the parameters for which you get this error?

@matthew-mcallister
Author

matthew-mcallister commented Mar 13, 2023

Basically, this fails if I increase n_ctx beyond the default of 512, which I can tell isn't fully supported. I increased the mem_size allocated by ggml by adding to ctx_size, but inference still uses more memory than was allocated, without printing any warning or error.

These parameters

llama_model_load: loading model from './models/65B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 1024
llama_model_load: n_embd  = 8192
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 64
llama_model_load: n_layer = 80
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 22016
llama_model_load: n_parts = 8
llama_model_load: ggml ctx size = 68613.73 MB
llama_model_load: memory_size =  5120.00 MB, n_mem = 81920
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.150000

actually cause a null dereference partway through inference:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==27991==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55e82aebc227 bp 0x7ffddaffb650 sp 0x7ffddaffb640 T0)
==27991==The signal is caused by a WRITE memory access.
==27991==Hint: address points to the zero page.
    #0 0x55e82aebc227 in quantize_row_q4_0 (/home/mattmcal/repos/llama.cpp/main+0x44227)
    #1 0x55e82aebcacb in ggml_compute_forward_mul_mat_q4_0_f32 (/home/mattmcal/repos/llama.cpp/main+0x44acb)
    #2 0x55e82aecd36c in ggml_compute_forward (/home/mattmcal/repos/llama.cpp/main+0x5536c)
    #3 0x55e82aeda061 in ggml_graph_compute (/home/mattmcal/repos/llama.cpp/main+0x62061)
    #4 0x55e82ae94540 in llama_eval(llama_model const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<float, std::allocator<float> >&, unsigned long&) (/home/mattmcal/repos/llama.cpp/main+0x1c540)
    #5 0x55e82ae8e5b0 in main (/home/mattmcal/repos/llama.cpp/main+0x165b0)
    #6 0x7f9a9c646189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #7 0x7f9a9c646244 in __libc_start_main_impl ../csu/libc-start.c:381
    #8 0x55e82ae931a0 in _start (/home/mattmcal/repos/llama.cpp/main+0x1b1a0)

@drewcrawford

Related discussion: #71

@prusnak
Collaborator

prusnak commented Mar 30, 2023

@matthew-mcallister Can you try again with the code from master (which is now using mmap to load the weights)?

@matthew-mcallister
Author

matthew-mcallister commented Mar 31, 2023

It still segfaults after 512 tokens.

EDIT: Hold on, I might be mistaken. I haven't finished converting all the tensors yet.

@matthew-mcallister
Author

OK, this works now. Fantastic, thanks for the update!

4 participants