
Segfault with 65B model #84

Closed
matthew-mcallister opened this issue Mar 13, 2023 · 6 comments
Labels
need more info The OP should provide more details about the issue

Comments

@matthew-mcallister
This is the output when the binary is built with -fsanitize=address:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==167666==ERROR: AddressSanitizer: SEGV on unknown address 0x558c0562c438 (pc 0x558a27cc9807 bp 0x000000000000 sp 0x7ffeb2f57310 T0)
==167666==The signal is caused by a READ memory access.
    #0 0x558a27cc9807 in ggml_element_size (/home/mattmcal/repos/llama.cpp/main+0x49807)
    #1 0x558a27c9c03c in llama_eval(llama_model const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<float, std::allocator<float> >&, unsigned long&) (/home/mattmcal/repos/llama.cpp/main+0x1c03c)
    #2 0x558a27c960fb in main (/home/mattmcal/repos/llama.cpp/main+0x160fb)
    #3 0x7fe45e046189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #4 0x7fe45e046244 in __libc_start_main_impl ../csu/libc-start.c:381
    #5 0x558a27c9b1a0 in _start (/home/mattmcal/repos/llama.cpp/main+0x1b1a0)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/mattmcal/repos/llama.cpp/main+0x49807) in ggml_element_size

I had to increase ctx_size; otherwise I got this error:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 33373704448, available 33292002560)

Is GGML trying to use more RAM than it malloc'd?

@ggerganov ggerganov added the need more info The OP should provide more details about the issue label Mar 13, 2023
@ggerganov
Owner

ggml buffers are preallocated with a fixed memory size. If you run out of buffer space during inference, you get this error.
It's quite possible that the estimated size is too small for some parameter combinations. This will be improved over time.

Can you provide the parameters for which you get this error?

@matthew-mcallister
Author

matthew-mcallister commented Mar 13, 2023

Basically, this fails if I increase n_ctx beyond the default of 512, which I can tell isn't fully supported. I increased the mem_size allocated by ggml by adding to ctx_size, but inference still uses more memory than was allocated, without printing any warning or error.

These parameters

llama_model_load: loading model from './models/65B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 1024
llama_model_load: n_embd  = 8192
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 64
llama_model_load: n_layer = 80
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 22016
llama_model_load: n_parts = 8
llama_model_load: ggml ctx size = 68613.73 MB
llama_model_load: memory_size =  5120.00 MB, n_mem = 81920
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.150000

actually cause a null dereference partway through inference:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==27991==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55e82aebc227 bp 0x7ffddaffb650 sp 0x7ffddaffb640 T0)
==27991==The signal is caused by a WRITE memory access.
==27991==Hint: address points to the zero page.
    #0 0x55e82aebc227 in quantize_row_q4_0 (/home/mattmcal/repos/llama.cpp/main+0x44227)
    #1 0x55e82aebcacb in ggml_compute_forward_mul_mat_q4_0_f32 (/home/mattmcal/repos/llama.cpp/main+0x44acb)
    #2 0x55e82aecd36c in ggml_compute_forward (/home/mattmcal/repos/llama.cpp/main+0x5536c)
    #3 0x55e82aeda061 in ggml_graph_compute (/home/mattmcal/repos/llama.cpp/main+0x62061)
    #4 0x55e82ae94540 in llama_eval(llama_model const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<float, std::allocator<float> >&, unsigned long&) (/home/mattmcal/repos/llama.cpp/main+0x1c540)
    #5 0x55e82ae8e5b0 in main (/home/mattmcal/repos/llama.cpp/main+0x165b0)
    #6 0x7f9a9c646189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #7 0x7f9a9c646244 in __libc_start_main_impl ../csu/libc-start.c:381
    #8 0x55e82ae931a0 in _start (/home/mattmcal/repos/llama.cpp/main+0x1b1a0)

@drewcrawford

Related discussion: #71

@prusnak
Collaborator

prusnak commented Mar 30, 2023

@matthew-mcallister Can you try again with the code from master (which is now using mmap to load the weights)?

@matthew-mcallister
Author

matthew-mcallister commented Mar 31, 2023

It still segfaults after 512 tokens.

EDIT: Hold on, I might be mistaken. I haven't finished converting all the tensors yet.

@matthew-mcallister
Author

OK, this works now. Fantastic, thanks for the update!

4 participants