ggml_backend_metal_buffer_type_alloc_buffer() allocates from host memory #10379
CharlesJu1 started this conversation in General
-
The Metal backend implementation implicitly assumes that shared memory is supported, i.e. it is developed mainly with Apple Silicon in mind. External GPUs like the one in your case are therefore likely not supported.
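To make the "wrapper around host memory" idea concrete, here is a minimal sketch in plain C. It is not the Metal API and not the ggml implementation: `nocopy_buffer`, `round_up_to_page`, `host_alloc_aligned` and `wrap_no_copy` are hypothetical names used only for illustration. It assumes the backend allocates page-aligned host memory first and then hands that pointer to a no-copy buffer object, which is what the `newBufferWithBytesNoCopy` call quoted in this thread suggests.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a GPU buffer object that wraps existing memory.
   It only models the no-copy idea; it is NOT a Metal type. */
typedef struct {
    void  *contents; /* aliases the host allocation, no second copy */
    size_t length;
} nocopy_buffer;

/* Round a size up to a multiple of the system page size. */
static size_t round_up_to_page(size_t n) {
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Allocate page-aligned host memory; returns NULL on failure. */
static void *host_alloc_aligned(size_t size) {
    void  *p    = NULL;
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    if (posix_memalign(&p, page, size) != 0) {
        return NULL;
    }
    return p;
}

/* "Wrap" the host allocation: the buffer records the pointer, nothing is copied. */
static nocopy_buffer wrap_no_copy(void *host_ptr, size_t length) {
    nocopy_buffer b = { host_ptr, length };
    return b;
}
```

In this model, a write through the host pointer is visible through `contents` because both refer to the same storage. On a shared-memory system that costs nothing extra; on a GPU with its own VRAM, the same pattern means the data still lives in host memory.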
-
Well, the name ggml_backend_metal_buffer_type_alloc_buffer sounds like it allocates memory on the Metal backend device (in my case, a Radeon Pro 555X with 4 GB), but the memory is actually allocated from host memory, and [device newBufferWithBytesNoCopy:ctx->all_data] is just a wrapper around that host allocation. I am not familiar with how Metal works.
Could someone confirm that the buffer memory is indeed allocated from host memory?
And why not allocate the buffer memory on the Radeon Pro 555X itself? Isn't it faster to have the KV cache buffer allocated on the GPU?