ggml_backend_metal_buffer_type_alloc_buffer() allocates from host memory #10379

CharlesJu1 · 2024-11-18T10:48:24Z

The name ggml_backend_metal_buffer_type_alloc_buffer suggests that it allocates memory on the Metal backend device (in my case, a Radeon Pro 555X with 4 GB), but the memory is actually allocated from host memory, and [device newBufferWithBytesNoCopy:ctx->all_data] is just a wrapper around that host allocation. I am not familiar with how Metal works.

Could someone confirm that the buffer memory is indeed allocated from host memory?

And why not allocate the buffer memory on the Radeon Pro 555X itself? Wouldn't it be faster to have the KV cache buffer allocated on the GPU?

```objc
static ggml_backend_buffer_t ggml_backend_metal_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) {
    struct ggml_backend_metal_buffer_context * ctx = calloc(1, sizeof(struct ggml_backend_metal_buffer_context));

    // pad the allocation to a whole number of pages; the no-copy Metal buffer below wraps page-aligned host memory
    const size_t size_page = sysconf(_SC_PAGESIZE);

    size_t size_aligned = size;
    if ((size_aligned % size_page) != 0) {
        size_aligned += (size_page - (size_aligned % size_page));
    }

    id<MTLDevice> device = ggml_backend_metal_device_acq(buft->device->context);

    // the backing store is ordinary host (CPU) memory, not dedicated GPU memory
    ctx->all_data = ggml_metal_host_malloc(size_aligned);
    ctx->all_size = size_aligned;
    ctx->owned = true;
    ctx->n_buffers = 1;

    if (ctx->all_data != NULL) {
        ctx->buffers[0].data  = ctx->all_data;
        ctx->buffers[0].size  = size;
        ctx->buffers[0].metal = nil;

        if (size_aligned > 0) {
            // wrap the existing host allocation in an MTLBuffer without copying it;
            // MTLResourceStorageModeShared means the CPU and GPU access the same memory
            ctx->buffers[0].metal = [device newBufferWithBytesNoCopy:ctx->all_data
                                            length:size_aligned
                                            options:MTLResourceStorageModeShared
                                            deallocator:nil];
        }
    }

    if (size_aligned > 0 && (ctx->all_data == NULL || ctx->buffers[0].metal == nil)) {
        GGML_LOG_ERROR("%s: error: failed to allocate buffer, size = %8.2f MiB\n", __func__, size_aligned / 1024.0 / 1024.0);
        free(ctx);
        ggml_backend_metal_device_rel(buft->device->context);
        return NULL;
    }

    //ggml_backend_metal_log_allocated_size(device, size_aligned);

    return ggml_backend_buffer_init(buft, ggml_backend_metal_buffer_i, ctx, size);
}
```
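The host-side part of the function above comes down to two steps: round the requested size up to a whole number of pages, then allocate page-aligned host memory that a no-copy Metal buffer can wrap (Apple documents that newBufferWithBytesNoCopy expects a page-aligned pointer). Below is a minimal portable C sketch of just those two steps, assuming POSIX `posix_memalign` and `sysconf`. The helpers `round_up_to_page`, `is_page_aligned` and `alloc_page_aligned` are hypothetical names for illustration, not ggml functions, and the Metal wrapping step itself is not shown.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round size up to the next multiple of page, like the size_aligned
 * computation above. A size of 0 stays 0. */
static size_t round_up_to_page(size_t size, size_t page) {
    assert(page > 0);
    const size_t rem = size % page;
    return rem == 0 ? size : size + (page - rem);
}

/* Returns 1 if p starts on a page boundary of this system. */
static int is_page_aligned(const void *p) {
    const size_t page = (size_t) sysconf(_SC_PAGESIZE);
    return ((uintptr_t) p % page) == 0;
}

/* Hypothetical helper (NOT ggml's ggml_metal_host_malloc): allocate
 * page-aligned host memory padded to whole pages and write the padded
 * size to *out_size. Returns NULL for size 0 or on allocation failure.
 * Release the result with free(). */
static void *alloc_page_aligned(size_t size, size_t *out_size) {
    const size_t page = (size_t) sysconf(_SC_PAGESIZE);
    const size_t size_aligned = round_up_to_page(size, page);
    void *data = NULL;
    if (size_aligned == 0 || posix_memalign(&data, page, size_aligned) != 0) {
        return NULL;
    }
    *out_size = size_aligned;
    return data;
}
```

For example, with 4 KiB pages a 1000-byte request becomes a 4096-byte allocation, while on Apple Silicon Macs, which use 16 KiB pages, it becomes 16384 bytes. Either way the memory lives in host RAM; whether the GPU can use it without extra copies depends on the device sharing memory with the CPU.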

ggerganov · 2024-11-19T10:07:59Z · Maintainer

The Metal backend implementation implicitly assumes that shared (unified) memory is available, i.e. it is mainly developed with Apple Silicon in mind. GPUs with their own dedicated memory, as in your case, are therefore likely not supported.

CharlesJu1 Nov 20, 2024
Author

Thanks for the clarification!
