Replies: 1 comment
-
For GPU, just set the tensor split so that only one of the GPUs is used and set the split mode to none. This removes the overhead and reaches the same performance as restricting the devices with CUDA_VISIBLE_DEVICES.
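If it helps, here is a minimal sketch of what that looks like with the llama.cpp C API; it assumes a recent llama.h where llama_model_params exposes split_mode, main_gpu and tensor_split (field and enum names may differ in older releases):

```c
#include <stddef.h>
#include "llama.h"

// Sketch: load a model on a single GPU by disabling multi-GPU splitting,
// which should behave like restricting CUDA_VISIBLE_DEVICES.
// Field/enum names follow a recent llama.h and may differ in older versions.
static struct llama_model * load_on_single_gpu(const char * path) {
    struct llama_model_params params = llama_model_default_params();
    params.n_gpu_layers = 999;                   // offload all layers
    params.split_mode   = LLAMA_SPLIT_MODE_NONE; // do not split across GPUs
    params.main_gpu     = 0;                     // index of the GPU to use
    params.tensor_split = NULL;                  // ignored when split mode is "none"
    return llama_load_model_from_file(path, params);
}
```

On the command line, the equivalent should be the `--split-mode none` and `--main-gpu` flags.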
-
Hi, I would like to choose between the CPU and the GPU at runtime. Currently, it seems this can only be controlled at compile time, using options like GGML_METAL or GGML_BLAS. Is there a way to select the computation device in code before initializing the ggml_backend?
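For what it's worth, when both backends are compiled into the build, the ggml backend API lets you pick the device at the point where you create the backend. A minimal sketch, assuming a CUDA build with the usual ggml-backend.h / ggml-cuda.h headers (the GGML_USE_CUDA macro and function signatures are assumptions based on recent ggml and may vary between versions; a Metal build would use ggml_backend_metal_init() instead):

```c
#include <stdbool.h>
#include <stddef.h>
#include "ggml-backend.h"
#ifdef GGML_USE_CUDA
#include "ggml-cuda.h"
#endif

// Sketch: choose the compute device at runtime rather than at compile time.
// Falls back to the CPU backend if the GPU backend is unavailable or fails.
static ggml_backend_t pick_backend(bool want_gpu) {
    ggml_backend_t backend = NULL;
#ifdef GGML_USE_CUDA
    if (want_gpu) {
        backend = ggml_backend_cuda_init(0); // GPU device index 0
    }
#endif
    if (backend == NULL) {
        backend = ggml_backend_cpu_init();   // CPU backend is always available
    }
    return backend;
}
```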