[Example] Update README for n-gpu-layers in macOS
Signed-off-by: dm4 <[email protected]>
dm4 authored and hydai committed Nov 3, 2023
1 parent e57a678 commit 3f93dd0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion wasmedge-ggml-llama-interactive/README.md
@@ -143,7 +143,7 @@ Supported parameters include:
- `stream-stdout`: Set it to true to print the inferred tokens to standard output. (default: `false`)
- `ctx-size`: Set the context size, the same as the `--ctx-size` parameter in llama.cpp. (default: `512`)
- `n-predict`: Set the number of tokens to predict, the same as the `--n-predict` parameter in llama.cpp. (default: `512`)
- `n-gpu-layers`: Set the number of layers to store in VRAM, the same as the `--n-gpu-layers` parameter in llama.cpp. (default: `0`)
- `n-gpu-layers`: Set the number of layers to store in VRAM, the same as the `--n-gpu-layers` parameter in llama.cpp. When using Metal support on macOS, set `n-gpu-layers` to `0` or leave it unset so the default value applies (see the example below this list). (default: `0`)
- `reverse-prompt`: Set it to the token at which you want to halt the generation. Similar to the `--reverse-prompt` parameter in llama.cpp. (default: `""`)
- `batch-size`: Set the batch size for prompt processing, the same as the `--batch-size` parameter in llama.cpp. (default: `512`)
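
A minimal usage sketch for the `n-gpu-layers` option: the commands below assume the example is launched with the WasmEdge CLI and that the options above are supplied as environment variables via `--env`; the model file name and Wasm file name are placeholders. Refer to the run instructions earlier in this README for the authoritative invocation.

```bash
# Hypothetical invocation: offload 35 layers to the GPU (VRAM).
# The option-passing mechanism (--env) and the file names below are
# assumptions for illustration only.
wasmedge --dir .:. \
  --env n-gpu-layers=35 \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
  wasmedge-ggml-llama-interactive.wasm default

# On macOS with Metal support, leave n-gpu-layers unset (or set it to 0)
# so the default value of 0 is used.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
  wasmedge-ggml-llama-interactive.wasm default
```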
