diff --git a/wasmedge-ggml-llama-interactive/README.md b/wasmedge-ggml-llama-interactive/README.md
index a3124f2..59785b8 100644
--- a/wasmedge-ggml-llama-interactive/README.md
+++ b/wasmedge-ggml-llama-interactive/README.md
@@ -143,7 +143,7 @@ Supported parameters include:
 - `stream-stdout`: Set it to true to print the inferred tokens to standard output. (default: `false`)
 - `ctx-size`: Set the context size, the same as the `--ctx-size` parameter in llama.cpp. (default: `512`)
 - `n-predict`: Set the number of tokens to predict, the same as the `--n-predict` parameter in llama.cpp. (default: `512`)
-- `n-gpu-layers`: Set the number of layers to store in VRAM, the same as the `--n-gpu-layers` parameter in llama.cpp. (default: `0`)
+- `n-gpu-layers`: Set the number of layers to store in VRAM, the same as the `--n-gpu-layers` parameter in llama.cpp. When using Metal support on macOS, set `n-gpu-layers` to `0` or leave it unset to use the default value. (default: `0`)
 - `reverse-prompt`: Set it to the token at which you want to halt the generation. Similar to the `--reverse-prompt` parameter in llama.cpp. (default: `""`)
 - `batch-size`: Set the number of batch size for prompt processing, the same as the `--batch-size` parameter in llama.cpp. (default: `512`)
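
For context, the parameters documented in the hunk above are metadata options consumed by the WASI-NN GGML backend. The sketch below shows one way such options could be assembled and passed as the graph builder config from a Rust guest. It is a minimal illustration, assuming the `wasmedge-wasi-nn` and `serde_json` crates and a model preloaded under the name `default`; it is not the exact code of this example, and the option values shown are placeholders.

```rust
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding};

fn main() {
    // Collect the metadata options documented in the README; values are illustrative.
    let options = serde_json::json!({
        "stream-stdout": true,
        "ctx-size": 1024,
        "n-predict": 512,
        "n-gpu-layers": 0,   // keep 0 (or omit) when relying on Metal support on macOS
        "reverse-prompt": "",
        "batch-size": 512
    });

    // Pass the options as the builder config and load the model registered
    // under the name "default" (assumed to be preloaded by the runtime).
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .config(options.to_string())
        .build_from_cache("default")
        .expect("failed to build the graph from the preloaded model");

    // An execution context would then be used for set_input/compute/get_output.
    let _ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");
}
```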