Fix ngl and unwrap issue (#87)
* [Example] ggml: breaking: Must specify the n_gpu_layers on macOS

Signed-off-by: hydai <[email protected]>

* [Example] ggml: never use unwrap to check the stream-stdout

Signed-off-by: hydai <[email protected]>
hydai committed Jan 23, 2024
1 parent a5bc02c commit 5db620f
Showing 3 changed files with 17 additions and 4 deletions.
13 changes: 12 additions & 1 deletion wasmedge-ggml-llama-interactive/README.md
@@ -110,7 +110,18 @@ wasmedge --dir .:. \

 #### macOS

-macOS will use the Metal framework by default. You don't have to specify the `n_gpu_layers` parameter.
+macOS uses the Metal framework by default. llama.cpp now supports the `n_gpu_layers` parameter, so make sure you set it to offload the model's tensor layers to the GPU.
+
+Use the following command to make sure the model's tensor layers are offloaded to the GPU:
+
+```
+# llama2-7b-chat provides 35 GPU layers, so set a value greater than or equal to 35.
+# If you use a larger model, this value may change.
+wasmedge --dir .:. \
+  --env n_gpu_layers=35 \
+  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
+  wasmedge-ggml-llama-interactive.wasm default
+```

#### Linux + CUDA

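For context on how the flag reaches the backend, here is a minimal sketch of reading `n_gpu_layers` from the environment and folding it into the JSON options this example passes to the WASI-NN GGML backend. The dash-style `n-gpu-layers` key is an assumption inferred from the `stream-stdout` key in the main.rs diff below; this commit does not show the option-parsing code.

```rust
// Sketch only: how an env var such as `n_gpu_layers` could be collected
// into the JSON options handed to the GGML backend. The "n-gpu-layers"
// metadata key is an assumption, not confirmed by this commit.
use serde_json::{json, Value};
use std::env;

fn build_options() -> Value {
    let mut options = json!({});
    if let Ok(raw) = env::var("n_gpu_layers") {
        match raw.parse::<u64>() {
            // Offload up to `n` tensor layers to the GPU (Metal on macOS).
            Ok(n) => options["n-gpu-layers"] = json!(n),
            Err(_) => eprintln!("[WARN] n_gpu_layers is not a number: {raw}"),
        }
    }
    options
}

fn main() {
    // Run with e.g. `n_gpu_layers=35` set in the environment.
    println!("options = {}", build_options());
}
```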
8 changes: 5 additions & 3 deletions wasmedge-ggml-llama-interactive/src/main.rs
@@ -169,9 +169,11 @@ fn main() {
     }

     // Check streaming related options.
-    if is_compute_single && options["stream-stdout"].as_bool().unwrap() {
-        println!("[ERROR] compute_single and stream_stdout cannot be enabled at the same time.");
-        std::process::exit(1);
+    if is_compute_single {
+        if let Some(true) = options["stream-stdout"].as_bool() {
+            println!("[ERROR] compute_single and stream_stdout cannot be enabled at the same time.");
+            std::process::exit(1);
+        }
     }

     // We support both llama and chatml prompt format.
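Why the rewrite above is needed: indexing a `serde_json::Value` with a key that was never set yields `Value::Null`, so `.as_bool()` returns `None` and the old `.unwrap()` panics whenever `stream-stdout` is absent. A standalone sketch of the failure mode and the safe pattern (assuming the options are a `serde_json::Value`, as the `options[...]` indexing suggests):

```rust
use serde_json::json;

fn main() {
    // Options where the user never set "stream-stdout".
    let options = json!({ "enable-log": true });

    // Old pattern: panics, because options["stream-stdout"] is Value::Null
    // and Null.as_bool() is None.
    // let enabled = options["stream-stdout"].as_bool().unwrap();

    // New pattern: only fires when the key exists and is `true`.
    if let Some(true) = options["stream-stdout"].as_bool() {
        println!("stream-stdout enabled");
    } else {
        println!("stream-stdout disabled or unset");
    }
}
```

An equivalent one-liner, `options["stream-stdout"].as_bool() == Some(true)`, would keep the original single-condition shape; the nested `if let` in this commit behaves the same way.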
Binary file not shown.
