[Example] Update the examples to GGUF (#38)
* Update the examples to GGUF

Signed-off-by: Michael Yuan <[email protected]>

* Use AUTO

Signed-off-by: Michael Yuan <[email protected]>

* Change to a different model in CI

Signed-off-by: Michael Yuan <[email protected]>

---------

Signed-off-by: Michael Yuan <[email protected]>
juntao authored Sep 25, 2023
1 parent 9ab71d2 commit 5f9ff89
Showing 4 changed files with 8 additions and 11 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/llama.yml
@@ -46,7 +46,9 @@ jobs:
       - name: Example
         run: |
           cd wasmedge-ggml-llama
-          curl -LO https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin
+          # curl -LO https://huggingface.co/kirp/TinyLlama-1.1B-Chat-v0.2-gguf/resolve/main/ggml-model-q4_0.gguf
+          curl -LO https://huggingface.co/juanjgit/orca_mini_3B-GGUF/resolve/main/orca-mini-3b.q4_0.gguf
           cargo build --target wasm32-wasi --release
           wasmedge compile target/wasm32-wasi/release/wasmedge-ggml-llama.wasm wasmedge-ggml-llama.wasm
-          wasmedge --dir .:. --nn-preload default:GGML:CPU:orca-mini-3b.ggmlv3.q4_0.bin wasmedge-ggml-llama.wasm default 'Once upon a time, '
+          # wasmedge --dir .:. --nn-preload default:GGML:CPU:ggml-model-q4_0.gguf wasmedge-ggml-llama.wasm default '<|im_start|>user\nWhere is the capital of Japan?<|im_end|>\n<|im_start|>assistant\n'
+          wasmedge --dir .:. --nn-preload default:GGML:CPU:orca-mini-3b.q4_0.gguf wasmedge-ggml-llama.wasm default '### System:\nYou are an AI assistant\n\n### User:\nWhere is the capital of Japan?\n\n### Response:\n'
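
For context on the CI command above: `--nn-preload` registers a model file under an alias using the format `ALIAS:ENCODING:TARGET:PATH`, and the two positional arguments passed to the wasm module are that alias and the prompt. A minimal sketch of how the example consumes them, following the variable names in the `src/main.rs` diff below (everything else here is illustrative, not part of this commit):

```rust
fn main() {
    // Invocation: wasmedge --dir .:. --nn-preload default:GGML:CPU:model.gguf \
    //             wasmedge-ggml-llama.wasm default '<prompt>'
    let args: Vec<String> = std::env::args().collect();
    let model_name: &str = &args[1]; // "default" — the alias registered by --nn-preload
    let prompt: &str = &args[2]; // the quoted prompt string
    // ... the model is then loaded by alias via wasi-nn (see the src/main.rs diff below)
}
```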
11 changes: 3 additions & 8 deletions wasmedge-ggml-llama/README.md
@@ -17,18 +17,13 @@ cargo build --target wasm32-wasi --release
 ```
 
 The output WASM file will be at `target/wasm32-wasi/release/`.
-To speed up the image processing, we can enable the AOT mode in WasmEdge with:
-
-```bash
-wasmedgec target/wasm32-wasi/release/wasmedge-ggml-llama.wasm wasmedge-ggml-llama-aot.wasm
-```
 
 ## Get Model
 
 Download llama model:
 
 ```bash
-curl -LO https://huggingface.co/localmodels/Llama-2-7B-Chat-ggml/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin
+curl -LO https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q5_K_M.gguf
 ```
 
 ### Execute
@@ -37,8 +37,8 @@ Execute the WASM with the `wasmedge` using the named model feature to preload la
 
 ```bash
 wasmedge --dir .:. \
-  --nn-preload default:GGML:CPU:llama-2-7b-chat.ggmlv3.q4_0.bin \
-  wasmedge-ggml-llama-aot.wasm default 'Once upon a time, '
+  --nn-preload default:GGML:CPU:llama-2-7b.Q5_K_M.gguf \
+  target/wasm32-wasi/release/wasmedge-ggml-llama.wasm default 'Once upon a time, '
 ```
 
 After executing the command, it takes some time to wait for the output.
2 changes: 1 addition & 1 deletion wasmedge-ggml-llama/src/main.rs
@@ -7,7 +7,7 @@ fn main() {
     let prompt: &str = &args[2];
 
     let graph =
-        wasi_nn::GraphBuilder::new(wasi_nn::GraphEncoding::Ggml, wasi_nn::ExecutionTarget::CPU)
+        wasi_nn::GraphBuilder::new(wasi_nn::GraphEncoding::Ggml, wasi_nn::ExecutionTarget::AUTO)
             .build_from_cache(model_name)
             .unwrap();
     println!("Loaded model into wasi-nn with ID: {:?}", graph);
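
The switch from `ExecutionTarget::CPU` to `ExecutionTarget::AUTO` lets the wasi-nn backend pick the best available device instead of pinning inference to the CPU. For readers following along, here is a hedged sketch of the inference flow that typically follows the graph load shown above, using the same `wasi-nn` Rust bindings; the buffer size and tensor shape are illustrative, not taken from this commit:

```rust
// Create an execution context from the loaded graph.
let mut context = graph.init_execution_context().unwrap();

// The GGML plugin takes the prompt as a raw byte tensor at input index 0.
let prompt_bytes = prompt.as_bytes().to_vec();
context
    .set_input(0, wasi_nn::TensorType::U8, &[1], &prompt_bytes)
    .unwrap();

// Run inference.
context.compute().unwrap();

// Read the generated text back from output index 0.
let mut output_buffer = vec![0u8; 4096]; // illustrative buffer size
let output_size = context.get_output(0, &mut output_buffer).unwrap();
println!("{}", String::from_utf8_lossy(&output_buffer[..output_size]));
```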
Binary file added wasmedge-ggml-llama/wasmedge-ggml-llama.wasm
Binary file not shown.
