[Example] Update the examples to GGUF (#38)
* Update the examples to GGUF

Signed-off-by: Michael Yuan <[email protected]>

* Use AUTO

Signed-off-by: Michael Yuan <[email protected]>

* Change to a different model in CI

Signed-off-by: Michael Yuan <[email protected]>

---------

Signed-off-by: Michael Yuan <[email protected]>
juntao authored Sep 25, 2023
1 parent 9ab71d2 commit 5f9ff89
Showing 4 changed files with 8 additions and 11 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/llama.yml
@@ -46,7 +46,9 @@ jobs:
       - name: Example
         run: |
           cd wasmedge-ggml-llama
-          curl -LO https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin
+          # curl -LO https://huggingface.co/kirp/TinyLlama-1.1B-Chat-v0.2-gguf/resolve/main/ggml-model-q4_0.gguf
+          curl -LO https://huggingface.co/juanjgit/orca_mini_3B-GGUF/resolve/main/orca-mini-3b.q4_0.gguf
           cargo build --target wasm32-wasi --release
           wasmedge compile target/wasm32-wasi/release/wasmedge-ggml-llama.wasm wasmedge-ggml-llama.wasm
-          wasmedge --dir .:. --nn-preload default:GGML:CPU:orca-mini-3b.ggmlv3.q4_0.bin wasmedge-ggml-llama.wasm default 'Once upon a time, '
+          # wasmedge --dir .:. --nn-preload default:GGML:CPU:ggml-model-q4_0.gguf wasmedge-ggml-llama.wasm default '<|im_start|>user\nWhere is the capital of Japan?<|im_end|>\n<|im_start|>assistant\n'
+          wasmedge --dir .:. --nn-preload default:GGML:CPU:orca-mini-3b.q4_0.gguf wasmedge-ggml-llama.wasm default '### System:\nYou are an AI assistant\n\n### User:\nWhere is the capital of Japan?\n\n### Response:\n'
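
For context on the CI command above: `--nn-preload` registers a model file under an alias using the format `ALIAS:ENCODING:TARGET:PATH`, and the two positional arguments passed to the wasm module are that alias and the prompt. A minimal sketch of how the example consumes them, following the variable names in the `src/main.rs` diff below (everything else here is illustrative, not part of this commit):

```rust
fn main() {
    // Invocation: wasmedge --dir .:. --nn-preload default:GGML:CPU:model.gguf \
    //             wasmedge-ggml-llama.wasm default '<prompt>'
    let args: Vec<String> = std::env::args().collect();
    let model_name: &str = &args[1]; // "default" — the alias registered by --nn-preload
    let prompt: &str = &args[2]; // the quoted prompt string
    // ... the model is then loaded by alias via wasi-nn (see the src/main.rs diff below)
}
```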
11 changes: 3 additions & 8 deletions wasmedge-ggml-llama/README.md
@@ -17,18 +17,13 @@ cargo build --target wasm32-wasi --release
 ```
 
 The output WASM file will be at `target/wasm32-wasi/release/`.
-To speed up the image processing, we can enable the AOT mode in WasmEdge with:
-
-```bash
-wasmedgec target/wasm32-wasi/release/wasmedge-ggml-llama.wasm wasmedge-ggml-llama-aot.wasm
-```
 
 ## Get Model
 
 Download llama model:
 
 ```bash
-curl -LO https://huggingface.co/localmodels/Llama-2-7B-Chat-ggml/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin
+curl -LO https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q5_K_M.gguf
 ```
 
 ### Execute
@@ -37,8 +37,8 @@ Execute the WASM with the `wasmedge` using the named model feature to preload la
 
 ```bash
 wasmedge --dir .:. \
-  --nn-preload default:GGML:CPU:llama-2-7b-chat.ggmlv3.q4_0.bin \
-  wasmedge-ggml-llama-aot.wasm default 'Once upon a time, '
+  --nn-preload default:GGML:CPU:llama-2-7b.Q5_K_M.gguf \
+  target/wasm32-wasi/release/wasmedge-ggml-llama.wasm default 'Once upon a time, '
 ```
 
 After executing the command, it takes some time to wait for the output.
2 changes: 1 addition & 1 deletion wasmedge-ggml-llama/src/main.rs
@@ -7,7 +7,7 @@ fn main() {
     let prompt: &str = &args[2];
 
     let graph =
-        wasi_nn::GraphBuilder::new(wasi_nn::GraphEncoding::Ggml, wasi_nn::ExecutionTarget::CPU)
+        wasi_nn::GraphBuilder::new(wasi_nn::GraphEncoding::Ggml, wasi_nn::ExecutionTarget::AUTO)
             .build_from_cache(model_name)
             .unwrap();
     println!("Loaded model into wasi-nn with ID: {:?}", graph);
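
The switch from `ExecutionTarget::CPU` to `ExecutionTarget::AUTO` lets the wasi-nn backend pick the best available device instead of pinning inference to the CPU. For readers following along, here is a hedged sketch of the inference flow that typically follows the graph load shown above, using the same `wasi-nn` Rust bindings; the buffer size and tensor shape are illustrative, not taken from this commit:

```rust
// Create an execution context from the loaded graph.
let mut context = graph.init_execution_context().unwrap();

// The GGML plugin takes the prompt as a raw byte tensor at input index 0.
let prompt_bytes = prompt.as_bytes().to_vec();
context
    .set_input(0, wasi_nn::TensorType::U8, &[1], &prompt_bytes)
    .unwrap();

// Run inference.
context.compute().unwrap();

// Read the generated text back from output index 0.
let mut output_buffer = vec![0u8; 4096]; // illustrative buffer size
let output_size = context.get_output(0, &mut output_buffer).unwrap();
println!("{}", String::from_utf8_lossy(&output_buffer[..output_size]));
```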
Binary file added wasmedge-ggml-llama/wasmedge-ggml-llama.wasm
Binary file not shown.
