[feat] New example `CodeLlama-13B-Instruct` (#54)

* feat: new example `codellama-13b-instruct`
* chore(codellama): update `wasmedge-ggml-codellama.wasm`
* chore(codellama): update dependency

Signed-off-by: Xin Liu <[email protected]>
Showing 4 changed files with 340 additions and 0 deletions.
Cargo.toml
@@ -0,0 +1,11 @@
[package]
name = "wasmedge-ggml-codellama"
version = "0.1.0"
edition = "2021"

[dependencies]
chat-prompts = "0.1"
endpoints = "0.1"
wasi-nn = { git = "https://github.com/second-state/wasmedge-wasi-nn", branch = "ggml" }
clap = "4.4.6"
once_cell = "1.18"
README.md
@@ -0,0 +1,174 @@
# Chat with `CodeLlama-13B-Instruct` using WASI-NN with GGML Backend

## Requirements

### For macOS (Apple Silicon)

Install WasmEdge 0.13.4 with the WASI-NN ggml plugin (Metal enabled on Apple Silicon) via the installer:

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# After installing WasmEdge, you have to activate the environment.
# Assuming you use zsh (the default shell on macOS), run the following command:
source $HOME/.zshenv
```

### For Ubuntu (>= 20.04)

Because OpenBLAS is enabled on Ubuntu, you must install `libopenblas-dev` with `apt update && apt install -y libopenblas-dev`.

Install WasmEdge 0.13.4 with the WASI-NN ggml plugin (OpenBLAS enabled) via the installer:

```bash
apt update && apt install -y libopenblas-dev # You may need sudo if the user is not root.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# After installing WasmEdge, you have to activate the environment.
# Assuming you use bash (the default shell on Ubuntu), run the following command:
source $HOME/.bashrc
```

### For General Linux

Install WasmEdge 0.13.4 with the WASI-NN ggml plugin via the installer:

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# After installing WasmEdge, you have to activate the environment.
# Assuming you use bash (the default shell on most Linux distributions), run the following command:
source $HOME/.bashrc
```

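Whichever platform you are on, you can optionally sanity-check the installation before moving on. The commands below assume the installer's default layout, with `wasmedge` on your `PATH` and plugins under `$HOME/.wasmedge/plugin/`:

```bash
# Confirm the WasmEdge CLI is available and print its version.
wasmedge --version

# List the installed plugin libraries; the wasi_nn-ggml plugin should show up here.
ls $HOME/.wasmedge/plugin/
```
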
## Prepare WASM application

### (Recommended) Use the pre-built one bundled in this repo

A pre-built Wasm module for this example is bundled in this folder; see `wasmedge-ggml-codellama.wasm`.

### (Optional) Build from source

If you want to make modifications, you can build from source.

Compile the application to WebAssembly:

```bash
cargo build --target wasm32-wasi --release
```
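
If the build fails because the `wasm32-wasi` target is missing, you can add it first (assuming a rustup-managed Rust toolchain):

```bash
# One-time setup: add the WASI compilation target to the toolchain.
rustup target add wasm32-wasi
```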

The output WASM file will be at `target/wasm32-wasi/release/`.

```bash
cp target/wasm32-wasi/release/wasmedge-ggml-codellama.wasm ./wasmedge-ggml-codellama.wasm
```

## Get Model

In this example, we are going to use `codellama-13b-instruct.Q4_0.gguf`.

Download the llama model:

```bash
curl -LO https://huggingface.co/TheBloke/CodeLlama-13B-Instruct-GGUF/resolve/main/codellama-13b-instruct.Q4_0.gguf
```

## Execute

Execute the WASM file with `wasmedge`, using the named model feature to preload the large model:

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:CPU:codellama-13b-instruct.Q4_0.gguf \
  wasmedge-ggml-codellama.wasm --model-alias default
```

After executing the command, you may need to wait a moment for the input prompt to appear.
You can enter your question once you see the `[USER]:` prompt:

~~~console
[USER]:
convert a String into a std::ffi::CString in Rust
[ASSISTANT]:
In Rust, you can convert a `String` into a `std::ffi::CString` using the `to_cstring` method. Here's an example:
```
use std::ffi::CString;

let s = "Hello, world!";
let c_string = s.to_cstring();
```
This will create a `CString` from the `String` `s` and store it in the `c_string` variable.

Alternatively, you can use the `CString::new` method to create a `CString` from a `String` directly:
```
use std::ffi::CString;

let s = "Hello, world!";
let c_string = CString::new(s);
```
This will create a `CString` from the `String` `s` and store it in the `c_string` variable.

Note that the `to_cstring` method and the `CString::new` method both return a `Result` type, which indicates whether the conversion was successful or not. If the conversion fails, the `Result` will contain an error message.
[USER]:
write a hello-world program in Python
[ASSISTANT]:
Sure! Here is a simple "Hello, World!" program in Python:
```
print("Hello, World!")
```
This program will print the string "Hello, World!" to the console.

Alternatively, you can also use the `print()` function with parentheses to print the string:
```
print("Hello, World!")
```
This will also print the string "Hello, World!" to the console.

I hope this helps! Let me know if you have any questions.
~~~

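As a side note, the first suggestion in the sample output above is not valid Rust: `String` has no `to_cstring` method in the standard library. The supported conversion is `CString::new`, which returns a `Result` because the input must not contain interior NUL bytes. A minimal sketch:

```rust
use std::ffi::CString;

fn main() {
    let s = String::from("Hello, world!");
    // CString::new fails with a NulError if `s` contains an interior NUL byte,
    // so the Result has to be handled (here with expect for brevity).
    let c_string = CString::new(s).expect("string contained an interior NUL byte");
    println!("{:?}", c_string);
}
```
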
## Errors

- After running `apt update && apt install -y libopenblas-dev`, you may encounter the following error:

  ```bash
  ...
  E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
  E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
  ```

  This indicates that you are not logged in as `root`. Please try installing again using the `sudo` command:

  ```bash
  sudo apt update && sudo apt install -y libopenblas-dev
  ```

- After running the `wasmedge` command, you may receive the following error:

  ```bash
  [2023-10-02 14:30:31.227] [error] loading failed: invalid path, Code: 0x20
  [2023-10-02 14:30:31.227] [error] load library failed:libblas.so.3: cannot open shared object file: No such file or directory
  [2023-10-02 14:30:31.227] [error] loading failed: invalid path, Code: 0x20
  [2023-10-02 14:30:31.227] [error] load library failed:libblas.so.3: cannot open shared object file: No such file or directory
  unknown option: nn-preload
  ```

  This suggests that your plugin installation was not successful. To resolve this issue, please try installing your desired plugin again, as shown below.

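For reference, re-running the installer command from the Requirements section is usually enough to reinstall the plugin:

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# Re-activate the environment afterwards (use $HOME/.zshenv on macOS).
source $HOME/.bashrc
```
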
## Parameters

Currently, we support the following parameters:

- `LLAMA_LOG`: Set it to a non-empty value to enable logging.
- `LLAMA_N_CTX`: Set the context size, the same as the `--ctx-size` parameter in llama.cpp (default: 512).
- `LLAMA_N_PREDICT`: Set the number of tokens to predict, the same as the `--n-predict` parameter in llama.cpp (default: 512).

These parameters can be set by adding the following environment variables before the `wasmedge` command:

```bash
LLAMA_LOG=1 LLAMA_N_CTX=2048 LLAMA_N_PREDICT=512 \
  wasmedge --dir .:. \
  --nn-preload default:GGML:CPU:codellama-13b-instruct.Q4_0.gguf \
  wasmedge-ggml-codellama.wasm --model-alias default --ctx-size 2048
```

## Credit

The WASI-NN ggml plugin embeds [`llama.cpp`](git://github.com/ggerganov/llama.cpp.git@b1217) as its backend.
src/main.rs
@@ -0,0 +1,155 @@
use chat_prompts::chat::{llama::CodeLlamaInstructPrompt, BuildChatPrompt, ChatPrompt};
use clap::{Arg, Command};
use endpoints::chat::{ChatCompletionRequest, ChatCompletionRequestMessage, ChatCompletionRole};
use once_cell::sync::OnceCell;

const DEFAULT_CTX_SIZE: &str = "2048";
static CTX_SIZE: OnceCell<usize> = OnceCell::new();

#[allow(unreachable_code)]
fn main() -> Result<(), String> {
    let matches = Command::new("Llama API Server")
        .arg(
            Arg::new("model_alias")
                .short('m')
                .long("model-alias")
                .value_name("ALIAS")
                .help("Sets the model alias")
                .required(true),
        )
        .arg(
            Arg::new("ctx_size")
                .short('c')
                .long("ctx-size")
                .value_parser(clap::value_parser!(u32))
                .value_name("CTX_SIZE")
                .help("Sets the prompt context size")
                .default_value(DEFAULT_CTX_SIZE),
        )
        .get_matches();

    // model alias
    let model_name = matches
        .get_one::<String>("model_alias")
        .unwrap()
        .to_string();
    println!("[INFO] Model alias: {alias}", alias = &model_name);

    // prompt context size
    let ctx_size = matches.get_one::<u32>("ctx_size").unwrap();
    if CTX_SIZE.set(*ctx_size as usize).is_err() {
        return Err(String::from("Fail to parse prompt context size"));
    }
    println!("[INFO] Prompt context size: {size}", size = ctx_size);

    let template = ChatPrompt::CodeLlamaInstructPrompt(CodeLlamaInstructPrompt::default());

    let mut chat_request = ChatCompletionRequest::default();

    // load the model into wasi-nn
    let graph = match wasi_nn::GraphBuilder::new(
        wasi_nn::GraphEncoding::Ggml,
        wasi_nn::ExecutionTarget::CPU,
    )
    .build_from_cache(model_name.as_ref())
    {
        Ok(graph) => graph,
        Err(e) => {
            return Err(format!(
                "Fail to load model into wasi-nn: {msg}",
                msg = e.to_string()
            ))
        }
    };

    // initialize the execution context
    let mut context = match graph.init_execution_context() {
        Ok(context) => context,
        Err(e) => {
            return Err(format!(
                "Fail to create wasi-nn execution context: {msg}",
                msg = e.to_string()
            ))
        }
    };

    print_separator();

    loop {
        println!("[USER]:");
        let user_message = read_input();
        chat_request
            .messages
            .push(ChatCompletionRequestMessage::new(
                ChatCompletionRole::User,
                user_message,
            ));

        // build prompt
        let prompt = match template.build(&mut chat_request.messages) {
            Ok(prompt) => prompt,
            Err(e) => {
                return Err(format!(
                    "Fail to build chat prompts: {msg}",
                    msg = e.to_string()
                ))
            }
        };

        // set the prompt as the input tensor (raw UTF-8 bytes)
        let tensor_data = prompt.as_bytes().to_vec();
        if context
            .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
            .is_err()
        {
            return Err(String::from("Fail to set input tensor"));
        };

        // execute the inference
        if context.compute().is_err() {
            return Err(String::from("Fail to execute model inference"));
        }

        // retrieve the output
        let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
        let mut output_size = match context.get_output(0, &mut output_buffer) {
            Ok(size) => size,
            Err(e) => {
                return Err(format!(
                    "Fail to get output tensor: {msg}",
                    msg = e.to_string()
                ))
            }
        };
        output_size = std::cmp::min(*CTX_SIZE.get().unwrap(), output_size);
        let output = String::from_utf8_lossy(&output_buffer[..output_size]).to_string();
        println!("[ASSISTANT]:\n{}", output.trim());

        // put the answer into the `messages` of chat_request
        chat_request
            .messages
            .push(ChatCompletionRequestMessage::new(
                ChatCompletionRole::Assistant,
                output,
            ));
    }

    Ok(())
}

fn read_input() -> String {
    loop {
        let mut answer = String::new();
        std::io::stdin()
            .read_line(&mut answer)
            .ok()
            .expect("Failed to read line");
        if !answer.is_empty() && answer != "\n" && answer != "\r\n" {
            return answer;
        }
    }
}

fn print_separator() {
    println!("---------------------------------------");
}
wasmedge-ggml-codellama.wasm: Binary file not shown.