[Example] Basic example of WASI-NN whisper backend. (#147)
Signed-off-by: YiYing He <[email protected]>
q82419 authored Jul 4, 2024
1 parent da18b35 commit 616562d
Showing 5 changed files with 120 additions and 0 deletions.
7 changes: 7 additions & 0 deletions whisper-basic/Cargo.toml
@@ -0,0 +1,7 @@
[package]
name = "whisper-basic"
version = "0.1.0"
edition = "2021"

[dependencies]
wasmedge-wasi-nn = "0.8.0"
78 changes: 78 additions & 0 deletions whisper-basic/README.md
@@ -0,0 +1,78 @@
# Basic Example For WASI-NN with Whisper Backend

This example demonstrates basic speech recognition with the WASI-NN Whisper backend in WasmEdge.
Currently, the WasmEdge implementation of the WASI-NN Whisper backend supports English only. More options will be added in the future.

## Dependencies

This crate depends on the `wasmedge-wasi-nn` crate, declared in `Cargo.toml`:

```toml
[dependencies]
wasmedge-wasi-nn = "0.8.0"
```

## Build

Compile the application to WebAssembly:

```bash
cargo build --target=wasm32-wasi --release
```

The output WASM file will be at [`target/wasm32-wasi/release/whisper-basic.wasm`](whisper-basic.wasm).
To speed up processing, you can compile it ahead of time (AOT) with WasmEdge:

```bash
wasmedge compile target/wasm32-wasi/release/whisper-basic.wasm whisper-basic_aot.wasm
```

## Run

### Test data

The test audio file is `./test.wav`.

Download the model by following the guide in the [whisper.cpp repository](https://github.com/ggerganov/whisper.cpp/tree/master/models):

```bash
curl -sSf https://raw.githubusercontent.com/ggerganov/whisper.cpp/master/models/download-ggml-model.sh | bash -s -- base.en
```

The model will be stored at `./ggml-base.en.bin`.

### Input Audio

The WASI-NN Whisper backend for WasmEdge currently supports only 16 kHz, mono (1 channel) audio in `pcm_s16le` format.

You can convert your input audio with the following `ffmpeg` command:

```bash
ffmpeg -i test.m4a -acodec pcm_s16le -ac 1 -ar 16000 test.wav
```
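Before running the example, it can help to confirm that a WAV file really has the format listed above. The following is a minimal, dependency-free Rust sketch (not part of `wasmedge-wasi-nn` or WasmEdge) that walks the RIFF chunks, reads the `fmt ` fields, and derives the duration from the `data` chunk size. The names `WavInfo`, `parse_wav`, `is_whisper_ready`, `duration_secs`, and `pcm16_mono_wav` are hypothetical, and the check assumes a plain PCM header (format tag 1), so it would reject, for example, a `WAVE_FORMAT_EXTENSIBLE` file.

```rust
use std::convert::TryInto;

/// Format fields read from a WAV (RIFF/WAVE) file (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct WavInfo {
    audio_format: u16, // 1 = integer PCM
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
    data_bytes: u32, // size of the `data` chunk as declared in its header
}

fn u16_le(b: &[u8], at: usize) -> u16 {
    u16::from_le_bytes(b[at..at + 2].try_into().unwrap())
}

fn u32_le(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes(b[at..at + 4].try_into().unwrap())
}

/// Walks the RIFF chunks and returns the format info, or None if the buffer is
/// not a well-formed WAVE file with a `fmt ` chunk before its `data` chunk.
fn parse_wav(buf: &[u8]) -> Option<WavInfo> {
    if buf.len() < 12 || &buf[0..4] != b"RIFF" || &buf[8..12] != b"WAVE" {
        return None;
    }
    let mut fmt: Option<(u16, u16, u32, u16)> = None;
    let mut pos = 12;
    while buf.len().saturating_sub(pos) >= 8 {
        let id = &buf[pos..pos + 4];
        let size = u32_le(buf, pos + 4) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            if size < 16 || body + 16 > buf.len() {
                return None;
            }
            // fmt layout: format tag, channels, sample rate, byte rate, block align, bits.
            fmt = Some((u16_le(buf, body), u16_le(buf, body + 2), u32_le(buf, body + 4), u16_le(buf, body + 14)));
        } else if id == b"data" {
            let (audio_format, channels, sample_rate, bits_per_sample) = fmt?;
            return Some(WavInfo { audio_format, channels, sample_rate, bits_per_sample, data_bytes: size as u32 });
        }
        // Skip this chunk (e.g. a LIST chunk); chunk bodies are padded to an even length.
        pos = body.saturating_add(size).saturating_add(size & 1);
    }
    None
}

/// True if the header matches the format listed above: 16 kHz, mono, 16-bit PCM.
fn is_whisper_ready(info: &WavInfo) -> bool {
    info.audio_format == 1 && info.channels == 1 && info.sample_rate == 16_000 && info.bits_per_sample == 16
}

/// Duration in seconds implied by the `data` chunk size.
fn duration_secs(info: &WavInfo) -> f64 {
    let bytes_per_sec = info.sample_rate as f64 * info.channels as f64 * (info.bits_per_sample / 8) as f64;
    info.data_bytes as f64 / bytes_per_sec
}

/// Hypothetical helper: builds a canonical 44-byte-header WAV from mono 16-bit samples.
fn pcm16_mono_wav(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}

fn main() {
    // One second of silence at 16 kHz mono: 16,000 samples * 2 bytes = 32,000 data bytes.
    let wav = pcm16_mono_wav(&[0i16; 16_000], 16_000);
    let info = parse_wav(&wav).expect("not a WAV file");
    println!("{:?}", info);
    println!("whisper-ready: {}, duration: {:.2} s", is_whisper_ready(&info), duration_secs(&info));
}
```

To check a real file, pass its bytes instead, for example `parse_wav(&std::fs::read("test.wav").unwrap())`. At 16 kHz, mono, 16 bits per sample, each second of audio takes 32,000 bytes of sample data, which is what `duration_secs` divides by.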

### Execute

> Note: These installation steps apply to the `0.14.2` release or later. Until that release is available, please build WasmEdge from source.

Install WasmEdge with the [WASI-NN plug-in for the Whisper backend](https://wasmedge.org/docs/start/install/#wasi-nn-plug-ins):

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-whisper
```

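Run the WASM file with `wasmedge` and the WASI-NN plug-in. The `--dir .:.` option maps the current directory into the sandbox so the program can read the model and audio files. If you skipped the AOT step, use `target/wasm32-wasi/release/whisper-basic.wasm` instead of `whisper-basic_aot.wasm`: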

```bash
wasmedge --dir .:. whisper-basic_aot.wasm ggml-base.en.bin test.wav
```

The output contains the text recognized from the audio file:

```text
Read model, size in bytes: 147964211
Loaded graph into wasi-nn with ID: Graph#0
Read input tensor, size in bytes: 141408
Recognized from audio:
[00:00:00.000 --> 00:00:04.300] This is a test record for whisper.cpp
```
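Each recognized segment in the sample output above has the form `[start --> end] text`. If you need the timestamps programmatically, a small parser such as the following Rust sketch could post-process the text printed by the example. It assumes the segment format shown in the sample output; it is not an API of WasmEdge or `wasmedge-wasi-nn`, and `Segment`, `parse_timestamp`, and `parse_segment` are hypothetical names.

```rust
/// One transcript segment (hypothetical type): start/end in milliseconds plus the text.
#[derive(Debug, PartialEq)]
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Parses "HH:MM:SS.mmm" into milliseconds.
fn parse_timestamp(ts: &str) -> Option<u64> {
    let (hms, millis) = ts.split_once('.')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || millis.len() != 3 {
        return None;
    }
    let ms: u64 = millis.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1000 + ms)
}

/// Parses one line of the assumed form "[start --> end] text".
fn parse_segment(line: &str) -> Option<Segment> {
    let rest = line.trim().strip_prefix('[')?;
    let (range, text) = rest.split_once(']')?;
    let (start, end) = range.split_once(" --> ")?;
    Some(Segment {
        start_ms: parse_timestamp(start.trim())?,
        end_ms: parse_timestamp(end.trim())?,
        text: text.trim().to_string(),
    })
}

fn main() {
    let output = "Recognized from audio: \n[00:00:00.000 --> 00:00:04.300] This is a test record for whisper.cpp\n";
    // Lines that are not segments (such as the header line) are skipped.
    for seg in output.lines().filter_map(parse_segment) {
        println!("{} ms .. {} ms: {}", seg.start_ms, seg.end_ms, seg.text);
    }
}
```

Returning `Option` keeps the parser tolerant: any line that does not match the assumed format is simply ignored rather than aborting the whole transcript.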
35 changes: 35 additions & 0 deletions whisper-basic/src/main.rs
@@ -0,0 +1,35 @@
use std::env;
use std::fs;
use std::error::Error;
use wasmedge_wasi_nn::{GraphBuilder, GraphEncoding, ExecutionTarget, TensorType};

pub fn main() -> Result<(), Box<dyn Error>> {
    let args: Vec<String> = env::args().collect();
    let model_bin_name: &str = args.get(1).ok_or("usage: whisper-basic.wasm <model> <audio.wav>")?;
    let wav_name: &str = args.get(2).ok_or("usage: whisper-basic.wasm <model> <audio.wav>")?;

    let model_bin = fs::read(model_bin_name)?;
    println!("Read model, size in bytes: {}", model_bin.len());

    let graph = GraphBuilder::new(GraphEncoding::Whisper, ExecutionTarget::CPU).build_from_bytes(&[&model_bin])?;
    let mut ctx = graph.init_execution_context()?;
    println!("Loaded graph into wasi-nn with ID: {}", graph);

    // Read the whole WAV file; its bytes are passed to the backend unchanged.
    let wav_buf = fs::read(wav_name)?;
    println!("Read input tensor, size in bytes: {}", wav_buf.len());

    // Set input.
    ctx.set_input(0, TensorType::F32, &[1, wav_buf.len()], &wav_buf)?;

    // Execute the inference.
    ctx.compute()?;

    // Retrieve the output (UTF-8 text, at most 2048 bytes here) and keep only the bytes written.
    let mut output_buffer = vec![0u8; 2048];
    let output_size = ctx.get_output(0, &mut output_buffer)?;
    let output = String::from_utf8_lossy(&output_buffer[..output_size.min(output_buffer.len())]);
    println!("Recognized from audio: \n{}", output);

    Ok(())
}
Binary file added whisper-basic/test.wav
Binary file not shown.
Binary file added whisper-basic/whisper-basic.wasm
Binary file not shown.
