Create useful-moonshine-onnx package; move demos
This refactor separates the code for running Moonshine with the ONNX runtime from the Keras-based Moonshine API package. Accordingly, the demos are promoted to the top level of the repo, and all relevant documentation and code examples have been updated.
evmaki committed Nov 23, 2024
1 parent 3f81139 commit 20c60f1
Showing 30 changed files with 100,511 additions and 161 deletions.
41 changes: 23 additions & 18 deletions README.md
@@ -48,10 +48,10 @@ This repo hosts inference code and demos for Moonshine.

- [Installation](#installation)
- [1. Create a virtual environment](#1-create-a-virtual-environment)
- [2. Install the Moonshine package](#2-install-the-moonshine-package)
- [2a. Install the `useful-moonshine` package to use Moonshine with Torch, TensorFlow, or JAX](#2a-install-the-useful-moonshine-package-to-use-moonshine-with-torch-tensorflow-or-jax)
- [2b. Install the `useful-moonshine-onnx` package to use Moonshine with ONNX](#2b-install-the-useful-moonshine-onnx-package-to-use-moonshine-with-onnx)
- [3. Try it out](#3-try-it-out)
- [Examples](#examples)
- [Onnx Standalone](#onnx-standalone)
- [Live Captions](#live-captions)
- [Running in the Browser](#running-in-the-browser)
- [CTranslate2](#ctranslate2)
@@ -61,7 +61,14 @@ This repo hosts inference code and demos for Moonshine.

## Installation

We like `uv` for managing Python environments, so we use it here. If you don't want to use it, simply skip the first step and leave `uv` off of your shell commands.
We currently offer two options for installing Moonshine:

1. `useful-moonshine`, which uses Keras (with support for Torch, TensorFlow, and JAX backends)
2. `useful-moonshine-onnx`, which uses the ONNX runtime

These instructions apply to both options; follow along to get started.

Note: We like `uv` for managing Python environments, so we use it here. If you don't want to use it, simply skip the `uv` installation and leave `uv` off your shell commands.

### 1. Create a virtual environment

@@ -74,9 +81,9 @@ uv venv env_moonshine
source env_moonshine/bin/activate
```

### 2. Install the Moonshine package
### 2a. Install the `useful-moonshine` package to use Moonshine with Torch, TensorFlow, or JAX

The `moonshine` inference code is written in Keras and can run with each of the backends that Keras supports: Torch, TensorFlow, and JAX. The backend you choose will determine which flavor of the `moonshine` package to install. If you're just getting started, we suggest installing the (default) Torch backend:
The `useful-moonshine` inference code is written in Keras and can run with each of the backends that Keras supports: Torch, TensorFlow, and JAX. The backend you choose will determine which flavor of the `useful-moonshine` package to install. If you're just getting started, we suggest installing the (default) Torch backend:

```shell
uv pip install useful-moonshine@git+https://github.com/usefulsensors/moonshine.git
@@ -103,41 +110,39 @@ export KERAS_BACKEND=jax
# Use useful-moonshine[jax-cuda] for jax on GPU
```
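
If you prefer to select the backend from Python rather than from your shell, you can set the environment variable before importing the package. A minimal sketch (Keras reads `KERAS_BACKEND` when it is first imported, so this must run before `import moonshine`):

```python
import os

# Must be set before moonshine (and therefore Keras) is imported.
os.environ["KERAS_BACKEND"] = "torch"  # or "tensorflow" / "jax"

import moonshine
```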

To run with the ONNX runtime, which is supported on many platforms, run the following:
### 2b. Install the `useful-moonshine-onnx` package to use Moonshine with ONNX

Using Moonshine with the ONNX runtime is preferable if you want to run the models on SBCs like the Raspberry Pi. We've prepared a separate version of
the package with minimal dependencies to support these use cases. To use it, run the following:

```shell
uv pip install useful-moonshine[onnx]@git+https://github.com/usefulsensors/moonshine.git
uv pip install useful-moonshine-onnx @ git+https://git@github.com/usefulsensors/moonshine.git#subdirectory=moonshine-onnx
```

### 3. Try it out

You can test Moonshine by transcribing the provided example audio file with the `.transcribe` function:
You can test whichever type of Moonshine you installed by transcribing the provided example audio file with the `.transcribe` function:

```shell
python
>>> import moonshine
>>> moonshine.transcribe(moonshine.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
>>> import moonshine # or import moonshine_onnx
>>> moonshine.transcribe(moonshine.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny') # or moonshine_onnx.transcribe(...)
['Ever tried ever failed, no matter try again, fail again, fail better.']
```

The first argument is a path to an audio file and the second is the name of a Moonshine model. `moonshine/tiny` and `moonshine/base` are the currently available models.
Use the `moonshine.transcribe_with_onnx` function to use the ONNX runtime for inference. The parameters are the same as they are for `moonshine.transcribe`.
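
Beyond the bundled asset, the same call works on your own recordings. A minimal sketch (the file path is a placeholder; Moonshine operates on 16 kHz speech audio, and `transcribe` returns a list of strings):

```python
from moonshine_onnx import transcribe  # or: from moonshine import transcribe

# Placeholder path; substitute any speech recording.
text = transcribe("your_recording.wav", "moonshine/base")[0]
print(text)
```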

## Examples

The Moonshine models can be used with a variety of different runtimes and applications, so we've included code samples showing how to use them in different situations. The [`moonshine/demo`](/moonshine/demo/) folder in this repository also has more information on many of them.

### Onnx Standalone

The latest versions of the Onnx Moonshine models are available on HuggingFace at [huggingface.co/UsefulSensors/moonshine/tree/main/onnx](https://huggingface.co/UsefulSensors/moonshine/tree/main/onnx). You can find [an example Python script](/moonshine/demo/onnx_standalone.py) and more information about running them [in the demo folder](/moonshine/demo/README.md#demo-standalone-file-transcription-with-onnx).
Since the Moonshine models can be used with a variety of different runtimes and applications, we've included code samples showing how to use them in different situations. The [`demo`](/demo/) folder in this repository also has more information on many of them.

### Live Captions

You can try the Moonshine models with live input from a microphone on many platforms with the [live captions demo](/moonshine/demo/README.md#demo-live-captioning-from-microphone-input).
You can try the Moonshine ONNX models with live input from a microphone with the [live captions demo](/demo/README.md#demo-live-captioning-from-microphone-input).

### Running in the Browser

You can try out the Moonshine models on your device in a web browser with our [HuggingFace space](https://huggingface.co/spaces/UsefulSensors/moonshine-web). We've included the [source for this demo](/moonshine/demo/moonshine-web/) in this repository; this is a great starting place for those wishing to build web-based applications with Moonshine.
You can try out the Moonshine ONNX models locally in a web browser with our [HuggingFace space](https://huggingface.co/spaces/UsefulSensors/moonshine-web). We've included the [source for this demo](/demo/moonshine-web/) in this repository; this is a great starting place for those wishing to build web-based applications with Moonshine.

### CTranslate2

65 changes: 31 additions & 34 deletions moonshine/demo/README.md → demo/README.md
@@ -1,11 +1,10 @@
# Moonshine Demos

This directory contains various scripts to demonstrate the capabilities of the
This directory contains scripts to demonstrate the capabilities of the
Moonshine ASR models.

- [Moonshine Demos](#moonshine-demos)
- [Demo: Moonshine running in the browser with ONNX](#demo-moonshine-running-in-the-browser-with-onnx)
- [Demo: Standalone file transcription with ONNX](#demo-standalone-file-transcription-with-onnx)
- [Demo: Running in the browser](#demo-running-in-the-browser)
- [Demo: Live captioning from microphone input](#demo-live-captioning-from-microphone-input)
  - [Installation](#installation)
- [0. Setup environment](#0-setup-environment)
@@ -17,46 +16,45 @@ Moonshine ASR models.
- [Metrics](#metrics)
- [Citation](#citation)

# Demo: Moonshine running in the browser with ONNX
# Demo: Running in the browser

The Node.js project in [`moonshine-web`](/moonshine/demo/moonshine-web/) demonstrates how to run the
Moonshine models in the web browser using `onnxruntime-web`. You can try this demo on your own device using our [HuggingFace space](https://huggingface.co/spaces/UsefulSensors/moonshine-web) without having to run the project from the source here. Of note, the [`moonshine.js`](/moonshine/demo/moonshine-web/src/moonshine.js) script contains everything you need to perform inferences with the Moonshine ONNX models in the browser. If you would like to build on the web demo, follow the instructions in the demo directory to get started.
The Node.js project in [`moonshine-web`](/demo/moonshine-web/) demonstrates how to run the
Moonshine models in the web browser using `onnxruntime-web`. You can try this demo on your own device using our [HuggingFace space](https://huggingface.co/spaces/UsefulSensors/moonshine-web) without having to run the project from the source here. Of note, the [`moonshine.js`](/demo/moonshine-web/src/moonshine.js) script contains everything you need to perform inferences with the Moonshine ONNX models in the browser. If you would like to build on the web demo, follow these instructions to get started.

## Installation

You must have Node.js (or another JavaScript toolkit like [Bun](https://bun.sh/)) installed to get started. Install [Node.js](https://nodejs.org/en) if you don't have it already.

Once you have your JavaScript toolkit installed, clone the `moonshine` repo and navigate to this directory:

```shell
git clone git@github.com:usefulsensors/moonshine.git
cd moonshine/demo/moonshine-web
```

Then install the project's dependencies:

```shell
npm install
```

The demo expects the Moonshine Tiny and Base ONNX models to be available in `public/moonshine/tiny` and `public/moonshine/base`, respectively. To preserve space, they are not included here. However, we've included a helper script that you can run to conveniently download them from HuggingFace:

```shell
npm run get-models
```

This project uses Vite for bundling and development. Run the following to start a development server and open the demo in your web browser:

```shell
npm run dev
```

# Demo: Standalone file transcription with ONNX (removed by this commit)

The script [`onnx_standalone.py`](/moonshine/demo/onnx_standalone.py) demonstrates how to run a Moonshine model with the `onnxruntime` package alone, without depending on `torch` or `tensorflow`. This enables running on SBCs such as the Raspberry Pi. Follow the instructions below to set up and run.

1. Install the `onnxruntime` (or `onnxruntime-gpu` if you want to run on GPUs) and `tokenizers` packages using your Python package manager of choice, such as `pip`.

2. Download the `onnx` files from the HuggingFace hub to a directory:

```shell
mkdir moonshine_base_onnx
cd moonshine_base_onnx
wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/preprocess.onnx
wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/encode.onnx
wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/uncached_decode.onnx
wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/cached_decode.onnx
cd ..
```

3. Run `onnx_standalone.py` to transcribe a wav file:

```shell
moonshine/moonshine/demo/onnx_standalone.py --models_dir moonshine_base_onnx --wav_file moonshine/moonshine/assets/beckett.wav
['Ever tried ever failed, no matter try again fail again fail better.']
```

# Demo: Live captioning from microphone input

https://github.com/user-attachments/assets/aa65ef54-d4ac-4d31-864f-222b0e6ccbd3

This folder contains a demo of live captioning from microphone input, built on Moonshine. The script runs the Moonshine ONNX model on segments of speech detected in the microphone signal using a voice activity detector called [`silero-vad`](https://github.com/snakers4/silero-vad). The script prints scrolling text or "live captions" assembled from the model predictions to the console.
The [`moonshine-onnx/live_captions.py`](/demo/moonshine-onnx/live_captions.py) script contains a demo of live captioning from microphone input, built on Moonshine. The script runs the Moonshine ONNX model on segments of speech detected in the microphone signal using a voice activity detector called [`silero-vad`](https://github.com/snakers4/silero-vad). The script prints scrolling text or "live captions" assembled from the model predictions to the console.
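
The core of that loop is small enough to sketch. Assuming the `moonshine_onnx` API shown elsewhere in this commit (`MoonshineOnnxModel`, `load_tokenizer`) and that the model exposes a `generate()` method returning token ids — an assumption, since the full script isn't reproduced here — a stripped-down transcriber looks roughly like this:

```python
import numpy as np
from moonshine_onnx import MoonshineOnnxModel, load_tokenizer

SAMPLING_RATE = 16000  # Moonshine supports 16 kHz input only

model = MoonshineOnnxModel(model_name="moonshine/base")
tokenizer = load_tokenizer()

def transcribe_segment(speech: np.ndarray) -> str:
    """Transcribe one VAD-detected speech segment (float32 samples at 16 kHz)."""
    tokens = model.generate(speech[np.newaxis, :].astype(np.float32))  # assumed signature
    return tokenizer.decode_batch(tokens)[0]
```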

The following steps have been tested in a `uv` (v0.4.25) virtual environment on these platforms:

@@ -68,7 +66,7 @@ The following steps have been tested in a `uv` (v0.4.25) virtual environment on

### 0. Setup environment

Steps to set up a virtual environment are available in the [top level README](/README.md) of this repo. Note that this demo is standalone and has no requirement to install the `useful-moonshine` package. Instead, you will clone the repo.
Steps to set up a virtual environment are available in the [top level README](/README.md) of this repo. After creating a virtual environment, do the following:

### 1. Clone the repo and install extra dependencies

Expand All @@ -81,11 +79,10 @@ git clone [email protected]:usefulsensors/moonshine.git
Then install the demo's requirements:

```shell
uv pip install -r moonshine/moonshine/demo/requirements.txt
uv pip install -r moonshine/demo/moonshine-onnx/requirements.txt
```

There is a dependency on `torch` because of the `silero-vad` package. There is no
dependency on `tensorflow`.
Note that while `useful-moonshine-onnx` has no requirement for `torch`, this demo introduces a dependency on it because of the `silero-vad` package.

#### Ubuntu: Install PortAudio

@@ -102,7 +99,7 @@ sudo apt install -y portaudio19-dev
First, check that your microphone is connected and that the volume setting is not muted in your host OS or system audio drivers. Then, run the script:

``` shell
python3 moonshine/moonshine/demo/live_captions.py
python3 moonshine/demo/moonshine-onnx/live_captions.py
```

By default, this will run the demo with the Moonshine Base model using the ONNX runtime. The optional `--model_name` argument sets the model to use: supported arguments are `moonshine/base` and `moonshine/tiny`.
@@ -113,7 +110,7 @@ An example run on an Ubuntu 24.04 VM on a MacBook Pro M2 with the Moonshine Base ONNX model:

```console
(env_moonshine_demo) parallels@ubuntu-linux-2404:~$ python3 moonshine/moonshine/demo/live_captions.py
(env_moonshine_demo) parallels@ubuntu-linux-2404:~$ python3 moonshine/demo/moonshine-onnx/live_captions.py
Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
Loading Moonshine model 'moonshine/base' (ONNX runtime) ...
Press Ctrl+C to quit live captions.
@@ -138,7 +135,7 @@ for a value of 0.2 seconds. Our Moonshine base model runs ~7x faster for this example.

```console
(env_moonshine_faster_whisper) parallels@ubuntu-linux-2404:~$ python3 moonshine/moonshine/demo/live_captions.py
(env_moonshine_faster_whisper) parallels@ubuntu-linux-2404:~$ python3 moonshine/demo/moonshine-onnx/live_captions.py
Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
Loading Faster-Whisper float32 base.en model ...
Press Ctrl+C to quit live captions.
@@ -161,7 +158,7 @@ This is an example of the Faster Whisper float32 base model being used to genera

You may customize this script to display Moonshine text transcriptions as you wish.

The script `live_captions.py` loads the English-language version of the Moonshine base ONNX model. It includes logic to detect speech activity and limit the context window of speech fed to the Moonshine model. The returned transcriptions are displayed as scrolling captions. Speech segments with pauses are cached, and these cached captions are printed on exit.
The script `moonshine-onnx/live_captions.py` loads the English-language version of the Moonshine base ONNX model. It includes logic to detect speech activity and limit the context window of speech fed to the Moonshine model. The returned transcriptions are displayed as scrolling captions. Speech segments with pauses are cached, and these cached captions are printed on exit.

### Speech truncation and hallucination

@@ -172,7 +169,7 @@ Some hallucinations will be seen when the script is running: one reason is speec
If you run this script on a slower processor, consider using the `tiny` model.

```shell
python3 ./moonshine/moonshine/demo/live_captions.py --model_name moonshine/tiny
python3 ./moonshine/demo/moonshine-onnx/live_captions.py --model_name moonshine/tiny
```

The value of `MIN_REFRESH_SECS` will be ineffective when the model inference time exceeds that value. Conversely, on a faster processor, consider reducing the value of `MIN_REFRESH_SECS` for more frequent caption updates. On a slower processor, you might also consider reducing the value of `MAX_SPEECH_SECS` to avoid the slower model inference encountered with longer speech segments.
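
For reference, these knobs are plain module-level constants in `live_captions.py`, so tuning is a one-line edit. A sketch (`MIN_REFRESH_SECS = 0.2` matches the value discussed above; the `MAX_SPEECH_SECS` value here is illustrative):

```python
# Near the top of demo/moonshine-onnx/live_captions.py.
MIN_REFRESH_SECS = 0.2  # minimum interval between caption refreshes
MAX_SPEECH_SECS = 15    # illustrative: cap on the speech context fed to the model
```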
@@ -2,20 +2,14 @@

import argparse
import os
import sys
import time
from queue import Queue

import numpy as np
from silero_vad import VADIterator, load_silero_vad
from sounddevice import InputStream
from tokenizers import Tokenizer

# Local import of Moonshine ONNX model.
MOONSHINE_DEMO_DIR = os.path.dirname(__file__)
sys.path.append(os.path.join(MOONSHINE_DEMO_DIR, ".."))

from onnx_model import MoonshineOnnxModel
from moonshine_onnx import MoonshineOnnxModel, load_tokenizer

SAMPLING_RATE = 16000

@@ -34,10 +28,7 @@ def __init__(self, model_name, rate=16000):
            raise ValueError("Moonshine supports sampling rate 16000 Hz.")
        self.model = MoonshineOnnxModel(model_name=model_name)
        self.rate = rate
        tokenizer_path = os.path.join(
            MOONSHINE_DEMO_DIR, "..", "assets", "tokenizer.json"
        )
        self.tokenizer = Tokenizer.from_file(tokenizer_path)
        self.tokenizer = load_tokenizer()

        self.inference_secs = 0
        self.number_inferences = 0
3 changes: 3 additions & 0 deletions demo/moonshine-onnx/requirements.txt
@@ -0,0 +1,3 @@
silero_vad
sounddevice
useful-moonshine-onnx @ git+https://git@github.com/usefulsensors/moonshine.git#subdirectory=moonshine-onnx
10 files renamed without changes.
21 changes: 21 additions & 0 deletions moonshine-onnx/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Useful Sensors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
4 changes: 4 additions & 0 deletions moonshine-onnx/MANIFEST.in
@@ -0,0 +1,4 @@
include requirements.txt
include README.md
include LICENSE
include src/assets/*
3 changes: 3 additions & 0 deletions moonshine-onnx/README.md
@@ -0,0 +1,3 @@
# useful-moonshine-onnx

Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. This package contains inference code for using Moonshine models with the ONNX runtime. For more information, please refer to the [project repo on GitHub](https://github.com/usefulsensors/moonshine).
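
A minimal usage sketch, mirroring the quickstart in the project README (`transcribe` and the bundled `beckett.wav` asset ship with this package):

```python
import moonshine_onnx

# Transcribe the bundled example clip with the smallest model.
print(moonshine_onnx.transcribe(moonshine_onnx.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny'))
# ['Ever tried ever failed, no matter try again, fail again, fail better.']
```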
4 changes: 4 additions & 0 deletions moonshine-onnx/requirements.txt
@@ -0,0 +1,4 @@
tokenizers>=0.19.0
onnxruntime
huggingface_hub
librosa
32 changes: 32 additions & 0 deletions moonshine-onnx/setup.py
@@ -0,0 +1,32 @@
from pathlib import Path

import pkg_resources
from setuptools import setup


def read_version(fname="src/version.py"):
    # Execute src/version.py and read out the __version__ it defines.
    exec(compile(open(fname, encoding="utf-8").read(), fname, "exec"))
    return locals()["__version__"]


setup(
    name="useful-moonshine-onnx",
    packages=["moonshine_onnx"],
    package_dir={"moonshine_onnx": "src"},
    version=read_version(),
    description="Speech recognition for live transcription and voice commands with the Moonshine ONNX models.",
    long_description=open("README.md", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
    readme="README.md",
    python_requires=">=3.8",
    author="Useful Sensors",
    url="https://github.com/usefulsensors/moonshine",
    license="MIT",
    install_requires=[
        str(r)
        for r in pkg_resources.parse_requirements(
            Path(__file__).with_name("requirements.txt").open()
        )
    ],
    include_package_data=True,
)
12 changes: 12 additions & 0 deletions moonshine-onnx/src/__init__.py
@@ -0,0 +1,12 @@
from pathlib import Path
from .version import __version__

ASSETS_DIR = Path(__file__).parents[0] / "assets"

from .model import MoonshineOnnxModel
from .transcribe import (
    transcribe,
    benchmark,
    load_tokenizer,
    load_audio,
)
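
Of the exports above, `load_audio` pairs naturally with `transcribe` when a clip is reused across calls. A sketch under one assumption — that `transcribe` accepts a preloaded audio array in place of a path, which the export list suggests but this diff doesn't show:

```python
from moonshine_onnx import ASSETS_DIR, load_audio, transcribe

# Assumed behavior: load_audio returns 16 kHz samples ready for the model.
audio = load_audio(ASSETS_DIR / "beckett.wav")
print(transcribe(audio, "moonshine/tiny"))
```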
Binary file added moonshine-onnx/src/assets/beckett.wav