Create useful-moonshine-onnx package; move demos
This refactor separates the code for running Moonshine with the ONNX runtime from the Moonshine Keras API package. Accordingly, the demos are promoted to the top level of the repo, and all relevant documentation and code examples have been refactored.
Showing 30 changed files with 100,511 additions and 161 deletions.
```diff
@@ -1,11 +1,10 @@
 # Moonshine Demos
 
-This directory contains various scripts to demonstrate the capabilities of the
+This directory contains scripts to demonstrate the capabilities of the
 Moonshine ASR models.
 
 - [Moonshine Demos](#moonshine-demos)
-- [Demo: Moonshine running in the browser with ONNX](#demo-moonshine-running-in-the-browser-with-onnx)
-- [Demo: Standalone file transcription with ONNX](#demo-standalone-file-transcription-with-onnx)
+- [Demo: Running in the browser](#demo-running-in-the-browser)
 - [Demo: Live captioning from microphone input](#demo-live-captioning-from-microphone-input)
 - [Installation.](#installation)
 - [0. Setup environment](#0-setup-environment)
```
````diff
@@ -17,46 +16,45 @@ Moonshine ASR models.
 - [Metrics](#metrics)
 - [Citation](#citation)
 
-# Demo: Moonshine running in the browser with ONNX
+# Demo: Running in the browser
 
-The Node.js project in [`moonshine-web`](/moonshine/demo/moonshine-web/) demonstrates how to run the
-Moonshine models in the web browser using `onnxruntime-web`. You can try this demo on your own device using our [HuggingFace space](https://huggingface.co/spaces/UsefulSensors/moonshine-web) without having to run the project from the source here. Of note, the [`moonshine.js`](/moonshine/demo/moonshine-web/src/moonshine.js) script contains everything you need to perform inferences with the Moonshine ONNX models in the browser. If you would like to build on the web demo, follow the instructions in the demo directory to get started.
+The Node.js project in [`moonshine-web`](/demo/moonshine-web/) demonstrates how to run the
+Moonshine models in the web browser using `onnxruntime-web`. You can try this demo on your own device using our [HuggingFace space](https://huggingface.co/spaces/UsefulSensors/moonshine-web) without having to run the project from the source here. Of note, the [`moonshine.js`](/demo/moonshine-web/src/moonshine.js) script contains everything you need to perform inferences with the Moonshine ONNX models in the browser. If you would like to build on the web demo, follow these instructions to get started.
 
-# Demo: Standalone file transcription with ONNX
+## Installation
 
-The script [`onnx_standalone.py`](/moonshine/demo/onnx_standalone.py)
-demonstrates how to run a Moonshine model with the `onnxruntime`
-package alone, without depending on `torch` or `tensorflow`. This enables
-running on SBCs such as Raspberry Pi. Follow the instructions below to setup
-and run.
+You must have Node.js (or another JavaScript toolkit like [Bun](https://bun.sh/)) installed to get started. Install [Node.js](https://nodejs.org/en) if you don't have it already.
+
+Once you have your JavaScript toolkit installed, clone the `moonshine` repo and navigate to this directory:
 
-1. Install `onnxruntime` (or `onnxruntime-gpu` if you want to run on GPUs) and `tokenizers` packages using your Python package manager of choice, such as `pip`.
+```shell
+git clone git@github.com:usefulsensors/moonshine.git
+cd moonshine/demo/moonshine-web
+```
 
-2. Download the `onnx` files from huggingface hub to a directory.
+Then install the project's dependencies:
 
 ```shell
-mkdir moonshine_base_onnx
-cd moonshine_base_onnx
-wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/preprocess.onnx
-wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/encode.onnx
-wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/uncached_decode.onnx
-wget https://huggingface.co/UsefulSensors/moonshine/resolve/main/onnx/base/cached_decode.onnx
-cd ..
+npm install
 ```
 
-3. Run `onnx_standalone.py` to transcribe a wav file
+The demo expects the Moonshine Tiny and Base ONNX models to be available in `public/moonshine/tiny` and `public/moonshine/base`, respectively. To preserve space, they are not included here. However, we've included a helper script that you can run to conveniently download them from HuggingFace:
 
 ```shell
-moonshine/moonshine/demo/onnx_standalone.py --models_dir moonshine_base_onnx --wav_file moonshine/moonshine/assets/beckett.wav
-['Ever tried ever failed, no matter try again fail again fail better.']
+npm run get-models
 ```
 
+This project uses Vite for bundling and development. Run the following to start a development server and open the demo in your web browser:
+
+```shell
+npm run dev
+```
+
 # Demo: Live captioning from microphone input
 
 https://github.com/user-attachments/assets/aa65ef54-d4ac-4d31-864f-222b0e6ccbd3
 
-This folder contains a demo of live captioning from microphone input, built on Moonshine. The script runs the Moonshine ONNX model on segments of speech detected in the microphone signal using a voice activity detector called [`silero-vad`](https://github.com/snakers4/silero-vad). The script prints scrolling text or "live captions" assembled from the model predictions to the console.
+The [`moonshine-onnx/live_captions.py`](/demo/moonshine-onnx/live_captions.py) script contains a demo of live captioning from microphone input, built on Moonshine. The script runs the Moonshine ONNX model on segments of speech detected in the microphone signal using a voice activity detector called [`silero-vad`](https://github.com/snakers4/silero-vad). The script prints scrolling text or "live captions" assembled from the model predictions to the console.
 
 The following steps have been tested in a `uv` (v0.4.25) virtual environment on these platforms:
 
````
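The segment-then-transcribe structure that the live-captioning demo describes can be sketched in plain Python. This is a hypothetical simplification, not code from `live_captions.py`: a naive amplitude threshold stands in for `silero-vad`, and `fake_transcribe` stands in for the Moonshine ONNX inference call; all names and thresholds are illustrative.

```python
# Hypothetical sketch of a live-captioning loop: detect speech segments,
# then feed each segment to a transcriber. A naive amplitude threshold
# stands in for silero-vad, and fake_transcribe stands in for Moonshine.

def detect_segments(samples, threshold=0.1, min_len=3):
    """Return (start, end) index pairs where |sample| stays above threshold."""
    segments, start = [], None
    for i, s in enumerate(samples):
        if abs(s) >= threshold and start is None:
            start = i
        elif abs(s) < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_len:
        segments.append((start, len(samples)))
    return segments

def fake_transcribe(segment):
    # Stand-in for running the Moonshine ONNX model on a speech segment.
    return f"[{len(segment)} samples]"

def live_caption(samples):
    captions = [fake_transcribe(samples[a:b]) for a, b in detect_segments(samples)]
    return " ".join(captions)

if __name__ == "__main__":
    signal = [0.0] * 5 + [0.5] * 4 + [0.0] * 5 + [0.8] * 6 + [0.0] * 3
    print(live_caption(signal))  # → [4 samples] [6 samples]
```

The real demo additionally streams audio from the microphone and refreshes captions incrementally; this sketch only shows the segmentation-to-transcription flow.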
```diff
@@ -68,7 +66,7 @@ The following steps have been tested in a `uv` (v0.4.25) virtual environment on
 
 ### 0. Setup environment
 
-Steps to set up a virtual environment are available in the [top level README](/README.md) of this repo. Note that this demo is standalone and has no requirement to install the `useful-moonshine` package. Instead, you will clone the repo.
+Steps to set up a virtual environment are available in the [top level README](/README.md) of this repo. After creating a virtual environment, do the following:
 
 ### 1. Clone the repo and install extra dependencies
 
```
````diff
@@ -81,11 +79,10 @@ git clone git@github.com:usefulsensors/moonshine.git
 Then install the demo's requirements:
 
 ```shell
-uv pip install -r moonshine/moonshine/demo/requirements.txt
+uv pip install -r moonshine/demo/moonshine-onnx/requirements.txt
 ```
 
-There is a dependency on `torch` because of `silero-vad` package. There is no
-dependency on `tensorflow`.
+Note that while `useful-moonshine-onnx` has no requirement for `torch`, this demo introduces a dependency for it because of the `silero-vad` package.
 
 #### Ubuntu: Install PortAudio
 
````
````diff
@@ -102,7 +99,7 @@ sudo apt install -y portaudio19-dev
 First, check that your microphone is connected and that the volume setting is not muted in your host OS or system audio drivers. Then, run the script:
 
 ```shell
-python3 moonshine/moonshine/demo/live_captions.py
+python3 moonshine/demo/moonshine-onnx/live_captions.py
 ```
 
 By default, this will run the demo with the Moonshine Base model using the ONNX runtime. The optional `--model_name` argument sets the model to use: supported arguments are `moonshine/base` and `moonshine/tiny`.
 
````
````diff
@@ -113,7 +110,7 @@ An example run on Ubuntu 24.04 VM on MacBook Pro M2 with Moonshine base ONNX
 model:
 
 ```console
-(env_moonshine_demo) parallels@ubuntu-linux-2404:~$ python3 moonshine/moonshine/demo/live_captions.py
+(env_moonshine_demo) parallels@ubuntu-linux-2404:~$ python3 moonshine/demo/moonshine-onnx/live_captions.py
 Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
 Loading Moonshine model 'moonshine/base' (ONNX runtime) ...
 Press Ctrl+C to quit live captions.
 
````
````diff
@@ -138,7 +135,7 @@ for a value of 0.2 seconds. Our Moonshine base model runs ~ 7x faster for this
 example.
 
 ```console
-(env_moonshine_faster_whisper) parallels@ubuntu-linux-2404:~$ python3 moonshine/moonshine/demo/live_captions.py
+(env_moonshine_faster_whisper) parallels@ubuntu-linux-2404:~$ python3 moonshine/demo/moonshine-onnx/live_captions.py
 Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
 Loading Faster-Whisper float32 base.en model ...
 Press Ctrl+C to quit live captions.
 
````
```diff
@@ -161,7 +158,7 @@ This is an example of the Faster Whisper float32 base model being used to genera
 
 You may customize this script to display Moonshine text transcriptions as you wish.
 
-The script `live_captions.py` loads the English language version of Moonshine base ONNX model. It includes logic to detect speech activity and limit the context window of speech fed to the Moonshine model. The returned transcriptions are displayed as scrolling captions. Speech segments with pauses are cached and these cached captions are printed on exit.
+The script `moonshine-onnx/live_captions.py` loads the English language version of Moonshine base ONNX model. It includes logic to detect speech activity and limit the context window of speech fed to the Moonshine model. The returned transcriptions are displayed as scrolling captions. Speech segments with pauses are cached and these cached captions are printed on exit.
 
 ### Speech truncation and hallucination
 
```
````diff
@@ -172,7 +169,7 @@ Some hallucinations will be seen when the script is running: one reason is speec
 If you run this script on a slower processor, consider using the `tiny` model.
 
 ```shell
-python3 ./moonshine/moonshine/demo/live_captions.py --model_name moonshine/tiny
+python3 ./moonshine/demo/moonshine-onnx/live_captions.py --model_name moonshine/tiny
 ```
 
 The value of `MIN_REFRESH_SECS` will be ineffective when the model inference time exceeds that value. Conversely on a faster processor consider reducing the value of `MIN_REFRESH_SECS` for more frequent caption updates. On a slower processor you might also consider reducing the value of `MAX_SPEECH_SECS` to avoid slower model inferencing encountered with longer speech segments.
 
````
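The tuning advice around `MIN_REFRESH_SECS` and `MAX_SPEECH_SECS` amounts to simple buffer bookkeeping, sketched below. This is a hypothetical model, not code from `live_captions.py`: it assumes `MIN_REFRESH_SECS` throttles how often an inference is issued and `MAX_SPEECH_SECS` caps the buffered speech context, with illustrative class and method names.

```python
# Hypothetical sketch of the caption-refresh bookkeeping described above.
# MIN_REFRESH_SECS throttles how often a new inference is issued;
# MAX_SPEECH_SECS caps the speech context handed to the model.
# Constants and names are illustrative, not taken from the demo source.

MIN_REFRESH_SECS = 0.2
MAX_SPEECH_SECS = 15.0
SAMPLE_RATE = 16000

class CaptionBuffer:
    def __init__(self):
        self.samples = []        # buffered speech samples
        self.last_refresh = 0.0  # timestamp of the last inference

    def add(self, chunk):
        self.samples.extend(chunk)
        # Keep at most MAX_SPEECH_SECS of trailing context, so long
        # utterances don't make each inference progressively slower.
        max_samples = int(MAX_SPEECH_SECS * SAMPLE_RATE)
        if len(self.samples) > max_samples:
            self.samples = self.samples[-max_samples:]

    def should_refresh(self, now):
        # Skip an update if the last one was issued too recently; note this
        # throttle is moot whenever inference itself takes longer than it.
        return now - self.last_refresh >= MIN_REFRESH_SECS

buf = CaptionBuffer()
buf.add([0.0] * (20 * SAMPLE_RATE))          # 20 s of audio arrives ...
assert len(buf.samples) == 15 * SAMPLE_RATE  # ... only 15 s is kept
```

Lowering `MAX_SPEECH_SECS` shrinks the trailing window (cheaper inference on slow hardware), while lowering `MIN_REFRESH_SECS` makes `should_refresh` fire more often (snappier captions on fast hardware).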
```diff
@@ -0,0 +1,3 @@
+silero_vad
+sounddevice
+useful-moonshine-onnx @ git+https://git@github.com/usefulsensors/moonshine.git#subdirectory=moonshine-onnx
```
10 files renamed without changes.
```diff
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Useful Sensors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
```
```diff
@@ -0,0 +1,4 @@
+include requirements.txt
+include README.md
+include LICENSE
+include src/assets/*
```
```diff
@@ -0,0 +1,3 @@
+# useful-moonshine-onnx
+
+Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. This package contains inference code for using Moonshine models with the ONNX runtime. For more information, please refer to the [project repo on GitHub](https://github.com/usefulsensors/moonshine).
```
```diff
@@ -0,0 +1,4 @@
+tokenizers>=0.19.0
+onnxruntime
+huggingface_hub
+librosa
```
```diff
@@ -0,0 +1,32 @@
+from pathlib import Path
+
+import pkg_resources
+from setuptools import setup
+
+
+def read_version(fname="src/version.py"):
+    exec(compile(open(fname, encoding="utf-8").read(), fname, "exec"))
+    return locals()["__version__"]
+
+
+setup(
+    name="useful-moonshine-onnx",
+    packages=["moonshine_onnx"],
+    package_dir={"moonshine_onnx": "src"},
+    version=read_version(),
+    description="Speech recognition for live transcription and voice commands with the Moonshine ONNX models.",
+    long_description=open("README.md", encoding="utf-8").read(),
+    long_description_content_type="text/markdown",
+    readme="README.md",
+    python_requires=">=3.8",
+    author="Useful Sensors",
+    url="https://github.com/usefulsensors/moonshine",
+    license="MIT",
+    install_requires=[
+        str(r)
+        for r in pkg_resources.parse_requirements(
+            Path(__file__).with_name("requirements.txt").open()
+        )
+    ],
+    include_package_data=True,
+)
```
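The `read_version` helper in the `setup.py` above executes `src/version.py` and reads `__version__` back out, so the version string is defined in exactly one place. The bare `exec`/`locals()` form relies on CPython frame-locals behavior; the sketch below shows the same idea with an explicit namespace dict, which is more portable. The temporary file here is purely illustrative, standing in for the package's real `src/version.py`.

```python
# Demonstrates the exec-based single-source-of-version pattern used in
# setup.py above, with an explicit namespace dict instead of locals().
import tempfile
from pathlib import Path

def read_version(fname):
    # Execute the version file in an isolated namespace and pull out
    # __version__, so the version string lives in exactly one place.
    ns = {}
    exec(compile(Path(fname).read_text(encoding="utf-8"), str(fname), "exec"), ns)
    return ns["__version__"]

with tempfile.TemporaryDirectory() as d:
    version_file = Path(d) / "version.py"
    version_file.write_text('__version__ = "0.1.0"\n', encoding="utf-8")
    assert read_version(version_file) == "0.1.0"
```

Passing an explicit dict to `exec` avoids depending on how `locals()` behaves inside a function, which differs across CPython versions.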
```diff
@@ -0,0 +1,12 @@
+from pathlib import Path
+from .version import __version__
+
+ASSETS_DIR = Path(__file__).parents[0] / "assets"
+
+from .model import MoonshineOnnxModel
+from .transcribe import (
+    transcribe,
+    benchmark,
+    load_tokenizer,
+    load_audio,
+)
```
Binary file not shown.