Add support for Moonshine ASR #1099

xenova · 2024-12-14T04:19:56Z

This PR adds support for Moonshine, a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. They are well suited to real-time, on-device applications such as live transcription and voice command recognition, which makes them a good fit for in-browser use. The PR relies on a dev branch of transformers by @eustlb (huggingface/transformers#34784) and a dev branch of Optimum for ONNX conversion.

Example usage:

With pipeline API:

import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline("automatic-speech-recognition", "onnx-community/moonshine-tiny-ONNX");
const output = await transcriber("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav");
console.log(output);
// { text: 'And so my fellow Americans ask not what your country can do for you as what you can do for your country.' }

Without pipeline API:

import { MoonshineForConditionalGeneration, AutoProcessor, read_audio } from "@huggingface/transformers";

// Load model and processor
const model_id = "onnx-community/moonshine-tiny-ONNX";
const model = await MoonshineForConditionalGeneration.from_pretrained(model_id, {
    dtype: "q4",
});
const processor = await AutoProcessor.from_pretrained(model_id);

// Load audio and prepare inputs
const audio = await read_audio("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav", 16000);
const inputs = await processor(audio);

// Generate outputs
const outputs = await model.generate({ ...inputs, max_new_tokens: 100 });

// Decode outputs
const decoded = processor.batch_decode(outputs, { skip_special_tokens: true });
console.log(decoded[0]);
// And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

closes #990

HuggingFaceDocBuilderDev · 2024-12-14T04:21:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova · 2024-12-14T05:08:47Z

The model works with WebGPU too, and I've adapted this real-time demo to work with it. It is significantly faster than the Whisper version. 🔥
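For anyone who wants to try the WebGPU path mentioned above, here is a minimal sketch, not the code used in the demo. It assumes the `device` option of `pipeline` in Transformers.js v3 (`"webgpu"` or `"wasm"`); the helper names `pickDevice` and `createTranscriber` are hypothetical and only for illustration. Note that the presence of `navigator.gpu` does not guarantee that a usable adapter exists, so this check is a rough heuristic.

```javascript
// Hypothetical helper (not part of the library): pick "webgpu" when the
// environment exposes the WebGPU API, otherwise fall back to "wasm".
function pickDevice(nav = globalThis.navigator) {
  return nav?.gpu ? "webgpu" : "wasm";
}

// Hypothetical helper: build an ASR pipeline on the selected device.
// The library is imported lazily so the helper above can be used on its own.
async function createTranscriber(modelId = "onnx-community/moonshine-tiny-ONNX") {
  const { pipeline } = await import("@huggingface/transformers");
  return pipeline("automatic-speech-recognition", modelId, {
    device: pickDevice(), // assumption: the `device` option from Transformers.js v3
  });
}
```

Usage would then mirror the pipeline example in the PR description: `const transcriber = await createTranscriber();` followed by `await transcriber(audioUrl)`.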

Add support for Moonshine ASR

a906a59

xenova mentioned this pull request Dec 14, 2024

Add support for moonshine ASR models #990

Closed

2 tasks

Add ASR pipeline API support for moonshine

fcdb3c4

xenova added 3 commits December 15, 2024 13:41

Merge branch 'main' into add-moonshine

942f01a

Add moonshine feature extractor unit test

8ceccab

Pass moonshine pipeline generation kwargs to generate

4d719a1

xenova merged commit aa60302 into main Dec 15, 2024
4 checks passed

xenova deleted the add-moonshine branch December 15, 2024 14:36
