Add support for Moonshine ASR #1099

Merged
merged 5 commits into from
Dec 15, 2024

Collaborator

@xenova xenova commented Dec 14, 2024

This PR adds support for Moonshine, a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. They are well suited to real-time, on-device applications like live transcription and voice command recognition, which also makes them a good fit for in-browser usage. This PR uses a dev branch of transformers by @eustlb (huggingface/transformers#34784) and a dev branch of Optimum for ONNX conversion.

Example usage:

With pipeline API:

import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline("automatic-speech-recognition", "onnx-community/moonshine-tiny-ONNX");
const output = await transcriber("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav");
console.log(output);
// { text: 'And so my fellow Americans ask not what your country can do for you as what you can do for your country.' }

Without pipeline API:

import { MoonshineForConditionalGeneration, AutoProcessor, read_audio } from "@huggingface/transformers";

// Load model and processor
const model_id = "onnx-community/moonshine-tiny-ONNX";
const model = await MoonshineForConditionalGeneration.from_pretrained(model_id, {
    dtype: "q4",
});
const processor = await AutoProcessor.from_pretrained(model_id);

// Load audio and prepare inputs
const audio = await read_audio("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav", 16000);
const inputs = await processor(audio);

// Generate outputs
const outputs = await model.generate({ ...inputs, max_new_tokens: 100 });

// Decode outputs
const decoded = processor.batch_decode(outputs, { skip_special_tokens: true });
console.log(decoded[0]);
// And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
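
The fixed `max_new_tokens: 100` above is enough for the short JFK clip. For clips of varying length, one option is to scale the cap with the audio duration. This is a minimal sketch, assuming 16 kHz samples as in the example; `tokenBudget` and the default `tokensPerSecond` of 6 are hypothetical illustration values, not part of this PR or of the library:

```javascript
// Hypothetical helper (not part of @huggingface/transformers): derive a
// max_new_tokens cap from the clip length instead of hard-coding 100.
// tokensPerSecond = 6 is an assumed, illustrative rate; tune it for your data.
function tokenBudget(numSamples, sampleRate = 16000, tokensPerSecond = 6) {
    const seconds = numSamples / sampleRate;
    // Always allow at least one token so generation can emit something.
    return Math.max(1, Math.floor(seconds * tokensPerSecond));
}

// Usage with the variables from the example above (audio is a Float32Array):
// const outputs = await model.generate({ ...inputs, max_new_tokens: tokenBudget(audio.length) });
```

A cap tied to duration may help keep the decoder from running on after short inputs, at the cost of truncating unusually fast speech if the assumed rate is too low.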

closes #990

Collaborator Author

xenova commented Dec 14, 2024

The model works with WebGPU too, and I've adapted this real-time demo to work with it. It's significantly faster than the Whisper version. 🔥
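
For readers who want to try the WebGPU path, a rough sketch follows. It assumes the `device` option accepted by `pipeline` in Transformers.js v3 and the browser's `navigator.gpu` for feature detection; `pickDevice` and `loadTranscriber` are hypothetical helper names, and falling back to `"wasm"` is an illustrative choice rather than something stated in this PR:

```javascript
// Hypothetical helper: choose a backend from a WebGPU feature check.
function pickDevice(hasWebGPU) {
    return hasWebGPU ? "webgpu" : "wasm";
}

// Hypothetical helper: build the Moonshine transcriber on the chosen backend.
async function loadTranscriber() {
    // Dynamic import so pickDevice can be used without loading the library.
    const { pipeline } = await import("@huggingface/transformers");
    // navigator.gpu is only present in environments that expose WebGPU.
    const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;
    return pipeline("automatic-speech-recognition", "onnx-community/moonshine-tiny-ONNX", {
        device: pickDevice(hasWebGPU),
    });
}

// Usage (in a browser module):
// const transcriber = await loadTranscriber();
// const output = await transcriber("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav");
```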

@xenova xenova merged commit aa60302 into main Dec 15, 2024
4 checks passed
@xenova xenova deleted the add-moonshine branch December 15, 2024 14:36
Development

Successfully merging this pull request may close these issues.

Add support for moonshine ASR models