VoskJs is a NodeJs developer toolkit for the Vosk offline speech recognition engine. It gives you:
- simple sentence-based transcript APIs
- command line utility voskjs
- demo HTTP transcript server voskjshttp
VoskJs can be used for speech recognition processing in different scenarios:
- Single-user/standalone programs (e.g. perfect for single-user embedded systems)
- Multi-user/multi-core server architectures
Vosk is an open source embedded (offline, on-premise) speech-to-text engine that can run in real time even on small devices. It's based on Kaldi, but Nikolay V. Shmyrev's Vosk offers a smart, simplified and performant interface!
Documentation:
The goal of the project is to:
- Create a simple function API layer on top of the existing Vosk nodejs binding, supplying the main sentence-based speech-to-text functionalities:
  - const model = loadModel(modelDirectory)
    Loads a specific Vosk engine model from a model directory into RAM, once.
  - transcriptFromFile(fileName, model, options)
  - transcriptFromBuffer(buffer, model, options)
    At run time, transcribes a speech file or buffer (in WAV/PCM format) through the Vosk engine Recognizer, and supplies detailed speech-to-text transcript info.
  - freeModel(model)

  Using the simple transcript interface you can build your standalone custom application, accessing async functions suitable for a usual single-thread nodejs program.
- voskjs: a command line program to test Vosk transcripts with specific models (some tests and command line usage here).
- voskjshttp: a simple demo HTTP server to transcribe speech files.
- Build your own server. Some usage examples here.
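The load / transcribe / free sequence described above can be sketched as follows. This is a minimal illustration, not the project's own code: the helper name transcribeFiles and the stubApi object are hypothetical, and the sketch only assumes that the API object exposes loadModel, transcriptFromFile and freeModel with the argument order listed above. It does not assume whether each call is synchronous or returns a Promise, since `await` accepts both.

```javascript
// Hypothetical helper: loads the model once, transcribes every file with it,
// and always frees the model at the end.
// `api` is any object exposing loadModel, transcriptFromFile and freeModel.
async function transcribeFiles(api, modelDirectory, fileNames, options = {}) {
  // Load the model a single time and reuse it for all files
  const model = await api.loadModel(modelDirectory)
  const results = []

  try {
    for (const fileName of fileNames) {
      const result = await api.transcriptFromFile(fileName, model, options)
      results.push({ fileName, result })
    }
  } finally {
    // Release the model even if a transcript call throws
    await api.freeModel(model)
  }

  return results
}

// Hypothetical stand-in for the real module, so the sketch runs without
// Vosk installed. It only records the order of the calls it receives.
const stubApi = {
  calls: [],
  loadModel(modelDirectory) {
    this.calls.push('load')
    return { modelDirectory }
  },
  async transcriptFromFile(fileName, model, options) {
    this.calls.push('transcript')
    return { text: `transcript of ${fileName}` }
  },
  freeModel(model) {
    this.calls.push('free')
  }
}
```

With the real package, the stub would be replaced by the installed module (for example `require('@solyarisoftware/voskjs')`, assuming it is resolvable from your project and exports these three functions). Passing the API in as a parameter keeps the helper testable without a model on disk.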
- Install vosk-api engine
  pip3 install vosk
  See also: https://alphacephei.com/vosk/install
- Install this module, as a global package if you want to use the CLI command voskjs
  npm install -g @solyarisoftware/voskjs
mkdir -p your/path/models && cd your/path/models
# English large model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-aspire-0.2.zip
unzip vosk-model-en-us-aspire-0.2.zip
# English small model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
# Italian small model
wget https://alphacephei.com/vosk/models/vosk-model-small-it-0.4.zip
unzip vosk-model-small-it-0.4.zip
More about available Vosk models here: https://alphacephei.com/vosk/models
Directory audio contains some English language speech audio files, coming from a Mozilla DeepSpeech repo.
Source: Mozilla DeepSpeech audio samples
These files are used for some tests and comparisons.
Some transcript usage examples here
Some tests / notes here:
- Transcript using English language, large model
- Transcript using English language, small model
- Comparison between Vosk and Mozilla DeepSpeech (latencies)
- Multithread stress test (10 requests in parallel)
- HTTP Server benchmark test
- Latency tests
- 💣 Important open issue to be solved: solyarisoftware#3, with a temporary workaround: alphacep/vosk-api#516 (comment)
- Implement a simplified interface for all Vosk-api functions
- Deepen grammar usage with examples
- Review stress and performance tests (especially for the HTTP server)
- To reduce latencies, rethink the transcript interface, maybe with an initialization phase that includes the Model creation and the Recognizer(s) creation
Any contribution is welcome.
- Discussions. Please open a new discussion (a public chat on GitHub) for any specific open topic, clarification, change request proposal, etc.
- Issues. Please submit issues for bugs, etc.
- E-mail. You can contact me privately via email.
Thanks to Nikolay V. Shmyrev, author of the Vosk project, for his help with the NodeJs API bindings for multi-threading management.
See also:
- What's the Vosk CPU usage at run-time?
- How to set-up a Vosk multi-threads server architecture in NodeJs
MIT (c) Giorgio Robino