Recognition: real-time, streaming Whisper recognition #13

rotemdan · 2023-07-27T15:51:03Z

Tokens are already decoded and displayed live during Whisper decoding, at least on the CLI.

Getting Whisper to recognize in real-time (or at least near real-time) is possible. However:

- Low, usable latency is really important to me. Preferably it should be responsive enough for a real-time voice chat with a language model (along with low-latency synthesis, which is already mostly ready).
- Achieving that would require some planning and code reorganization to get right.
- An effective VAD (voice activity detection) strategy needs to be integrated to cut the audio at the right places. Fortunately, Echogarden already has several working VAD implementations.
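
To illustrate what "cutting the audio at the right places" could look like, here is a minimal sketch of silence-based segmentation. It is not Echogarden's actual VAD code or API: the names `SpeechSegment`, `frameRms` and `segmentBySilence`, and the simple RMS energy threshold, are all assumptions made for the example. A real implementation would likely use one of the existing VAD engines instead of raw energy.

```typescript
// Hypothetical sketch: energy-based segmentation, not Echogarden's VAD API.
interface SpeechSegment {
  startSample: number; // inclusive
  endSample: number;   // exclusive
}

// Root-mean-square energy of samples[start, end).
function frameRms(samples: Float32Array, start: number, end: number): number {
  let sumSquares = 0;
  for (let i = start; i < end; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / Math.max(1, end - start));
}

// Splits audio into speech segments, closing a segment once
// `minSilenceFrames` consecutive frames fall below `rmsThreshold`.
function segmentBySilence(
  samples: Float32Array,
  frameSize: number,
  rmsThreshold: number,
  minSilenceFrames: number
): SpeechSegment[] {
  const segments: SpeechSegment[] = [];
  let segmentStart = -1; // -1 means "not currently inside speech"
  let lastVoicedEnd = 0;
  let silentFrames = 0;

  for (let start = 0; start < samples.length; start += frameSize) {
    const end = Math.min(start + frameSize, samples.length);
    const voiced = frameRms(samples, start, end) >= rmsThreshold;

    if (voiced) {
      if (segmentStart < 0) segmentStart = start;
      lastVoicedEnd = end;
      silentFrames = 0;
    } else if (segmentStart >= 0) {
      silentFrames++;
      if (silentFrames >= minSilenceFrames) {
        // Enough trailing silence: emit the segment, trimmed to the last voiced frame.
        segments.push({ startSample: segmentStart, endSample: lastVoicedEnd });
        segmentStart = -1;
        silentFrames = 0;
      }
    }
  }

  // Flush a segment that is still open when the audio ends.
  if (segmentStart >= 0) {
    segments.push({ startSample: segmentStart, endSample: lastVoicedEnd });
  }
  return segments;
}
```

In a streaming setting, each segment would be handed to the recognizer as soon as it closes. The `minSilenceFrames` value is the main latency trade-off: a smaller value cuts sooner but risks splitting in the middle of a phrase.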

rotemdan added the enhancement and recognition labels on Jul 27, 2023
