Tokens are already decoded and displayed live during Whisper decoding, at least on the CLI.
Getting Whisper to recognize in real-time (or at least near real-time) is possible. However:
It's really important for me to get a low, usable latency. Preferably something that can be responsive enough for a real-time voice chat with a language model (along with low-latency synthesis, which is already mostly ready).
That would require some planning and code reorganization to get right.
It would need an effective VAD (voice activity detection) strategy to cut the audio at the right places. Fortunately, Echogarden already has several working VAD implementations.
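To illustrate what "cutting the audio at the right places" could look like, here is a rough sketch of a simple energy-threshold VAD. It is not one of Echogarden's VAD implementations. The function names, the 20 ms frame size, the RMS threshold and the minimum silence length are all assumptions chosen for illustration. It takes mono samples in the range [-1, 1] and returns sample ranges that could each be handed to the recognizer as soon as the trailing silence is detected.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def find_speech_segments(
    samples: Sequence[float],
    sample_rate: int = 16000,       # assumed input rate
    frame_ms: int = 20,             # assumed frame size
    threshold: float = 0.02,        # assumed RMS level separating speech from silence
    min_silence_frames: int = 10,   # assumed pause length (200 ms) that closes a segment
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) ranges of detected speech.

    A segment is closed once `min_silence_frames` consecutive quiet
    frames follow it, so the cut lands inside a pause.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_rms(samples, frame_len)

    segments: List[Tuple[int, int]] = []
    seg_start = None   # frame index where the current segment began
    silence_run = 0    # consecutive quiet frames seen inside a segment

    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            silence_run = 0
        elif seg_start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # End the segment right after the last voiced frame.
                seg_end = i - silence_run + 1
                segments.append((seg_start * frame_len, seg_end * frame_len))
                seg_start = None
                silence_run = 0

    # Close a segment that is still open when the audio ends.
    if seg_start is not None:
        seg_end = len(energies) - silence_run
        segments.append((seg_start * frame_len, seg_end * frame_len))

    return segments
```

In a streaming setup, the same loop would run incrementally on incoming frames, and the minimum silence length becomes a direct trade-off: shorter pauses give lower latency but risk cutting in the middle of a phrase.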