Recognition: implement beam search for Whisper decoder #12

rotemdan · 2023-07-27T15:42:23Z

Beam search would allow the decoder to track several candidate transcriptions (hypotheses) at the same time, rather than following a single decoding path.
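To illustrate what "several candidate transcriptions at the same time" means, here is a minimal, framework-agnostic sketch of generic beam search. It is not this project's planned implementation. The decoder is replaced by a hypothetical `next_log_probs` callback, and `beam_search`, `toy_model`, `TABLE` and the token ids are illustrative assumptions. A real Whisper decoder would also condition on the audio encoder output, and practical implementations often add refinements such as length normalization, which are omitted here.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical decoder interface: given a token prefix, return a log-probability
# for every token in the vocabulary (audio conditioning omitted for brevity).
NextTokenLogProbs = Callable[[Sequence[int]], List[float]]
Hypothesis = Tuple[List[int], float]  # (token ids, total log-probability)


def beam_search(
    next_log_probs: NextTokenLogProbs,
    start_token: int,
    end_token: int,
    beam_width: int,
    max_length: int,
) -> List[Hypothesis]:
    """Return up to `beam_width` hypotheses, best first."""
    beams: List[Hypothesis] = [([start_token], 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_length):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            # One decoder call per live hypothesis: this is why the cost grows
            # with the beam width.
            for token_id, lp in enumerate(next_log_probs(tokens)):
                candidates.append((tokens + [token_id], score + lp))

        # Keep only the `beam_width` highest-scoring extensions; those that end
        # with the end token are set aside as finished.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    # Hypotheses that never emitted the end token are still returned.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


# Toy model with vocabulary 0=<start>, 1=A, 2=B, 3=<end>.
# Next-token probabilities depend only on the last token.
TABLE = {
    0: [0.01, 0.55, 0.43, 0.01],  # after <start>
    1: [0.01, 0.30, 0.29, 0.40],  # after A
    2: [0.01, 0.02, 0.02, 0.95],  # after B
}


def toy_model(prefix: Sequence[int]) -> List[float]:
    return [math.log(p) for p in TABLE[prefix[-1]]]


greedy = beam_search(toy_model, start_token=0, end_token=3, beam_width=1, max_length=5)
beam = beam_search(toy_model, start_token=0, end_token=3, beam_width=2, max_length=5)
print(greedy[0][0], round(math.exp(greedy[0][1]), 4))  # [0, 1, 3] 0.22
print(beam[0][0], round(math.exp(beam[0][1]), 4))      # [0, 2, 3] 0.4085
```

With a beam width of 1 (equivalent to greedy decoding) the search commits to the locally best first token A and ends with a sequence probability of 0.22, while a width of 2 also keeps B alive and finds the better sequence B, <end> (0.4085), at the cost of twice as many decoder calls on the second step.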

This is currently not a high priority, for several reasons:

- The goal of the Whisper implementation is a good speed/quality tradeoff. I'm not sure that maintaining more than one decoding path would be a good tradeoff in all cases.
- Whisper inference is currently only supported on CPU, so even a beam width of 2 would significantly reduce speed, since each additional hypothesis adds decoder computation at every step.
- At this point, it is more important to get real-time and streaming recognition working. Given the extra cost of beam search, it is unlikely to be used in real-time scenarios (at least on CPU).
- There are alternative ways to improve quality, such as using a larger model or various guided decoding strategies.

rotemdan added the enhancement (New feature or request) and recognition (Issue related to speech recognition) labels on Jul 27, 2023.
