WebRTC VAD code considered harmful (on the browser) #38

Open
dhdaines opened this issue Nov 30, 2022 · 0 comments

@dhdaines
Contributor

Considering that this code already exists somewhere in the guts of the browser, it is pretty silly to compile it separately into WebAssembly. Unfortunately, there isn't actually any API to access it from JavaScript, so we are stuck having to do our own VAD for endpointing.

The problems with the WebRTC code used in PocketSphinx5 are:

  • Computation is done in fixed-point, so we have to convert back and forth between Float32 and 16-bit integer samples
  • WebAudio doesn't let us choose our buffer size, and neither does the VAD, so we have to implement a ring-buffer (we have to do this anyway, but...)
  • WebAudio can already do an FFT for us, more efficiently, but the AnalyserNode API is utter garbage designed only for making pretty pictures, so never mind
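
To make the first two points concrete, here is a minimal sketch of the glue code they imply: an accumulator that takes Float32 blocks of whatever size WebAudio hands over and re-blocks them into fixed-size 16-bit frames for a fixed-point VAD. The class name `FrameRebuffer` and its methods are hypothetical, not part of SoundSwallower or WebRTC, and it is a simplified stand-in for a real ring buffer (it copies each completed frame rather than exposing read/write pointers).

```typescript
// Hypothetical helper, for illustration only: re-blocks Float32 audio
// of arbitrary block size into fixed-size Int16 frames.
class FrameRebuffer {
  private readonly frameSize: number;
  private readonly buf: Int16Array;
  private fill = 0; // number of samples currently held in buf

  constructor(frameSize: number) {
    this.frameSize = frameSize;
    this.buf = new Int16Array(frameSize);
  }

  /** Convert one Float32 sample to a 16-bit integer, clipping to [-1, 1]. */
  static toInt16(x: number): number {
    const clipped = Math.max(-1, Math.min(1, x));
    return Math.round(clipped * 32767);
  }

  /** Append a block of any length; return every frame it completed. */
  push(block: Float32Array): Int16Array[] {
    const frames: Int16Array[] = [];
    block.forEach((x) => {
      this.buf[this.fill++] = FrameRebuffer.toInt16(x);
      if (this.fill === this.frameSize) {
        frames.push(this.buf.slice()); // copy, because buf is reused
        this.fill = 0;
      }
    });
    return frames;
  }
}
```

Assuming 16 kHz input and 30 ms VAD frames, this would be constructed as `new FrameRebuffer(480)` and fed each incoming block; most calls to `push` would then return an empty array, with a full frame appearing only once enough samples have accumulated.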

For these reasons, the ideal solution is, horror of horrors, something very much like the -remove_silence option in PocketSphinx. That option was the whole reason for creating SoundSwallower in the first place (because I was so seriously annoyed at it removing data from the input, making force-alignment useless). Of course, it has to be done in a way that makes endpointing optional and doesn't break the batch-mode API. So, specifically:

  • Encapsulate input features (MFCCs, but not necessarily) for the decoder
  • Create a fused feature extractor and endpointer which emits speech start/stop events and feature buffers, with timestamps
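
As a rough sketch of what such a fused component could look like, the following emits start/stop events and timestamped feature buffers from a stream of frames. Everything here is an assumption made for illustration: the names `FusedEndpointer`, `EndpointEvent` and `EndpointerOptions`, the default threshold and hangover values, and above all the use of a single log-energy value as the "feature" and as the speech decision. A real version would compute MFCCs (or similar) and use one of the VAD methods mentioned in this issue.

```typescript
// Hypothetical event and option types, for illustration only.
type EndpointEvent =
  | { type: "start"; time: number }
  | { type: "features"; time: number; features: Float32Array }
  | { type: "stop"; time: number };

interface EndpointerOptions {
  frameSize: number;       // samples per frame
  sampleRate: number;      // Hz
  thresholdDb?: number;    // assumed energy threshold for "speech"
  hangoverFrames?: number; // silent frames tolerated before "stop"
  endpointing?: boolean;   // false = batch mode, every frame passes through
}

class FusedEndpointer {
  private readonly frameDuration: number;
  private readonly thresholdDb: number;
  private readonly hangoverFrames: number;
  private readonly endpointing: boolean;
  private frameIndex = 0;
  private inSpeech = false;
  private silentRun = 0;

  constructor(opts: EndpointerOptions) {
    this.frameDuration = opts.frameSize / opts.sampleRate;
    this.thresholdDb = opts.thresholdDb ?? -40;
    this.hangoverFrames = opts.hangoverFrames ?? 3;
    this.endpointing = opts.endpointing ?? true;
  }

  /** Mean log energy in dB: a placeholder for real features such as MFCCs. */
  static logEnergyDb(frame: Float32Array): number {
    let sum = 0;
    frame.forEach((x) => { sum += x * x; });
    return 10 * Math.log10(sum / frame.length + 1e-10);
  }

  /** Process one frame; return the events it triggers, in order. */
  process(frame: Float32Array): EndpointEvent[] {
    const time = this.frameIndex++ * this.frameDuration;
    const energy = FusedEndpointer.logEnergyDb(frame);
    const features = new Float32Array([energy]);
    if (!this.endpointing) {
      return [{ type: "features", time, features }]; // nothing is dropped
    }
    const events: EndpointEvent[] = [];
    if (energy >= this.thresholdDb) {
      this.silentRun = 0;
      if (!this.inSpeech) {
        this.inSpeech = true;
        events.push({ type: "start", time });
      }
    } else if (this.inSpeech) {
      this.silentRun += 1;
      if (this.silentRun > this.hangoverFrames) {
        this.inSpeech = false;
        events.push({ type: "stop", time });
        return events;
      }
    }
    if (this.inSpeech) {
      events.push({ type: "features", time, features });
    }
    return events;
  }
}
```

With `endpointing: false` every frame produces exactly one `features` event, so a batch caller sees the complete, unmodified feature sequence. Because every event carries a timestamp, timing is preserved even when endpointing is on, which is what the -remove_silence approach failed to do.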

Internally we can either use the WebRTC method based on log-spectra or the PocketSphinx 5prealpha method.

@dhdaines dhdaines added this to the 1.0.0 milestone Nov 30, 2022