WebRTC VAD code considered harmful (on the browser) #38

dhdaines · 2022-11-30T17:18:34Z

Considering that this code already exists somewhere in the guts of the browser, it is pretty silly to compile it separately into WebAssembly. Unfortunately, there isn't actually any API to access it from JavaScript, so we are stuck having to do our own VAD for endpointing.

The problems with the WebRTC code used in PocketSphinx5 are:

Computation is done in fixed-point, so we have to convert back and forth between Float32 and 16-bit integer samples.
WebAudio doesn't let us choose our buffer size, and neither does the VAD, so we have to implement a ring-buffer (we have to do this anyway, but...).
WebAudio can already do an FFT for us, more efficiently, but the AnalyserNode API is utter garbage designed only for making pretty pictures, so never mind.
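
The first two points can be sketched roughly as follows. This is a minimal illustration, not SoundSwallower's code: `floatToInt16` and `FrameBuffer` are hypothetical names, the 480-sample frame (30 ms at 16 kHz) is an assumed VAD frame size, and the buffer is a simple accumulate-and-flush re-chunker rather than a true ring buffer.

```javascript
// Hypothetical helpers for illustration only; not part of any real API.

/** Convert Float32 samples in [-1, 1] to 16-bit signed integers, clipping out-of-range input. */
function floatToInt16(input) {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Scale the two halves separately so that -1 maps to -32768 and 1 maps to 32767.
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}

/** Re-chunk input blocks of any size into fixed-size frames. */
class FrameBuffer {
  constructor(frameSize) {
    this.frameSize = frameSize;
    this.buf = new Int16Array(frameSize);
    this.fill = 0; // number of samples currently waiting in buf
  }

  /** Append samples and return every frame completed by them (as copies). */
  push(samples) {
    const frames = [];
    let pos = 0;
    while (pos < samples.length) {
      const n = Math.min(this.frameSize - this.fill, samples.length - pos);
      this.buf.set(samples.subarray(pos, pos + n), this.fill);
      this.fill += n;
      pos += n;
      if (this.fill === this.frameSize) {
        frames.push(this.buf.slice());
        this.fill = 0;
      }
    }
    return frames;
  }
}
```

With 128-sample input blocks (the block size an AudioWorklet processor typically receives), four blocks complete one 480-sample frame and leave 32 samples buffered for the next one. Both the conversion and the copy run on every block, which is the overhead the list above complains about.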

For these reasons the ideal solution is, horror of horrors, something very much like the -remove_silence option in PocketSphinx. That option was the whole reason for creating SoundSwallower in the first place, because I was so seriously annoyed at it removing data from the input, which makes force-alignment useless. Of course, it has to be done in a way that makes endpointing optional and doesn't break the batch-mode API. So, specifically:

Encapsulate input features (MFCCs, but not necessarily) for the decoder
Create a fused feature extractor and endpointer which emits speech start/stop events and feature buffers, with timestamps

Internally we can either use the WebRTC method based on log-spectra or the PocketSphinx 5prealpha method.
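
To make the proposal concrete, here is a rough sketch of what a fused feature extractor and endpointer might look like. Everything in it is an assumption made for illustration: the `Endpointer` class, its option names and its default values are hypothetical, and a single log-energy value with a fixed threshold stands in for both the real features (MFCCs) and the real decision rule (neither the WebRTC nor the PocketSphinx 5prealpha method is implemented here).

```javascript
// Hypothetical sketch; names, defaults and the energy threshold are assumptions.
class Endpointer {
  constructor({ sampleRate = 16000, frameSize = 480, thresholdDb = -40,
                startFrames = 3, stopFrames = 10 } = {}) {
    this.frameDur = frameSize / sampleRate; // seconds per frame
    this.thresholdDb = thresholdDb;
    this.startFrames = startFrames; // speech frames needed to enter speech
    this.stopFrames = stopFrames;   // silence frames needed to leave speech
    this.inSpeech = false;
    this.run = 0;        // consecutive frames that disagree with the current state
    this.frameIndex = 0;
  }

  /** Mean energy of one Int16 frame, in dB relative to full scale. */
  static logEnergy(frame) {
    let sum = 0;
    for (let i = 0; i < frame.length; i++) {
      const s = frame[i] / 32768;
      sum += s * s;
    }
    return 10 * Math.log10(sum / frame.length + 1e-10);
  }

  /**
   * Process one frame. A feature and timestamp are returned for every frame;
   * `event` is "start", "stop", or null when the state does not change.
   */
  process(frame) {
    const feature = Endpointer.logEnergy(frame);
    const time = this.frameIndex * this.frameDur;
    this.frameIndex++;
    const isSpeech = feature > this.thresholdDb;
    let event = null;
    if (isSpeech !== this.inSpeech) {
      this.run++;
      const needed = this.inSpeech ? this.stopFrames : this.startFrames;
      if (this.run >= needed) {
        this.inSpeech = !this.inSpeech;
        event = this.inSpeech ? "start" : "stop";
        this.run = 0;
      }
    } else {
      this.run = 0;
    }
    return { feature, time, event };
  }
}
```

The design point is that no input is ever dropped: every frame yields a feature and a timestamp, and the start/stop events are extra information that a caller can ignore. A batch-mode or force-alignment caller would simply consume all the features. Note that in this sketch the event timestamp is that of the frame where the decision was made, which lags the actual onset by `startFrames - 1` frames.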

dhdaines added this to the 1.0.0 milestone Nov 30, 2022
