Hi!
Many thanks for your amazing, easy-to-use STT product!
I have yet to learn how to use your text models, but STT seems to work really well out of the box.
My language is Russian, and you may know that it features a great deal of obscene words that people commonly use in some contexts.
In our use case we have to recognize these words as well as ordinary words.
It looks like your language model on top of the acoustic model does not know them.
We could add our own language model, but in that case we would need the raw acoustic model outputs.
Is this somehow possible with the current API?
It looks like pywit is just a requests wrapper, and 99% of the work is done server-side.
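For reference, a minimal sketch of the equivalent raw HTTP call with requests; the token and API version date are placeholders, and the response contains only the final transcript, not raw acoustic scores:

```python
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder

# Stream a WAV file to the Wit speech endpoint, same as pywit does internally.
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "https://api.wit.ai/speech",
        params={"v": "20200513"},  # example API version date
        headers={
            "Authorization": f"Bearer {WIT_TOKEN}",
            "Content-Type": "audio/wav",
        },
        data=f,
    )
print(resp.text)  # transcript JSON only; no per-frame acoustic outputs
```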
Personalized language models are something we want to support down the road. I'll share your input with the team. In the meantime, you can use the voice inbox to correct the transcripts.
Turns out there are much simpler ways to check data at scale:
Check via calculating WER against another source of annotation;
Check the number of words / number of letters vs. the duration of the clips - there should be a direct correlation; if there is none, the STT quality is low;
Drop clips that have fewer than 2 words or 10 symbols;
Drop clips that contain special symbols, Latin symbols, etc.;
A combination of these basically lets you build fast heuristics to keep only the most relevant texts; see the sketch below.
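A minimal Python sketch of these heuristics, assuming plain-text transcripts and clip durations in seconds. The Cyrillic-only regex, the characters-per-second band, and the threshold values are illustrative assumptions to tune on your own data, not values from this thread:

```python
import re

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

# Anything outside Cyrillic, whitespace, and basic punctuation counts as
# a "special symbol" (this also catches Latin letters).
SPECIAL = re.compile(r"[^а-яё\s.,!?\-]", re.IGNORECASE)

def keep_clip(text: str, duration_s: float,
              min_words: int = 2, min_chars: int = 10,
              min_cps: float = 4.0, max_cps: float = 25.0) -> bool:
    """Fast filter: drop clips that are too short, contain special or
    Latin symbols, or whose text length does not track clip duration
    (characters per second outside a plausible speaking-rate band)."""
    if len(text.split()) < min_words or len(text) < min_chars:
        return False
    if SPECIAL.search(text):
        return False
    cps = len(text) / max(duration_s, 1e-6)
    return min_cps <= cps <= max_cps

if __name__ == "__main__":
    print(wer("мама мыла раму", "мама мыла рамы"))         # ~0.33
    print(keep_clip("привет, как дела?", duration_s=1.5))  # True
```

Running keep_clip over a manifest plus wer against a second annotation source gives a cheap per-clip quality score without listening to any audio.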