Static kTrieMaxLabel=6 causes issues with phoneme-based recognition #75

JackTemaki · 2023-09-05T08:00:52Z

Bug Description

I tried building ASR systems on a very common standard task (LibriSpeech-100h) using the torchaudio ctc decoder. This decoder uses the flashlight/text library as its decoding backend. While my subword (BPE) based setups worked fine, the phoneme-based ones did not.

The standard LibriSpeech lexicon includes, for example, 7 words that all get the same phone sequence in ARPA notation:

As a result, the word BY, for example, was no longer recognized.
In the log I get the message:
[Trie] Trie label number reached limit: 6
This message correctly reports that the limit was applied, but I would like to point out that the limit is very low and cannot be configured without recompiling. The message also did not look like a serious issue to me at first.
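
The failure mode can be illustrated with a minimal sketch. It assumes a simplified trie in which each node stores the word labels that end at it; `SketchNode`, `insertWord`, `countStoredHomophones`, and `kSketchMaxLabel` are hypothetical names for illustration and not the flashlight/text API. Only the cap of 6 and the log message are taken from the report above.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for a lexicon trie node (not the library's type).
struct SketchNode {
  std::map<int, std::unique_ptr<SketchNode>> children;  // token index -> child node
  std::vector<int> labels;                              // word ids ending at this node
  std::vector<float> scores;                            // one score per label
};

// Mirrors the limit reported in the log message.
constexpr std::size_t kSketchMaxLabel = 6;

// Walk (and create if needed) the path for `tokens`, then attach `label`
// if the per-node cap allows it. Returns true if the label was stored,
// false if it was dropped.
bool insertWord(SketchNode& root, const std::vector<int>& tokens, int label,
                float score) {
  SketchNode* node = &root;
  for (int tok : tokens) {
    auto& child = node->children[tok];
    if (!child) {
      child = std::make_unique<SketchNode>();
    }
    node = child.get();
  }
  assert(node->labels.size() == node->scores.size());
  if (node->labels.size() < kSketchMaxLabel) {
    node->labels.push_back(label);
    node->scores.push_back(score);
    return true;
  }
  std::cerr << "[Trie] Trie label number reached limit: " << kSketchMaxLabel
            << "\n";
  return false;
}

// Insert `count` homophones that share one token sequence and
// return how many of them were actually stored.
std::size_t countStoredHomophones(std::size_t count) {
  SketchNode root;
  const std::vector<int> phones = {3, 7};  // arbitrary token ids for the sketch
  std::size_t stored = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (insertWord(root, phones, static_cast<int>(i), 0.0f)) {
      ++stored;
    }
  }
  return stored;
}
```

Under these assumptions, `countStoredHomophones(7)` returns 6: the seventh word sharing the phone sequence is never stored, so a decoder built on such a trie cannot output it.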

Reproduction Steps

Use torchaudio ctc_decoder with a phoneme-based lexicon in which more than 6 words (homophones) share the same phone sequence.


JackTemaki · 2023-09-05T11:29:00Z

After removing the limit check with the following patch, my word-error-rate went from 20.3% to 17.9%:

40,46c40,41
<   if (node->labels.size() < kTrieMaxLabel) {
<     node->labels.push_back(label);
<     node->scores.push_back(score);
<   } else {
<     std::cerr << "[Trie] Trie label number reached limit: " << kTrieMaxLabel
<               << "\n";
<   }
---
>   node->labels.push_back(label);
>   node->scores.push_back(score);

Was there any reason why this arbitrary limit was put there in the first place?
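
For discussion, here is a minimal sketch of a possible alternative to a compile-time constant: a cap passed at runtime, where a value of 0 disables it and thereby matches the behaviour of the patch. `CappedLabels`, `addLabel`, and the `maxLabels` parameter (including the 0-means-unlimited convention) are hypothetical and do not exist in flashlight/text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the labels and scores attached to one trie node.
struct CappedLabels {
  std::vector<int> labels;
  std::vector<float> scores;
};

// Append (label, score) unless a non-zero cap has already been reached.
// maxLabels == 0 is an assumed convention meaning "no limit".
// Returns true if the label was stored, false if it was dropped.
bool addLabel(CappedLabels& slot, int label, float score,
              std::size_t maxLabels) {
  assert(slot.labels.size() == slot.scores.size());
  if (maxLabels != 0 && slot.labels.size() >= maxLabels) {
    return false;  // caller can decide whether to warn or fail loudly
  }
  slot.labels.push_back(label);
  slot.scores.push_back(score);
  return true;
}
```

Returning a status instead of only printing to `std::cerr` would also let the caller treat a dropped word as an error, which addresses the concern that the log message is easy to overlook.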

JackTemaki · 2024-02-15T12:56:33Z

Hello, is there still interest in discussing this or getting it fixed? With the proposed fix the decoder compares really well to our own decoder implementation, and I would like to use it for a scientific publication given how simple it is to use. Currently I am providing a patch file with the setup / container image, which is fine, but I would prefer it if this were fixed directly in the repository here.

If there is interest I can open the PR, but first I want to clarify whether there is a rationale for this limit that I am not aware of.

JackTemaki added the bug label on Sep 5, 2023
