Static kTrieMaxLabel=6 causes issues with phoneme-based recognition #75

Open
JackTemaki opened this issue Sep 5, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@JackTemaki

Bug Description

I tried building ASR systems on a very common standard task (LibriSpeech-100h) using the torchaudio CTC decoder, which uses the flashlight/text library as its decoding backend. While my subword (BPE)-based setups worked fine, the phoneme-based ones did not.

The standard LibriSpeech lexicon includes, e.g., the following 7 words, which all map to the same phone sequence in ARPAbet notation:

BAE B AY#
BAI B AY#
BI B AY#
BUY B AY#
BY B AY#
BY' B AY#
BYE B AY#

This resulted, for example, in the word BY no longer being recognized.
In the log I get the message:
[Trie] Trie label number reached limit: 6
The message correctly reports that the limit was hit, and any further word with that phone sequence is then not added to the trie. However, I would like to point out that this limit is very low and cannot be configured without recompiling. Also, at first the message did not look like a serious issue to me.
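To illustrate the failure mode, here is a minimal, self-contained C++ model of a trie node that enforces the same kind of per-node cap as the check removed by the patch in the follow-up comment. It is not the flashlight/text `Trie` class: `Node`, `insertCapped`, `droppedHomophones` and `kMaxLabelsPerNode` are hypothetical names, and phones are treated as plain string tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the compile-time limit (kTrieMaxLabel = 6).
constexpr std::size_t kMaxLabelsPerNode = 6;

// Illustrative trie over phone tokens; NOT the flashlight/text class.
struct Node {
  std::map<std::string, std::unique_ptr<Node>> children;
  std::vector<int> labels;    // word ids whose pronunciation ends here
  std::vector<float> scores;  // one score per stored label
};

// Walks (creating as needed) the path for `phones`, then stores `label`
// only if the end node still has room. Returns false if the word is dropped.
bool insertCapped(Node& root, const std::vector<std::string>& phones,
                  int label, float score) {
  Node* node = &root;
  for (const auto& p : phones) {
    auto& child = node->children[p];
    if (!child) child = std::make_unique<Node>();
    node = child.get();
  }
  if (node->labels.size() < kMaxLabelsPerNode) {
    node->labels.push_back(label);
    node->scores.push_back(score);
    return true;
  }
  std::cerr << "[Trie] label limit reached: " << kMaxLabelsPerNode << "\n";
  return false;
}

// Inserts the seven "B AY#" homophones in the order listed above and
// returns the words that never made it into the trie.
std::vector<std::string> droppedHomophones() {
  const std::vector<std::string> words = {"BAE", "BAI", "BI", "BUY",
                                          "BY", "BY'", "BYE"};
  const std::vector<std::string> phones = {"B", "AY#"};
  Node root;
  std::vector<std::string> dropped;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (!insertCapped(root, phones, static_cast<int>(i), 0.0f)) {
      dropped.push_back(words[i]);
    }
  }
  return dropped;
}
```

In this toy ordering only the seventh word (BYE) is dropped. In a real decoder, which homophone loses out depends on the order in which lexicon entries are inserted into the trie, since the first six always win.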

Reproduction Steps

  • Use the torchaudio ctc_decoder with a phoneme-based lexicon in which more than 6 words share the same pronunciation (homophones).
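Before decoding, a lexicon can be checked for pronunciations that exceed the limit. The sketch below assumes the whitespace-separated `WORD P1 P2 ...` format shown in the listing above; `overfullPronunciations` is a hypothetical helper, not part of torchaudio or flashlight/text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Groups lexicon lines "WORD P1 P2 ..." by phone sequence and returns every
// pronunciation shared by more than `maxLabels` words, i.e. the entries that
// would overflow a per-node label cap of that size.
std::map<std::string, std::vector<std::string>> overfullPronunciations(
    const std::string& lexiconText, std::size_t maxLabels) {
  std::map<std::string, std::vector<std::string>> byPron;
  std::istringstream lines(lexiconText);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string word, phone, pron;
    if (!(fields >> word)) continue;  // skip blank lines
    while (fields >> phone) {
      if (!pron.empty()) pron += ' ';
      pron += phone;  // normalizes whitespace between phones
    }
    if (pron.empty()) continue;  // skip entries without any phones
    byPron[pron].push_back(word);
  }
  std::map<std::string, std::vector<std::string>> result;
  for (const auto& [pron, words] : byPron) {
    if (words.size() > maxLabels) result[pron] = words;
  }
  return result;
}
```

Run with `maxLabels = 6` on a lexicon containing the lines above, it reports the `B AY#` group with all seven words.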
JackTemaki added the bug (Something isn't working) label on Sep 5, 2023
@JackTemaki
Author

JackTemaki commented Sep 5, 2023

After removing the limit check with the following patch, my word error rate (WER) went from 20.3% to 17.9%:

40,46c40,41
<   if (node->labels.size() < kTrieMaxLabel) {
<     node->labels.push_back(label);
<     node->scores.push_back(score);
<   } else {
<     std::cerr << "[Trie] Trie label number reached limit: " << kTrieMaxLabel
<               << "\n";
<   }
---
>   node->labels.push_back(label);
>   node->scores.push_back(score);

Was there any reason why this arbitrary limit was put there in the first place?
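One way to address the "not configurable without recompiling" point would be to turn the constant into a runtime setting. The following is a hypothetical sketch, not an existing flashlight/text option: `LabelCapConfig`, `LabelList`, `addLabel` and `storedCount` are invented names, and treating 0 as "unlimited" (the behavior of the patched code above) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical runtime replacement for the compile-time kTrieMaxLabel.
// 0 means "no limit", matching the behavior after the patch.
struct LabelCapConfig {
  std::size_t maxLabelsPerNode = 0;
};

// Labels and scores stored at one trie node (illustrative only).
struct LabelList {
  std::vector<int> labels;
  std::vector<float> scores;
};

// Appends (label, score) unless a non-zero cap is already reached.
// Returns true if the label was stored.
bool addLabel(LabelList& node, int label, float score,
              const LabelCapConfig& cfg) {
  if (cfg.maxLabelsPerNode != 0 &&
      node.labels.size() >= cfg.maxLabelsPerNode) {
    std::cerr << "[Trie] label limit " << cfg.maxLabelsPerNode
              << " reached; dropping label " << label << "\n";
    return false;
  }
  node.labels.push_back(label);
  node.scores.push_back(score);
  return true;
}

// Counts how many of `n` homophones survive under the given config.
std::size_t storedCount(std::size_t n, const LabelCapConfig& cfg) {
  LabelList node;
  for (std::size_t i = 0; i < n; ++i) {
    addLabel(node, static_cast<int>(i), 0.0f, cfg);
  }
  return node.labels.size();
}
```

Keeping the cap optional would preserve the old behavior for anyone who relies on it, while a warning that names the dropped label makes the silent loss of a word easier to spot in the logs.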

@JackTemaki
Author

JackTemaki commented Feb 15, 2024

Hello, is there still interest in discussing or fixing this? With the proposed fix, the decoder compares very well with our own decoder implementation, and given how simple it is to use, I would like to use it in a scientific publication. Currently I ship a patch file with the setup / container image, which works, but I would prefer to have this fixed directly in this repository.

If there is interest, I can open a PR, but first I want to check whether there is a reason for this limit that I am not aware of.
