Spanish Gigaword text based POCOLM and RNNLM training recipe #3136
Pull forward Kaldi master
Is there a reason it doesn't make sense to just replace the current example with this? Are you using a graphemic or phonemic lexicon? A graphemic lexicon might be a reasonable choice in Spanish, for simplification.
I included the end-to-end process, from processing the downloaded Spanish Gigaword corpus to training the RNNLM on that data, in stages 0 and 1 of run.sh. Also, adding the Spanish Gigaword text to the RNNLM training data gives > 0.4% absolute WER improvement on the test partitions.
I am using the same Callhome Spanish rule-based lexicon, simplified to 36 phones by removing accented letters and digits from the non-silence phones list. So it is similar to a graphemic lexicon.
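As a rough illustration of the kind of filtering described above, here is a minimal Python sketch that drops phone symbols containing accented letters or digits from a list. The function name, the set of accented characters, and the example symbols are all assumptions made for illustration; they are not taken from the recipe's actual scripts or phone set.

```python
# Hypothetical sketch: filter a non-silence phone list so that symbols
# containing accented letters or digits are removed. The accented set and
# the example phones below are illustrative assumptions, not the recipe's data.

ACCENTED = set("áéíóúüÁÉÍÓÚÜ")  # assumed set of accented letters


def simplify_phone_list(phones):
    """Return the phones that contain no accented letter and no digit."""
    kept = []
    for phone in phones:
        has_accent = any(ch in ACCENTED for ch in phone)
        has_digit = any(ch.isdigit() for ch in phone)
        if not (has_accent or has_digit):
            kept.append(phone)
    return kept


if __name__ == "__main__":
    example = ["a", "á", "b", "e", "é", "ny", "1", "rr"]  # made-up symbols
    print(simplify_phone_list(example))
```

Any words in the lexicon whose pronunciations used the removed symbols would need to be remapped separately; that step is not shown here.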
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.
@saikiranvalluri, where are we on this?
We introduce the following features into the existing fisher_spanish recipe:
We achieved 20.84% WER on the Fisher Spanish test partition and 24.67% WER on the Fisher dev partition by rescoring lattices decoded with the baseline 3-gram LM, using the RNNLM trained on the Gigaword text.