[TTS] Add VietnameseCharsTokenizer #9665
Signed-off-by: huutuongtu <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: XuesongYang <[email protected]>
nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py
Please refactor the code accordingly.
@@ -184,6 +208,32 @@ def get_ipa_punctuation_list(locale):
'—', # em dash, U+2014, decimal 8212
]
)
if locale == "vi-VN":
punct_set.update(
Could you please add the source of the Vietnamese punctuation marks?
Hmm, there doesn't seem to be an "official" source on Vietnamese punctuation marks. The closest reference I could find is this page: https://languagedrops.com/word/en/english/vietnamese/topics/punctuation/.
Maybe we should just use DEFAULT_PUNCTUATION.
Thanks, LGTM. Please add the source of the Vietnamese punctuation marks, if there is one.
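The thread leaves open whether `vi-VN` needs its own branch in `get_ipa_punctuation_list` or can simply reuse `DEFAULT_PUNCTUATION`. Below is a minimal sketch of the second option, under stated assumptions: `get_punctuation_list` and `LOCALE_EXTRA_PUNCTUATION` are hypothetical names, and the default tuple is an illustrative stand-in, not NeMo's actual constant.

```python
# Illustrative stand-in for a default punctuation tuple (assumption: contents
# are NOT copied from NeMo's DEFAULT_PUNCTUATION).
DEFAULT_PUNCTUATION = (",", ".", "!", "?", "-", ":", ";", "/", '"', "(", ")", "[", "]", "{", "}")

# Hypothetical registry of marks a locale adds on top of the defaults.
# "vi-VN" is listed explicitly with no extras, recording that no
# Vietnamese-specific list was found during review.
LOCALE_EXTRA_PUNCTUATION = {
    "vi-VN": (),
}


def get_punctuation_list(locale):
    """Return a sorted punctuation list for `locale` (hypothetical helper)."""
    punct_set = set(DEFAULT_PUNCTUATION)
    # Unknown locales and locales with an empty entry get the defaults unchanged.
    punct_set.update(LOCALE_EXTRA_PUNCTUATION.get(locale, ()))
    return sorted(punct_set)
```

Keeping an explicit empty entry, rather than omitting `vi-VN`, makes it easy to add marks later if an authoritative source turns up.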
* Update tts_tokenizers.py
* Update tokenizer_utils.py
* Update test_tts_tokenizers.py
* Apply isort and black reformatting

Signed-off-by: huutuongtu <[email protected]>

* Signed-off-by: Tu <[email protected]>
* Update ipa_lexicon.py - Signed-off-by: Tu <[email protected]>

Signed-off-by: XuesongYang <[email protected]>

---------

Signed-off-by: huutuongtu <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: XuesongYang <[email protected]>
Co-authored-by: huutuongtu <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: XuesongYang <[email protected]>
Signed-off-by: Boxiang Wang <[email protected]>
Signed-off-by: Tu <[email protected]>
What does this PR do?
Adds a Vietnamese character tokenizer (VietnameseCharsTokenizer) for TTS training.
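To illustrate what a character-level Vietnamese tokenizer has to handle (29 letters, tone marks stacked on vowels, and input that may arrive in composed or decomposed Unicode form), here is a minimal standalone sketch. It is not the `VietnameseCharsTokenizer` added in this PR: the class name `VietnameseCharsTokenizerSketch`, the punctuation set, and the choice to drop out-of-vocabulary characters are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical sketch only; NOT NeMo's VietnameseCharsTokenizer API.
BASE_VOWELS = "aăâeêioôơuưy"       # 12 vowel letters
CONSONANTS = "bcdđghklmnpqrstvx"   # 17 consonant letters
# Combining tone marks: grave, acute, tilde, hook above, dot below.
TONE_MARKS = ("\u0300", "\u0301", "\u0303", "\u0309", "\u0323")
# Illustrative punctuation set (assumption).
PUNCTUATION = tuple(',.!?-:;/"()[]{}')


def vietnamese_letters():
    """Return all lowercase Vietnamese letters as single NFC code points."""
    letters = list(CONSONANTS)
    for vowel in BASE_VOWELS:
        letters.append(vowel)
        for mark in TONE_MARKS:
            # NFC folds base + combining mark into one precomposed character,
            # e.g. "ư" + dot below -> "ự".
            letters.append(unicodedata.normalize("NFC", vowel + mark))
    return letters


class VietnameseCharsTokenizerSketch:
    PAD = "<pad>"
    SPACE = " "

    def __init__(self, punct=PUNCTUATION):
        # Vocabulary: padding, space, letters, then punctuation.
        self.tokens = [self.PAD, self.SPACE] + vietnamese_letters() + list(punct)
        self._id = {tok: i for i, tok in enumerate(self.tokens)}

    def normalize(self, text):
        # Lowercase, then compose so decomposed input maps to the same letters.
        return unicodedata.normalize("NFC", text.lower())

    def encode(self, text):
        ids = []
        for ch in self.normalize(text):
            if ch.isspace():
                ch = self.SPACE
            if ch in self._id:
                ids.append(self._id[ch])
            # Characters outside the vocabulary (digits, f/j/w/z, ...) are dropped.
        return ids

    def decode(self, ids):
        return "".join(self.tokens[i] for i in ids)


tok = VietnameseCharsTokenizerSketch()
print(tok.decode(tok.encode("Xin chào Việt Nam!")))
```

Normalizing before lookup matters because the same tonal vowel, such as "ệ", can be typed as one code point or as "e" followed by combining marks; composing first keeps the vocabulary to one entry per letter (89 lowercase letters in this sketch).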
Collection: [TTS]
Changelog
Usage
Before your PR is "Ready for review"
Pre checks:
PR Type: