Replies: 1 comment
This request is old, but have you tried this or found another way to do it?
Please check out this project: https://github.com/keonlee9420/STYLER.
This project can transfer the prosodic attributes of the source speech to the target speech, preserving intonation, speech rate, and emotional style while changing only the spoken content. However, it appears to work only within a single language.
Would it be possible to combine this effect with XTTS? The idea is to convert the source speech to text with ASR, translate the text into the target language, and then synthesize the translated text with XTTS while transferring the prosodic attributes of the source speech during synthesis. The result would be similar to expressive speech-to-speech translation (S2ST).
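To make the proposed flow concrete, here is a rough sketch of how the three stages might be chained. Everything in it is an assumption for illustration: `S2STPipeline` and its three callables are hypothetical names, not part of XTTS or STYLER, and the actual ASR, translation, and synthesis backends would have to be plugged in. The only point it shows is that the source recording is passed to the synthesis stage as a reference, which is where prosody transfer would have to happen.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class S2STPipeline:
    """Hypothetical cascade: ASR -> translation -> reference-conditioned TTS."""

    # source audio path -> transcript in the source language
    transcribe: Callable[[str], str]
    # (text, target language code) -> translated text
    translate: Callable[[str, str], str]
    # (text, target language code, reference audio path) -> output audio path
    synthesize: Callable[[str, str, str], str]

    def run(self, source_wav: str, target_lang: str) -> str:
        text = self.transcribe(source_wav)
        translated = self.translate(text, target_lang)
        # The original recording is reused as the reference clip, so the
        # synthesis backend can condition on it (assumed, not guaranteed,
        # to carry over prosody as well as speaker identity).
        return self.synthesize(translated, target_lang, source_wav)
```

Whether the synthesis stage really preserves intonation, speech rate, and emotion from the reference clip, rather than just the speaker's voice, is the open question in this post.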