eSpeak-ng markers are not generated correctly for Korean language inputs, causing issues with synthesis and alignment of Korean #57

Open
an-lee opened this issue Jun 8, 2024 · 3 comments
Labels
alignment (Issue related to forced alignment), bug (Something isn't working), external (Issues that are related to external sources), synthesis (Issue related to speech synthesis)

Comments

@an-lee

an-lee commented Jun 8, 2024

An error occurred when using the align command.

➜  echogarden git:(main) echogarden align ~/Downloads/0515_CHI_ko_3.mp3 ~/Downloads/0515_CHI_ko_3.txt
Echogarden v1.5.0

Transcode with command-line ffmpeg.. 1307.3ms
Convert wave buffer to raw audio.. 3.3ms
Resample audio to 16kHz mono.. 195.3ms
Crop using voice activity detection.. 121.8ms
Normalize and trim audio.. 19.4ms
No language specified. Detect language using reference text.. 183.6ms
Language detected: Korean (ko)
Load alignment module.. 0.2ms
Synthesize alignment reference with eSpeak.. Error: Word end marker for index 6 is not consistent with word index. The words were: [
  '예는',       '프로그램은',
  '매력적인',   '아이돌을',
  '통해',       '시청자를',
  '끌어들이며', ',',
  '이는',       '단순한',
  '오락을',     '넘어',
  '사람들이',   '',
  '나은',       '삶을',
  '살도록',     '영감을',
  '줍니다',     '.'
]

Test audio & text

audio: 0515_CHI_ko_3.mp3
text: 0515_CHI_ko_3.txt

Thanks for your work!

@rotemdan
Member

rotemdan commented Jul 2, 2024

Thanks. Sorry it took me a long time to get to this.

This appears to be an unreported eSpeak-ng bug that is particular to its Korean voice. A marker is omitted from the events when it appears before or after a comma character (,), or possibly other punctuation characters. That causes an inconsistency in the markers that produces the error.
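To illustrate why a single omitted marker is enough to trigger this kind of error, here is a minimal sketch of a marker consistency check. It is an assumption-laden illustration only: the `MarkerEvent` type and the functions `findFirstInconsistentWord` and `buildEvents` are hypothetical names, not the actual eSpeak-ng event format or Echogarden's implementation. It assumes each word is expected to produce one start marker followed by one end marker, in order.

```typescript
// Hypothetical event shape for illustration; NOT the real eSpeak-ng or Echogarden types.
type MarkerEvent = { kind: 'start' | 'end'; wordIndex: number };

// Walks the event list expecting a (start, end) pair per word, in word order.
// Returns the index of the first word whose pair is missing or mismatched,
// or -1 if every word has a consistent pair. Extra trailing events are ignored.
function findFirstInconsistentWord(events: MarkerEvent[], wordCount: number): number {
  let position = 0;

  for (let wordIndex = 0; wordIndex < wordCount; wordIndex++) {
    const start = events[position];
    const end = events[position + 1];

    if (!start || start.kind !== 'start' || start.wordIndex !== wordIndex) {
      return wordIndex;
    }

    if (!end || end.kind !== 'end' || end.wordIndex !== wordIndex) {
      return wordIndex;
    }

    position += 2;
  }

  return -1;
}

// Builds a complete, consistent event list for the given number of words.
function buildEvents(wordCount: number): MarkerEvent[] {
  const events: MarkerEvent[] = [];

  for (let i = 0; i < wordCount; i++) {
    events.push({ kind: 'start', wordIndex: i }, { kind: 'end', wordIndex: i });
  }

  return events;
}

const complete = buildEvents(8);

// Simulate the synthesizer dropping the end marker of word 6 (a word followed by a comma).
const missingEnd = complete.filter((e) => !(e.kind === 'end' && e.wordIndex === 6));

console.log(findFirstInconsistentWord(complete, 8));   // -1
console.log(findFirstInconsistentWord(missingEnd, 8)); // 6
```

In this sketch, once the end marker of word 6 is missing, the next event seen is the start marker of word 7, so the check fails at index 6. Every later marker is also shifted by one position, which is why one dropped event breaks the whole sequence rather than just one word.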

I already have lots of workarounds for many different marker bugs.

I'll need to find a good workaround for this one. It seems that the standard one (adding () before or after the marker), which works with virtually all the voices I've tried, doesn't work with the Korean voice.

@an-lee
Author

an-lee commented Jul 4, 2024

Thank you for your response and effort.

This seems to be a challenging task. Please take your time with it.

@rotemdan
Member

rotemdan commented Jul 4, 2024

I didn't realize that the problem is common with Korean texts. I would definitely want to find a good workaround to include in the next release, but it seems not to be as straightforward as I thought.

rotemdan added the bug, synthesis, and alignment labels on Jul 4, 2024
rotemdan changed the title from "Error: Word end marker for index 6 is not consistent with word index." to "eSpeak-ng markers are not generated correctly for Korean language inputs, causing issues with synthesis and alignment of Korean" on Dec 5, 2024
rotemdan added the external label on Dec 6, 2024