XTTS v2 - please help out a noob #3457

andykaufseo · 2023-12-21T21:47:50Z

I have a long list of questions. I recently started using Coqui (XTTS v2 with a cloned voice). I have some experience with LLMs and more with Stable Diffusion, but I'm just getting started with TTS (and STT). Here are my questions:

1. I tested XTTS v2 with a cloned voice and it sounds great (a bit slow, but no biggie). If it already sounds this good, why is there a fine-tuning setup for it? Do I fine-tune the base model on the cloned voice so it sounds even more like that voice? If so, what happens if I clone another voice while using the model fine-tuned on the first one?
2. As of today, which is the best model for cloning voices? XTTS v2 sounds great. I remember trying Bark a few months ago when it came out and it was a mess, and so was VITS (the code was broken, I couldn't get it to work, and the demo was bad).
3. With cloned voices, XTTS v2 is a bit slow (not a pain in the ass, but it could be faster). I see some projects that claim almost real-time voice generation (https://github.com/KoljaB/RealtimeTTS). Is this possible with XTTS v2?
4. I remember Bark could insert things like laughter, music, etc. Could I use that in combination with XTTS v2?

m000lie · 2024-03-20T04:44:16Z

The fine-tuning setup is meant for further refining the base model on your target voice. There are cases where the base model doesn't perform at its best in zero-shot inference, that is, on a voice it has never seen.

As such, fine-tuning is still necessary for a subset of use cases.
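
For context, zero-shot cloning with the base model looks roughly like the sketch below. It assumes the Coqui `TTS` Python package is installed and that you have a short, clean reference clip of the target voice. The function name `clone_voice` and the file names are made up for illustration; the model name and the `tts_to_file` call follow the Coqui docs as I understand them, so double-check against your installed version.

```python
from pathlib import Path


def clone_voice(text, speaker_wav, out_path="output.wav", language="en"):
    """Zero-shot voice cloning with the XTTS v2 base model (illustrative helper)."""
    # Fail early if the reference clip is missing, before loading the large model.
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(f"Reference clip not found: {speaker_wav}")

    # Imported lazily so the module can be loaded without the TTS package installed.
    from TTS.api import TTS

    # Downloads the model on first use; no fine-tuning is involved here.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,  # reference audio of the voice to clone
        language=language,
        file_path=out_path,
    )
    return out_path
```

A call such as `clone_voice("Hello there.", "my_voice.wav")` writes the cloned speech to `output.wav`. If that output is not close enough to the target voice, that is the case where fine-tuning comes in.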

guptaprakhariitr · 2024-03-28T15:54:05Z

Hi @andykaufseo, if you've found answers to these, please post them here. I'd like to hear about your experience.

MethanJess · 2024-04-02T19:39:17Z

@andykaufseo

1. Fine-tuning can make the output sound more like the target voice. Zero-shot cloning is not perfect, and I have noticed a massive difference between a fine-tuned model and a zero-shot one.
2. The best one, I would definitely say, is the model used on myvoice.speechify. It can clone even the most unique voices, produces very realistic output that almost perfectly matches the original audio, and does very well even on a 1-minute sample. Sadly, it's locked behind a very hefty subscription. A free alternative is play.ht, but it is not as good. For open-source options, MetaVoice and VoiceCraft are very good.
3. Yeah, that repository seems to be legit from just looking at it...
4. I could be wrong, but I believe that if you fine-tune your own model, include laughter sounds in the training samples, and mark them in the transcript with something like "[laughter]", the model will output a laughter sound each time you input "[laughter]". I'm not sure there's currently a way to combine Bark and Coqui, though...
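
On the real-time question: one common way to cut perceived latency is to split the text into sentences and synthesize them one at a time, so playback can start as soon as the first sentence is ready. The sketch below shows only that idea. I am not claiming this is how RealtimeTTS works internally, and the names `split_sentences`, `synthesize_incrementally`, and the `synthesize` callback are hypothetical. The Coqui docs also describe a streaming inference mode for XTTS, which is worth a look.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: breaks on whitespace that follows ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_incrementally(
    text: str, synthesize: Callable[[str], bytes]
) -> Iterator[bytes]:
    """Yield audio one sentence at a time instead of waiting for the full text.

    `synthesize` is a placeholder for whatever TTS call you use
    (for example a wrapper around an XTTS v2 model); it is an assumption here.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

With a real TTS callback, each yielded chunk can be handed to an audio player while the next sentence is still being generated. The splitter is deliberately simple and will mis-split abbreviations such as "Dr. Smith".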

2 replies

jash2129 Jul 20, 2024

Does it support Telugu (an Indian language)?

MethanJess Jul 23, 2024

@jash2129 Not sure... Coqui supports 16 languages, but I doubt that Telugu is one of them.
You can try other open-source projects like this one: https://github.com/JarodMica/ai-voice-cloning
I heard it can train on any language, or something like that.
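
A quick way to check a language before trying it: the snippet below hardcodes the language codes I believe the XTTS v2 model card lists. Treat the set as an assumption, since the count has changed between releases and may not match the 16 mentioned above. Telugu (`te`) is not in it. The names `XTTS_V2_LANGUAGES` and `is_supported` are made up for this example.

```python
# Language codes as listed on the XTTS v2 model card, to the best of my knowledge.
# This is a hardcoded assumption and may be out of date for your model version.
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def is_supported(code: str) -> bool:
    """Return True if the language code is in the hardcoded XTTS v2 list."""
    return code.strip().lower() in XTTS_V2_LANGUAGES
```

For example, `is_supported("te")` returns `False` and `is_supported("hi")` returns `True`, so Telugu would need fine-tuning on new data or a different project.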

AjibolaPy · 2024-11-25T05:29:36Z

I tried the XTTS base model with my own voice. It got quite close, but the accent was very different. Will fine-tuning get my voice and accent correct, or at least close? And for fine-tuning, do I need samples of only my own voice, or can I use different voices with the same accent?
