Comparative Analysis and Training Results of VITS2 with HifiGAN, iSTFT and BigVGAN #2

shigabeev · 2023-09-02T13:16:29Z

Greetings,

First and foremost, I'd like to extend my commendations on developing such an outstanding model; its performance surpasses anything I have personally trained thus far. It's a noteworthy contribution to the field, and I applaud your work.

I've conducted a series of training experiments to validate the efficiency and efficacy of your model. For ease of reference, I've made the training results, model weights, and TensorBoard logs publicly accessible. You can review them via the following Google Drive link:
Training Results and Model Weights

Moreover, I've prepared audio samples that compare the performance of your model with that of VITS2, HifiGAN, and BigVGAN. This will offer a comprehensive perspective on how your model stacks up against other state-of-the-art solutions in the domain.
Comparative Audio Samples

Best wishes

FENRlR · 2023-09-03T01:35:05Z

A huge thank you for sharing the results. The main reason for using iSTFT here was the fast synthesis speed it showed in its original VITS variant. As such, I would say the result is far beyond my expectations. Magnificent.

shigabeev · 2023-09-04T12:11:51Z

@FENRlR do you know by chance the optimal configs for different sampling rates? I need 16kHz, 24kHz and 48kHz.

FENRlR · 2023-09-04T23:19:06Z

Currently, no. It seems there were some issues with the 16kHz sampling rate in the original iSTFT repo. I've never seen the other two rates used, however.

p0p4k · 2023-09-05T07:58:02Z

@FENRlR hi, can you add me on Discord and ping me? (id -> p0p4k)
Thanks.

DavidNTompkins · 2023-09-05T19:24:37Z

Super neat! Was this on an A100? Looks like it took ~3 days?

Insensiblee · 2023-10-19T10:45:30Z

I downloaded the model from the drive link you provided and got this error during inference. Do you know how to solve it?
RuntimeError: Error(s) in loading state_dict for SynthesizerTrn:
size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([155, 192]) from checkpoint, the shape in current model is torch.Size([205, 192]).

shigabeev · 2023-10-19T20:59:23Z

> I downloaded the model from the drive link you provided and got this error during inference. Do you know how to solve it?
> RuntimeError: Error(s) in loading state_dict for SynthesizerTrn:
> size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([155, 192]) from checkpoint, the shape in current model is torch.Size([205, 192]).

Hey, it's possible that the repository has changed and some weight sizes no longer match the defaults. The easiest way to run it is to check out the commit from around the time of the original post, clone it, plug in the weights and launch it from there.

FENRlR · 2023-10-20T02:27:32Z

@Insensiblee Before reverting to that commit, have you tried changing the symbols?
The symbol list he used for Russian has exactly 155 entries, while the default symbol list has 205. So I'm 90% sure that you forgot to modify it.
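
A quick way to confirm this diagnosis is to compare the number of rows in the checkpoint's `enc_p.emb.weight` with the length of the symbol list the model is built from. The following is a minimal sketch, not code from this repository: `vocab_mismatch` is a hypothetical helper, and the stand-in objects only mimic the shapes quoted in the error message above.

```python
from types import SimpleNamespace


def vocab_mismatch(state_dict, symbols, key="enc_p.emb.weight"):
    """Return (checkpoint_rows, symbol_count) if they differ, else None.

    Hypothetical helper: `state_dict` is any mapping whose values expose
    a `.shape`, such as a PyTorch state dict.
    """
    ckpt_rows = state_dict[key].shape[0]  # rows of the text embedding table
    n_symbols = len(symbols)              # vocabulary size the model expects
    if ckpt_rows != n_symbols:
        return ckpt_rows, n_symbols
    return None


# Stand-ins that mimic the shapes from the error message (not real data).
fake_state_dict = {"enc_p.emb.weight": SimpleNamespace(shape=(155, 192))}
default_symbols = ["sym%d" % i for i in range(205)]  # placeholder for the default list
russian_symbols = default_symbols[:155]              # placeholder for the Russian list

print(vocab_mismatch(fake_state_dict, default_symbols))  # (155, 205)
print(vocab_mismatch(fake_state_dict, russian_symbols))  # None
```

With a real checkpoint, the state dict could be obtained with something like `torch.load(path, map_location="cpu")["model"]`, assuming the file follows the original VITS checkpoint layout where the weights sit under a `"model"` key; that layout is an assumption here and worth checking against the training script.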

w11wo mentioned this issue Sep 8, 2023

ONNX format #3

Closed
