[WIP] Naturalspeech 2 Implementation #2638
Conversation
Thank you for the PR! I listened to some of the NaturalSpeech 2 examples and they sound great. But are there any models/weights available to test or download this with? Or is it already possible to train a model for our own language with this?
@kungfooman This is still a work in progress. Since I can only work on this on weekends, it'll take some time to complete; right now I am still resolving some errors.
Status Update:
Let me know when you want this tested; I'd be happy to give it a run on a multispeaker use case.
There were still a few bugs in the pitch and duration pipeline, which I have resolved, but I need to test it once more and complete the inference function; until then it's still not trainable. Once it starts generating voice on a toy dataset, I'll post results here.
@erogol The score loss is not yet implemented. Can you look at the forward function and check whether it looks OK? The training script runs, but inference is not complete.
remaining_mask = torch.ones_like(latents, dtype=torch.bool)

# Get a random segment for the speech prompt
speech_prompts, segment_indices = rand_segments(
I'd do it in train_step so as not to tie the way we set the prompt to the forward function.
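The suggestion above might look like the sketch below: the prompt segment is picked in `train_step` and handed to `forward`, so the model itself stays agnostic to how the prompt was chosen. The `rand_segments` function here is a hypothetical minimal stand-in written for illustration; the real helper in the repo has more options.

```python
import torch

def rand_segments(x, x_lengths, segment_size):
    # Minimal stand-in: pick a random valid start per sample and
    # slice out a fixed-size segment along the time axis.
    B, C, T = x.shape
    max_starts = (x_lengths - segment_size).clamp(min=0)
    starts = (torch.rand(B) * (max_starts + 1).float()).long()
    segments = torch.stack(
        [x[i, :, s : s + segment_size] for i, s in enumerate(starts)]
    )
    return segments, starts

# In train_step: extract the prompt here, then pass it into forward.
latents = torch.randn(2, 8, 32)      # (batch, channels, frames), assumed layout
lens = torch.tensor([32, 24])
speech_prompts, segment_indices = rand_segments(latents, lens, segment_size=16)
```

This keeps `forward` reusable at inference time, where the prompt comes from a reference utterance rather than a random slice of the target.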
TTS/tts/models/naturalspeech2.py
Outdated
# iterate over the batch dimension
for i in range(latents.size(0)):
    remaining_mask[i, :, segment_indices[i] : segment_indices[i] + self.diff_segment_size] = 0
Isn't it easier to just iterate over remaining_latents? It looks like remaining_mask isn't used anywhere else.
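One way to read this suggestion: skip the boolean mask entirely and slice the frames outside each sample's prompt segment directly. A hedged sketch, assuming `latents` is `(batch, channels, frames)` and all samples share the same segment size:

```python
import torch

B, C, T, seg = 2, 8, 32, 16
latents = torch.randn(B, C, T)
segment_indices = torch.tensor([4, 10])

# Concatenate the frames before and after each prompt segment,
# producing the "remaining" latents without an intermediate mask.
remaining_latents = torch.stack([
    torch.cat([latents[i, :, :s], latents[i, :, s + seg:]], dim=-1)
    for i, s in enumerate(segment_indices)
])
```

Because the segment size is fixed, every sample's remaining part has length `T - seg`, so `torch.stack` is safe here.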
TTS/tts/models/naturalspeech2.py
Outdated
remaining_latents_lengths = torch.tensor(remaining_latents.shape[1:2]).to(remaining_latents.device)

# Encode the speech prompt
speech_prompts_enc = self.prompt_encoder(speech_prompts)
I'd make a separate function to compute the prompt and return the transposed tensor to get rid of the transposes below.
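A possible shape for that refactor, sketched with a hypothetical wrapper and a stand-in 1x1 convolution as the "prompt encoder" (names and shapes are assumptions, not the repo's actual API):

```python
import torch

class PromptEncoderWrapper(torch.nn.Module):
    # Hypothetical helper: compute the prompt encoding in one place and
    # return it already transposed to (batch, time, channels), so the
    # call sites don't each need their own .transpose().
    def __init__(self, prompt_encoder):
        super().__init__()
        self.prompt_encoder = prompt_encoder

    def compute_prompt(self, speech_prompts):
        enc = self.prompt_encoder(speech_prompts)  # (B, C, T)
        return enc.transpose(1, 2)                 # (B, T, C)

# Usage with a stand-in encoder
wrapper = PromptEncoderWrapper(torch.nn.Conv1d(8, 32, kernel_size=1))
out = wrapper.compute_prompt(torch.randn(2, 8, 16))
```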
@manmay-nakhashi I commented on the things I found.
Update: code cleanup is all that's left; the model implementation is mostly complete. I am training on VCTK to see if I can get some output.
I think it is better to implement synthesize in the model going forward. It'd be more flexible.
This is buggy -> https://github.com/coqui-ai/TTS/blob/755405d5ca5956dc073144c395332d1b24286cca/TTS/tts/models/naturalspeech2.py#LL757C65-L757C69 I think you should be using the aligner attention, not the predicted one, while training.
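The idea behind this review comment is standard teacher forcing in non-autoregressive TTS: at training time, expand the encoder outputs using the aligner's ground-truth attention rather than the duration predictor's output (which is only used at inference). A minimal sketch, with assumed shapes `(B, T_enc, C)` for the encoder output and a hard alignment `(B, T_dec, T_enc)`:

```python
import torch

def expand_with_aligner_attn(encoder_out, attn):
    # encoder_out: (B, T_enc, C); attn: (B, T_dec, T_enc) hard alignment.
    # Each decoder frame copies the encoder frame it is aligned to.
    return torch.bmm(attn, encoder_out)  # (B, T_dec, C)

enc = torch.randn(1, 4, 8)
attn = torch.zeros(1, 6, 4)
# A toy alignment: durations [2, 1, 1, 2] over 4 encoder frames.
attn[0, torch.arange(6), torch.tensor([0, 0, 1, 2, 3, 3])] = 1.0
dec_in = expand_with_aligner_attn(enc, attn)  # (1, 6, 8)
```

With the ground-truth alignment driving the expansion, the duration predictor can be trained against the aligner's durations without its early, noisy predictions corrupting the decoder's training signal.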
d8f26f0 to b6e3d5e
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
Paper --> https://arxiv.org/pdf/2304.09116.pdf