[WIP] Naturalspeech 2 Implementation #2638

Closed
wants to merge 28 commits into from

Conversation

manmay-nakhashi
Collaborator

@manmay-nakhashi manmay-nakhashi marked this pull request as draft May 27, 2023 19:30
@kungfooman

Thank you for the PR! I listened to some of the NaturalSpeech 2 examples and they sound great. Are there any models or weights available to download and test this with?

Or is it already possible to train a model for our own language with this?

@manmay-nakhashi
Collaborator Author

@kungfooman This is still a work in progress. Since I can only work on it on weekends, it will take some time to complete the code; right now I am still resolving some errors.

@manmay-nakhashi
Collaborator Author

Status Update:

  • All the shapes are fixed and train_naturalspeech2 is runnable; only the data loss has been added so far
  • [TODO] segmented training implementation
  • [TODO] implement the other two losses
  • [TODO] add mel_loss and check whether it helps

@Iamgoofball

Iamgoofball commented Jun 5, 2023

Let me know when you want this tested, I'd be happy to give it a run on a multispeaker use case

@manmay-nakhashi
Collaborator Author

> Let me know when you want this tested, I'd be happy to give it a run on a multispeaker use case

There were a few bugs in the pitch and duration pipeline that I have resolved, but I still need to test the fix once and complete the inference function. Until then it is not trainable; once it starts generating voice on a toy dataset, I'll post it here.

@manmay-nakhashi
Collaborator Author

@erogol The score loss is not yet implemented. Can you check whether the forward function looks OK? The training script runs, but inference is not complete.

remaining_mask = torch.ones_like(latents, dtype=torch.bool)

# Get random segment for the speech prompt
speech_prompts, segment_indices = rand_segments(
Member

@erogol erogol Jun 7, 2023

I'd do this in train_step, so that the way we set the prompt is not tied to the forward function.


# iterate over the batch dimension
for i in range(latents.size(0)):
    remaining_mask[i, :, segment_indices[i] : segment_indices[i] + self.diff_segment_size] = 0
Member

Isn't it easier to just iterate over remaining_latents? It looks like remaining_mask is not used anywhere else.
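One way to read this suggestion is to slice the prompt segment out of each item and concatenate what is left, so that no boolean mask is needed. The sketch below is an illustration under assumptions, not the PR's code: `split_prompt_and_remaining` is a hypothetical helper name, latents are assumed to be shaped [batch, channels, frames], and every start index is assumed to leave room for a full segment.

```python
import torch


def split_prompt_and_remaining(latents, segment_indices, segment_size):
    """Hypothetical helper: cut a prompt segment out of every batch item.

    latents: tensor shaped [B, C, T] (assumed layout).
    segment_indices: per-item start frames, length B.
    Returns (prompts [B, C, segment_size], remaining [B, C, T - segment_size]).
    """
    prompts, remaining = [], []
    for latent, start in zip(latents, segment_indices):
        start = int(start)
        end = start + segment_size
        prompts.append(latent[:, start:end])
        # Keep everything before and after the prompt segment.
        remaining.append(torch.cat([latent[:, :start], latent[:, end:]], dim=-1))
    return torch.stack(prompts), torch.stack(remaining)


# Toy data: 2 items, 3 channels, 10 frames, with a 4-frame prompt per item.
latents = torch.arange(2 * 3 * 10, dtype=torch.float32).view(2, 3, 10)
prompts, remaining_latents = split_prompt_and_remaining(latents, [2, 5], 4)
```

Because every item loses the same number of frames, the remaining pieces can be stacked without padding.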

remaining_latents_lengths = torch.tensor(remaining_latents.shape[1:2]).to(remaining_latents.device)

# Encode speech prompt
speech_prompts_enc = self.prompt_encoder(speech_prompts)
Member

I'd make a separate function to compute the prompt and return the transposed tensor to get rid of the transposes below.
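A minimal sketch of such a helper, under assumptions: `encode_prompt` is a hypothetical name, the `nn.Conv1d` layer is only a stand-in for the PR's prompt encoder, and the layouts ([batch, channels, frames] in, [batch, frames, hidden] out) are guesses about what the downstream code expects.

```python
import torch
from torch import nn


def encode_prompt(prompt_encoder: nn.Module, speech_prompts: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: encode prompts and return them already transposed.

    speech_prompts: [B, C, T] (assumed layout).
    Returns a tensor shaped [B, T, H], so callers do not need their own transposes.
    """
    encoded = prompt_encoder(speech_prompts)  # [B, H, T] for a Conv1d-style encoder
    return encoded.transpose(1, 2)            # [B, T, H]


# Stand-in encoder for illustration only; the real prompt encoder differs.
prompt_encoder = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3, padding=1)
speech_prompts = torch.randn(2, 8, 7)
prompt_enc = encode_prompt(prompt_encoder, speech_prompts)
```

Keeping the transpose inside one function means the layout decision lives in a single place if the encoder's output format changes.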

@erogol
Member

erogol commented Jun 7, 2023

@manmay-nakhashi I commented on the things I found.

@manmay-nakhashi
Collaborator Author

Update: code cleanup is still left, but the model implementation is mostly complete. I am training on VCTK to see whether I can get some output.
I'll post the TensorBoard here.

@erogol
Member

erogol commented Jun 16, 2023

I think it is better to implement synthesize in the model going forward. It'd be more flexible.

@erogol
Member

erogol commented Jun 16, 2023

This is buggy -> https://github.com/coqui-ai/TTS/blob/755405d5ca5956dc073144c395332d1b24286cca/TTS/tts/models/naturalspeech2.py#LL757C65-L757C69

I think you should be using the aligner attention, not the predicted one, during training.
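One possible reading of this comment, sketched under assumptions: if the alignment is consumed as per-token durations, training expands the text encoding with the durations obtained from the aligner, and the duration predictor's output is used only at inference. The function names and toy tensors below are hypothetical and are not taken from the linked file.

```python
import torch


def expand_by_durations(text_enc: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    """Repeat each token encoding durations[i] times along the time axis.

    text_enc: [T_text, H]; durations: [T_text] integer tensor.
    """
    return torch.repeat_interleave(text_enc, durations, dim=0)


def pick_durations(aligner_durations, predicted_durations, training: bool):
    """Use aligner durations while training and predicted ones at inference."""
    return aligner_durations if training else predicted_durations


# Toy example: three tokens with a one-dimensional encoding.
text_enc = torch.tensor([[1.0], [2.0], [3.0]])
aligner_durations = torch.tensor([2, 1, 3])
predicted_durations = torch.tensor([1, 1, 1])

train_frames = expand_by_durations(
    text_enc, pick_durations(aligner_durations, predicted_durations, training=True)
)
infer_frames = expand_by_durations(
    text_enc, pick_durations(aligner_durations, predicted_durations, training=False)
)
```

With the aligner durations, the expanded sequence matches the length of the target frames, which a predictor early in training would not guarantee.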

TTS/tts/models/naturalspeech2.py (outdated, resolved)
@CLAassistant

CLAassistant commented Jul 25, 2023

CLA assistant check
All committers have signed the CLA.

@stale

stale bot commented Oct 14, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix label ("This will not be worked on but feel free to help.") Oct 14, 2023
@stale stale bot closed this Oct 22, 2023
5 participants