Replies: 19 comments
>>> alchemi5t
[August 19, 2019, 8:15am]
Hello,
I have managed to train a model with 13 hrs of annotated data. The
alignment is great, and the words in the generated test sentences are
easily discernible. The only issue is that the generated audio is not
100% human-like (there is a hint of consistent robotic chop-ups in it).
I was wondering: should I fix that with post-processing, or could I
handle it with hyperparameter tuning?
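[For context on the post-processing option: a common source of that robotic quality in Tacotron-style models is Griffin-Lim phase reconstruction, and raising its iteration count is a cheap knob to try before moving to a neural vocoder. Below is a minimal, generic NumPy/SciPy sketch of Griffin-Lim — an illustration under that assumption, not the exact routine this repo uses.]

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=60, nperseg=512):
    """Reconstruct a waveform from a magnitude spectrogram by
    iteratively re-estimating phase (Griffin & Lim, 1984).
    More iterations usually mean fewer metallic artifacts."""
    rng = np.random.default_rng(0)
    # Start from random phase; only the magnitude is trusted.
    angles = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * angles, nperseg=nperseg)   # back to time domain
        _, _, spec = stft(x, nperseg=nperseg)          # re-analyse the estimate
        angles = np.exp(1j * np.angle(spec))           # keep phase, reimpose target magnitude
    _, x = istft(mag * angles, nperseg=nperseg)
    return x

# Demo on a synthetic 440 Hz tone standing in for a model's predicted
# linear spectrogram (hypothetical input, for illustration only).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
_, _, spec = stft(tone, nperseg=512)
audio = griffin_lim(np.abs(spec), n_iter=60)
```

[If more iterations do not remove the chop-ups, the usual next step is replacing Griffin-Lim entirely with a trained neural vocoder such as WaveRNN.]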
[This is an archived TTS discussion thread from discourse.mozilla.org/t/query-regarding-post-processing]