This is the ordinary FastSpeech 2 architecture with some modifications. I just wanted to make the code base better and more readable, and to finally have an open-source implementation of FastSpeech 2 that doesn't sound bad and is easier to hack and work with.
If you like this, you will love Matcha-TTS.
Changes from the original architecture:
- Instead of using MFA, I obtained alignments from a pretrained Matcha-TTS model.
  - This saved me the pain of setting up and training MFA.
- Used IPA phonemes with blanks in between phones.
- No LR decay
- Duration prediction in the log domain (see the sketch after this list)
- Everyone seems to be using the postnet from Tacotron 2; I've used it as well.
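For illustration, here is a minimal sketch of two of the changes above: interspersing a blank token between phones and computing the duration loss in the log domain. This is not the repository's actual code; the function names, the blank token id, and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

BLANK_ID = 0  # assumed id reserved for the blank symbol


def intersperse_blanks(phone_ids: list[int], blank_id: int = BLANK_ID) -> list[int]:
    """[p1, p2, p3] -> [blank, p1, blank, p2, blank, p3, blank]."""
    out = [blank_id] * (2 * len(phone_ids) + 1)
    out[1::2] = phone_ids
    return out


def duration_loss(pred_log_durations: torch.Tensor,
                  target_durations: torch.Tensor,
                  mask: torch.Tensor) -> torch.Tensor:
    """MSE between predicted log-durations and log(1 + target frame counts)."""
    target_log = torch.log1p(target_durations.float())
    loss = F.mse_loss(pred_log_durations, target_log, reduction="none")
    return (loss * mask).sum() / mask.sum()
```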
Link to the LJ Speech checkpoint. Running the code locally with the CLI will download the checkpoint automatically as well.
- Create an environment (suggested but optional)
conda create -n betterfs2 python=3.10 -y
conda activate betterfs2
- Install from source
git clone https://github.com/shivammehta25/BetterFastSpeech2.git
cd BetterFastSpeech2
pip install -e .
- Run CLI / gradio app / jupyter notebook
# This will download the required models
betterfs2 --text "<INPUT TEXT>"
or open synthesis.ipynb in a Jupyter notebook
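If you want to synthesise several sentences in one go, a small convenience sketch like the one below works. It is not part of the repository and relies only on the `--text` flag shown above.

```python
import subprocess

# Hypothetical batch script: call the betterfs2 CLI once per sentence.
sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "BetterFastSpeech 2 is easier to hack and work with.",
]

for text in sentences:
    subprocess.run(["betterfs2", "--text", text], check=True)
```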
Let's assume we are training with LJ Speech
- Download the dataset from here, extract it to data/LJSpeech-1.1, and prepare the file lists to point to the extracted data, as in item 5 of the setup of the NVIDIA Tacotron 2 repo.
Your data directory should look like:
data/
└── LJSpeech-1.1
├── durations/ # Here
├── metadata.csv
├── README
├── test.txt
├── train.txt
├── val.txt
└── wavs/
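Before training, you can optionally sanity-check that the layout matches what is expected. This is just a sketch assuming the directory and file names shown above; it is not part of the repository.

```python
from pathlib import Path

root = Path("data/LJSpeech-1.1")
expected = ["durations", "wavs", "metadata.csv", "train.txt", "val.txt", "test.txt"]

# Report anything missing from the expected LJ Speech layout.
missing = [name for name in expected if not (root / name).exists()]
if missing:
    raise FileNotFoundError(f"Missing from {root}: {missing}")
print("Dataset layout looks OK.")
```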
- Clone and enter the BetterFastSpeech2 repository
git clone https://github.com/shivammehta25/BetterFastSpeech2.git
cd BetterFastSpeech2
- Install the package from source
pip install -e .
- Go to
configs/data/ljspeech.yaml
and change
train_filelist_path: data/LJSpeech-1.1/train.txt
valid_filelist_path: data/LJSpeech-1.1/val.txt
to the paths of your train and validation filelists.
- Generate normalisation statistics using the dataset's yaml configuration file
python fs2/utils/preprocess.py -i ljspeech
# Output:
#{'pitch_min': 67.836174, 'pitch_max': 578.637146, 'pitch_mean': 207.001846, 'pitch_std': 52.747742, 'energy_min': 0.084354, 'energy_max': 190.849121, 'energy_mean': 21.330254, 'energy_std': 17.663319, 'mel_mean': -5.554245, 'mel_std': 2.059021}
Update these values in configs/data/ljspeech.yaml under the data_statistics key:
data_statistics: # Computed for ljspeech dataset
pitch_min: 67.836174
pitch_max: 792.962036
pitch_mean: 211.046158
pitch_std: 53.012085
energy_min: 0.023226
energy_max: 241.037918
energy_mean: 21.821531
energy_std: 18.17124
mel_mean: -5.517035
mel_std: 2.064413
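If you prefer not to copy these numbers by hand, the dictionary printed by fs2/utils/preprocess.py can be reformatted as YAML with a tiny throwaway script. This is just a sketch, not part of the repository; the values below are the example output from above.

```python
# Paste the dictionary printed by the preprocessing script here.
stats = {
    "pitch_min": 67.836174, "pitch_max": 578.637146,
    "pitch_mean": 207.001846, "pitch_std": 52.747742,
    "energy_min": 0.084354, "energy_max": 190.849121,
    "energy_mean": 21.330254, "energy_std": 17.663319,
    "mel_mean": -5.554245, "mel_std": 2.059021,
}

# Print a block that can be pasted under the data_statistics key.
print("data_statistics:")
for key, value in stats.items():
    print(f"  {key}: {value}")
```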
- Run the training script
python fs2/train.py experiment=ljspeech
- For multi-GPU training, run
python fs2/train.py experiment=ljspeech trainer.devices=[0,1]
- Synthesise from your custom-trained model
betterfs2 --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT>
If you use our code or otherwise find this work useful, please cite our paper:
@inproceedings{mehta2024matcha,
title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
booktitle={Proc. ICASSP},
year={2024}
}
Since this code uses Lightning-Hydra-Template, you have all the powers that come with it.
Other source code we would like to acknowledge:
- Matcha-TTS: Base TTS from which we get alignments.
- FastPitch: For transformer implementation
- FastSpeech 2: For variance predictor implementations