A tool for training soft multilingual speech synthesis from a small dataset (5-10 minutes of audio) using RVC. Most speech synthesis models require vast amounts of data, but a large dataset is not always available. This repository started with the idea: "Then why don't we clone a dataset and use it?"
- RVC training with a small dataset
- Dataset cloning with the trained RVC model
- VITS training
- Inference
- Python >= 3.8
- Download RVC-VITS.zip and unzip it
- Install the Python requirements. Please refer to requirements.txt
- You may need to install espeak first:
apt-get install espeak
- Install requirements.txt and torch:
./set_env.sh
- Put the dataset into the rvc_dataset directory using the following file structure. In this experiment, I used 50 wav files from the LJSpeech dataset (330 seconds in total).
rvc_dataset
├───ljs
│ ├───LJ001-0001.wav
│ ├───LJ001-0002.wav
│ ├───...
│ └───LJ001-0050.wav
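Before training, it can help to confirm that the speaker folder holds roughly the 5-10 minutes of audio mentioned above. The following is a minimal sketch, not part of this repository: the helper name `total_duration_seconds` is hypothetical, and it assumes the files are uncompressed PCM wav files that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path


def total_duration_seconds(speaker_dir):
    """Sum the duration of every .wav file directly under speaker_dir.

    Hypothetical helper for illustration; assumes PCM wav files.
    """
    total = 0.0
    for wav_path in sorted(Path(speaker_dir).glob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            # duration = number of frames / frames per second
            total += wav_file.getnframes() / wav_file.getframerate()
    return total


if __name__ == "__main__":
    # Folder layout taken from the structure shown above.
    speaker_dir = Path("rvc_dataset") / "ljs"
    if speaker_dir.is_dir():
        seconds = total_duration_seconds(speaker_dir)
        print(f"{speaker_dir}: {seconds:.1f} s ({seconds / 60:.1f} min)")
    else:
        print(f"{speaker_dir} not found")
```

Run it from the directory that contains rvc_dataset. Files in other formats (for example compressed audio renamed to .wav) will raise `wave.Error` and need converting first.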
./train_rvc.sh ljs 500
# To train a Korean TTS model, change ja to ko (ja -> Japanese, ko -> Korean, en -> English)
./make_dataset.sh ljs ja
./train_vits.sh ljs
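The three scripts above run in a fixed order with a shared speaker name, so they can be chained from Python. This is a hedged sketch rather than something shipped with the repository: `build_pipeline` and `run_pipeline` are hypothetical names, the language codes are the three listed in the comment above, and the numeric second argument to train_rvc.sh (500 in the example) is passed through unchanged because its meaning is not documented here.

```python
import subprocess

# Language codes listed in this README (ja, ko, en).
SUPPORTED_LANGS = {"ja", "ko", "en"}


def build_pipeline(speaker, rvc_arg, lang):
    """Return the three training commands in the order shown above.

    Hypothetical helper; rvc_arg is forwarded to train_rvc.sh as-is.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return [
        ["./train_rvc.sh", speaker, str(rvc_arg)],
        ["./make_dataset.sh", speaker, lang],
        ["./train_vits.sh", speaker],
    ]


def run_pipeline(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        print("running:", " ".join(command))
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the commands; call run_pipeline(...) to execute them.
    for command in build_pipeline("ljs", 500, "ko"):
        print(" ".join(command))
```

Because `subprocess.run` is called with `check=True`, a failing step raises `CalledProcessError` and the later steps are not started.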
See vits/inference.ipynb
See ljs_ja_voice
Language | Name | Link |
---|---|---|
Korean | KSS | https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset |
Japanese | JSUT | https://sites.google.com/site/shinnosuketakamichi/publication/jsut |
English | LJSPEECH | https://keithito.com/LJ-Speech-Dataset/ |