This repository contains the project for the second homework of the course "Deep Learning for Music Analysis and Generation", taught by Prof. Yang at National Taiwan University. The main goal of this work is to train a mel-vocoder on the M4Singer dataset: given the mel-spectrogram of a singing segment, the model should generate the corresponding waveform. This project builds on HiFi-GAN and sobel-operator-pytorch; big thanks to the authors.
Install the dependencies first:
pip install -r requirements.txt
The expected file structure of the dataset is similar to this:
./dataset
|- audios/
|  |- 0001.mp3
|  |- 0002.mp3
|  |- 0003.mp3
|  |- ...
|- split/
|  |- train.txt
|  |- valid.txt
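If you need to create the split files yourself, the sketch below generates them. It assumes each line of train.txt / valid.txt holds one audio filename; double-check the format the dataloader in train.py actually expects before using it.

import os, random

# Hypothetical paths matching the layout above.
files = sorted(f for f in os.listdir("dataset/audios") if f.endswith(".mp3"))
random.seed(0)
random.shuffle(files)
n_valid = max(1, len(files) // 100)  # hold out ~1% for validation

os.makedirs("dataset/split", exist_ok=True)
with open("dataset/split/train.txt", "w") as f:
    f.write("\n".join(files[n_valid:]))
with open("dataset/split/valid.txt", "w") as f:
    f.write("\n".join(files[:n_valid]))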
The following command starts training with the configuration 'configs/config_v1.json' and saves checkpoints to the 'checkpoints/test/' folder:
python train.py \
    --config=configs/config_v1.json \
    --input_wavs_dir=dataset/audios \
    --input_training_file=./dataset/split/train.txt \
    --input_validation_file=./dataset/split/valid.txt \
    --checkpoint_path=./checkpoints/test
To derive the mel-spectrograms, change the source and destination folders in 'preprocess.py' and run it:
python -m preprocess
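For context, the sketch below shows roughly the kind of log-mel extraction such a preprocessing step performs. The parameter values (22050 Hz sampling rate, 1024-point FFT, hop 256, 80 mel bins) are assumptions borrowed from HiFi-GAN's common config_v1.json; the actual preprocess.py may use different values and padding.

import torch
import librosa

def mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, win=1024,
                    n_mels=80, fmin=0, fmax=8000):
    # wav: float tensor of shape (T,) in [-1, 1] -> log-mel of shape (n_mels, frames)
    mel_basis = torch.from_numpy(
        librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels, fmin=fmin, fmax=fmax)
    ).float()
    spec = torch.stft(wav, n_fft, hop_length=hop, win_length=win,
                      window=torch.hann_window(win), return_complex=True).abs()
    return torch.log(torch.clamp(mel_basis @ spec, min=1e-5))  # HiFi-GAN-style dynamic-range compression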
- Please download the model weights and config from Google Drive: config.json, weights
- Run inference with the following command (the comments explain each flag; they are placed above the command because bash does not allow comments between line continuations):
# --input_mels_dir : folder containing the input mel-spectrograms
# --output_dir     : folder where the generated audio files are saved
# --checkpoint_file: path to the generator weights (config.json is expected in the same folder)
python -m inference \
    --input_mels_dir=$input_mel_spec_folder \
    --output_dir=$output_audio_folder \
    --checkpoint_file=$generator_checkpoint_path
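Under the hood, inference follows upstream HiFi-GAN fairly closely. The sketch below shows the core loading and generation steps, assuming the upstream module layout (env.AttrDict, models.Generator); the checkpoint and mel filenames are hypothetical.

import json
import torch
from env import AttrDict      # upstream HiFi-GAN helper
from models import Generator  # upstream HiFi-GAN generator

with open("config.json") as f:                        # config.json sits next to the weights
    h = AttrDict(json.load(f))

generator = Generator(h)
state = torch.load("g_02500000", map_location="cpu")  # hypothetical checkpoint name
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()  # fuse weight norm for faster inference

mel = torch.load("example_mel.pt")    # hypothetical input: (1, num_mels, frames)
with torch.no_grad():
    audio = generator(mel).squeeze()  # waveform tensor, float values in [-1, 1]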