Commit
1 parent 857377c, commit c9ecb8f
Showing 18 changed files with 309 additions and 1 deletion.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
#!/usr/bin/env bash | ||
set -eo pipefail | ||
|
||
# Generates audio samples for voices. | ||
# Requires: ffmpeg jq | ||
|
||
if [ -z "$1" ]; then | ||
echo 'Usage: generate-samples.sh <piper-voices>' | ||
exit 1 | ||
fi | ||
|
||
this_dir="$( cd "$( dirname "$0" )" && pwd )" | ||
repo_dir="$(realpath "${this_dir}/../../")" | ||
|
||
venv="${repo_dir}/src/python/.venv" | ||
if [ -d "${venv}" ]; then | ||
source "${venv}/bin/activate" | ||
fi | ||
|
||
# ----------------------------------------------------------------------------- | ||
|
||
piper_voices="$1" | ||
piper_binary="${repo_dir}/install/piper" | ||
|
||
find "${piper_voices}" -name '*.onnx' | sort | \ | ||
while read -r onnx; do | ||
voice_dir="$(dirname "${onnx}")"; | ||
quality="$(basename "${voice_dir}")" | ||
dataset_dir="$(dirname "${voice_dir}")"; | ||
dataset="$(basename "${dataset_dir}")" | ||
language_dir="$(dirname "${dataset_dir}")"; | ||
language="$(basename "${language_dir}")" | ||
language_family_dir="$(dirname "${language_dir}")"; | ||
language_family="$(basename "${language_family_dir}")" | ||
|
||
test_sentences="${repo_dir}/etc/test_sentences/${language_family}.txt" | ||
if [ ! -s "${test_sentences}" ]; then | ||
echo "[ERROR] Missing ${test_sentences}" >&2; | ||
continue; | ||
fi | ||
|
||
samples_dir="${voice_dir}/samples" | ||
mkdir -p "${samples_dir}" | ||
|
||
num_speakers="$(jq --raw-output '.num_speakers' "${onnx}.json")" | ||
sample_rate="$(jq --raw-output '.audio.sample_rate' "${onnx}.json")" | ||
last_speaker_id="$((num_speakers-1))" | ||
|
||
# Generate a sample from the first test sentence for each speaker | ||
for speaker_id in $(seq 0 "${last_speaker_id}"); do | |
sample_mp3="${samples_dir}/speaker_${speaker_id}.mp3" | ||
if [ -s "${sample_mp3}" ]; then | ||
sample_mp3_size="$(stat --printf='%s' "${sample_mp3}")" | ||
else | ||
sample_mp3_size='0' | ||
fi | ||
|
||
if [ "${sample_mp3_size}" -lt 1000 ]; then | ||
echo "Generating sample for ${dataset} (quality=${quality}, speaker=${speaker_id})" | ||
|
||
# Compress to MP3 with ffmpeg | ||
head -n1 "${test_sentences}" | \ | ||
"${piper_binary}" --model "${onnx}" --speaker "${speaker_id}" --output_raw | \ | ||
ffmpeg -hide_banner -loglevel warning -y \ | ||
-sample_rate "${sample_rate}" -f s16le -ac 1 -i - \ | ||
-codec:a libmp3lame -qscale:a 2 "${sample_mp3}"; | ||
fi; | ||
done | ||
done |
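The chain of `dirname`/`basename` calls in the script implies a directory layout of `<language_family>/<language>/<dataset>/<quality>/<model>.onnx`. A minimal sketch of that decomposition as a standalone function follows. The function name `voice_path_parts` and the example path are hypothetical and not part of the commit.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Split a voice model path into the components the script derives.
# Assumed layout (inferred from the dirname/basename chain, not confirmed):
#   <voices>/<language_family>/<language>/<dataset>/<quality>/<model>.onnx
voice_path_parts() {
    local onnx="$1"
    local voice_dir dataset_dir language_dir language_family_dir

    # Walk up one directory level per component.
    voice_dir="$(dirname "${onnx}")"
    dataset_dir="$(dirname "${voice_dir}")"
    language_dir="$(dirname "${dataset_dir}")"
    language_family_dir="$(dirname "${language_dir}")"

    # Print: language_family language dataset quality
    printf '%s %s %s %s\n' \
        "$(basename "${language_family_dir}")" \
        "$(basename "${language_dir}")" \
        "$(basename "${dataset_dir}")" \
        "$(basename "${voice_dir}")"
}
```

For a hypothetical path such as `voices/en/en_US/ljspeech/medium/en_US-ljspeech-medium.onnx`, the function prints `en en_US ljspeech medium`, so the script would look for test sentences in `etc/test_sentences/en.txt`.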
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Model card for cori (high) | ||
|
||
* Language: en_GB (English, Great Britain) | ||
* Speakers: 1 | ||
* Quality: high | |
* Samplerate: 22,050Hz | ||
|
||
## Dataset | ||
|
||
* URL: https://librivox.org | ||
* License: public domain | ||
|
||
## Training | ||
|
||
See: https://brycebeattie.com/files/tts/ | ||
|
||
UK English female voice. Single speaker. Trained from scratch on high quality settings for 500 epochs. I assembled the dataset, about 24 hours of recordings, all from LibriVox.org. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
A rainbow is a meteorological phenomenon that is caused by reflection, refraction and dispersion of light in water droplets resulting in a spectrum of light appearing in the sky. |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Model card for kristin (medium) | ||
|
||
* Language: en_US (English, United States) | |
* Speakers: 1 | ||
* Quality: medium | ||
* Samplerate: 22,050Hz | ||
|
||
## Dataset | ||
|
||
* URL: https://librivox.org | ||
* License: public domain | ||
|
||
## Training | ||
|
||
See: https://brycebeattie.com/files/tts/ | ||
|
||
US English female voice. Single speaker. Trained from scratch on medium quality settings for 2000 epochs. I assembled the dataset, about 11.5 hours of recordings, all from LibriVox.org. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
A rainbow is a meteorological phenomenon that is caused by reflection, refraction and dispersion of light in water droplets resulting in a spectrum of light appearing in the sky. |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Model card for ljspeech (high) | ||
|
||
* Language: en_US (English, United States) | ||
* Speakers: 1 | ||
* Quality: high | |
* Samplerate: 22,050Hz | ||
|
||
## Dataset | ||
|
||
* URL: https://keithito.com/LJ-Speech-Dataset/ | ||
* License: public domain | ||
|
||
## Training | ||
|
||
See: https://brycebeattie.com/files/tts/ | ||
|
||
US English female voice. Single speaker. Trained from scratch for 1000 epochs on medium quality settings using the LJ Speech dataset. I re-encoded the recordings to a sample rate of 22,050 Hz so they would match other voices released for Piper TTS. |
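The re-encoding step mentioned above could look like the following sketch, assuming the target is the 22,050 Hz sample rate listed in the model card. The function name `resample_to_22050` and the mono downmix (`-ac 1`) are assumptions; the card does not say which tool or options were actually used.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Resample one audio file to 22,050 Hz mono with ffmpeg.
# Hypothetical helper: the model card does not document the exact command.
resample_to_22050() {
    local input="$1"
    local output="$2"

    # -ar sets the output sample rate; -ac 1 forces a single channel.
    ffmpeg -hide_banner -loglevel warning -y \
        -i "${input}" -ar 22050 -ac 1 "${output}"
}
```

A call such as `resample_to_22050 in.wav out.wav` writes the resampled copy to `out.wav` and leaves the input file untouched.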
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
A rainbow is a meteorological phenomenon that is caused by reflection, refraction and dispersion of light in water droplets resulting in a spectrum of light appearing in the sky. |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Model card for ljspeech (medium) | ||
|
||
* Language: en_US (English, United States) | ||
* Speakers: 1 | ||
* Quality: medium | ||
* Samplerate: 22,050Hz | ||
|
||
## Dataset | ||
|
||
* URL: https://keithito.com/LJ-Speech-Dataset/ | ||
* License: public domain | ||
|
||
## Training | ||
|
||
See: https://brycebeattie.com/files/tts/ | ||
|
||
US English female voice. Single speaker. Trained from scratch for 1000 epochs on medium quality settings using the LJ Speech dataset. I re-encoded the recordings to a sample rate of 22,050 Hz so they would match other voices released for Piper TTS. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
A rainbow is a meteorological phenomenon that is caused by reflection, refraction and dispersion of light in water droplets resulting in a spectrum of light appearing in the sky. |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Model card for claude (high) | ||
|
||
* Language: es_MX (Spanish, Mexico) | ||
* Speakers: 1 | ||
* Quality: high | ||
* Samplerate: 22,050Hz | ||
|
||
## Dataset | ||
|
||
* URL: https://huggingface.co/spaces/HirCoir/Piper-TTS-Spanish | ||
* License: apache-2.0 | ||
|
||
## Training | ||
|
||
See URL above |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Un arcoíris o arco iris es un fenómeno óptico y meteorológico que consiste en la aparición en el cielo de un arco de luz multicolor, originado por la descomposición de la luz solar en el espectro visible, la cual se produce por refracción, cuando los rayos del sol atraviesan pequeñas gotas de agua contenidas en la atmósfera terrestre. |
Binary file not shown.