Releases: KoljaB/RealtimeTTS
v0.4.3
RealtimeTTS v0.4.3 Release Notes
New Feature: PiperEngine
Introduction
- Introducing the PiperEngine to support the Piper text-to-speech model.
Installation
- Separate installation required: Piper must be installed separately from RealtimeTTS. Follow the Piper installation tutorial for Windows to set up Piper on your system.
- Install RealtimeTTS:

  pip install RealtimeTTS

  Note: Unlike other engines, there is no need to install Piper support with `pip install RealtimeTTS[piper]`; the `[piper]` option is not supported.
Usage
- Configure PiperEngine:
  - Specify the path to the Piper executable and the desired voice model using the `PiperVoice` and `PiperEngine` classes.
  - Refer to the Piper test file for an example of how to set up and use PiperEngine in your projects.
Example:

```python
from RealtimeTTS import TextToAudioStream, PiperEngine, PiperVoice

def dummy_generator():
    yield "This is piper tts speaking."

voice = PiperVoice(
    model_file="D:/Downloads/piper_windows_amd64/piper/en_US-kathleen-low.onnx",
    config_file="D:/Downloads/piper_windows_amd64/piper/en_US-kathleen-low.onnx.json",
)

engine = PiperEngine(
    piper_path="D:/Downloads/piper_windows_amd64/piper/piper.exe",
    voice=voice,
)

stream = TextToAudioStream(engine)
stream.feed(dummy_generator())
stream.play()
```
Additional Information
Piper Resources:
- Installation Tutorial: Watch on YouTube
- Test File Example: piper_test.py
Support:
- If you run into any issues or have questions about the new PiperEngine, please open an issue.
v0.4.21
RealtimeTTS v0.4.21 Release Notes
🚀 New Features
- Updated to the latest versions of dependencies (stream2sentence, coqui-tts, elevenlabs, openai, edge-tts)
StyleTTS Engine
- Added seed support. Fixed a StyleTTS2 problem that caused noise to be generated with very short texts, especially when using `embedding_scale` values > 1
🛠 Bug Fixes
- Fixed a problem in stream2sentence that caused `minimum_sentence_length` not to be respected
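The intended `minimum_sentence_length` behavior can be illustrated with a small sketch. This is not stream2sentence's actual implementation; the function name and the merge-then-yield strategy are illustrative assumptions:

```python
from typing import Iterable, Iterator

def enforce_minimum_length(sentences: Iterable[str],
                           minimum_sentence_length: int = 10) -> Iterator[str]:
    """Merge consecutive fragments until each yielded sentence reaches
    minimum_sentence_length characters (illustrative sketch only)."""
    buffer = ""
    for sentence in sentences:
        # Accumulate fragments that are still too short on their own.
        buffer = f"{buffer} {sentence}".strip() if buffer else sentence
        if len(buffer) >= minimum_sentence_length:
            yield buffer
            buffer = ""
    if buffer:  # flush whatever remains at end of stream
        yield buffer

print(list(enforce_minimum_length(["Hi.", "Okay.", "This one is long enough."], 10)))
# → ['Hi. Okay. This one is long enough.']
```

With the bug described above, short fragments like "Hi." would have been yielded immediately instead of being merged until the minimum length was met.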
v0.4.20 🌿
RealtimeTTS v0.4.20 Release Notes
🚀 New Features
Azure Engine
- Added support for 48 kHz audio output in the Azure TTS engine for improved audio quality (and more flexibility in audio formats).
StyleTTS Engine
- Introduced StyleTTSVoice for dynamic voice switching, allowing transitions between multiple voice models
🛠 Bug Fixes
- Fixed incorrect voice initialization when switching between models in the StyleTTS engine.
- Fixed model configuration path issues during runtime when updating voice parameters.
v0.4.19
v0.4.17
v0.4.14
fixes #223
Enhancements to Sentence Processing
- Improved buffer handling by ensuring it starts with an alphanumeric character to prevent TTS confusion caused by initial non-phonetic characters.
- Bug Fix: Resolved an issue where the word counter wasn’t reset after triggering `force_first_fragment_after_words`, causing processing errors.
- Increased the default `force_first_fragment_after_words` threshold from 15 to 30 for better fragment control.
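The counter-reset fix can be sketched generically. The function name and exact fragmenting logic below are assumptions for illustration, not RealtimeTTS internals; the key point is the reset after each emitted fragment:

```python
from typing import Iterable, Iterator, List

def fragment_words(words: Iterable[str],
                   force_first_fragment_after_words: int = 30) -> Iterator[str]:
    """Yield a fragment as soon as the word count reaches the threshold,
    resetting the counter each time a fragment is emitted (the missing
    reset was the bug described above). Illustrative sketch only."""
    buffer: List[str] = []
    word_count = 0
    for word in words:
        buffer.append(word)
        word_count += 1
        if word_count >= force_first_fragment_after_words:
            yield " ".join(buffer)
            buffer = []
            word_count = 0  # without this reset, every later word re-triggers a fragment
    if buffer:
        yield " ".join(buffer)

print(list(fragment_words(["w"] * 7, force_first_fragment_after_words=3)))
# → ['w w w', 'w w w', 'w']
```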
v0.4.13
RealtimeTTS v0.4.13 Release Notes
🚀 New Features
EdgeEngine
- Introducing EdgeEngine, a free, extremely lightweight, and beginner-friendly engine.
- Designed for simplicity with no complex dependencies, making it ideal for lightweight projects or newcomers to TTS.
🛠 Bug Fixes
v0.4.11
v0.4.10
- new stream2sentence version 0.2.7
- bugfix for #5 (a whitespace between words was sometimes lost)
- upgrade to latest NLTK and Stanza versions including new "punkt-tab" model
- allow offline environment for stanza
- adds support for async streams (preparations for async in RealtimeTTS)
- dependency upgrades to latest version (coqui tts 0.24.2 ➡️ 0.24.3, elevenlabs 1.11.0 ➡️ 1.12.1, openai 1.52.2 ➡️ 1.54.3)
- added load_balancing parameter to coqui engine
- if you have a fast machine with a realtime factor well below 1, inference runs far faster than needed
- this parameter lets you infer at a realtime factor closer to 1, so you still get streaming voice inference BUT your GPU load drops to the minimum needed to produce chunks in realtime
- if you run LLM inference in parallel, it will now be faster because TTS takes less load
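The pacing idea behind load balancing can be sketched as follows. This is not the coqui engine's actual code; `pace_chunk` and its parameters are illustrative assumptions showing how throttling toward a target realtime factor works:

```python
def pace_chunk(chunk_duration_s: float, synthesis_time_s: float,
               target_rt_factor: float = 0.9) -> float:
    """Return how long to sleep after synthesizing one audio chunk so
    that overall production approaches the target realtime factor.
    Illustrative sketch only."""
    # Time budget for this chunk at the target realtime factor:
    budget = chunk_duration_s * target_rt_factor
    # Idle time left over; never negative (a slow chunk gets no sleep):
    return max(0.0, budget - synthesis_time_s)

# A 1.0 s audio chunk synthesized in 0.25 s leaves 0.75 s idle at a
# target realtime factor of 1.0:
print(pace_chunk(1.0, 0.25, target_rt_factor=1.0))
```

Sleeping between chunks keeps the stream realtime while freeing the GPU for other work, such as parallel LLM inference.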