What worked for me to get coqui-ai tts working on Windows 11, how to set TTS server on startup, how to access the API from python code #3991

kelvinator2 · 2024-09-07T06:45:50Z


Step by step: what worked for me to get coqui-ai TTS working on Windows 11, how to set up the TTS server to start automatically, and how to call its API from Python code.

Hopefully some of this helps others avoid the long hours of hacking it took me. I really appreciate the bits and pieces others posted that helped; I would have loved to come across a summary like this early on...

In a Windows terminal, enter "WSL" to get into linuxland. Then follow the instructions in this YouTube video to install Docker (if it isn't already there) and to download the coqui-ai/tts Docker image:
https://www.youtube.com/watch?v=r8r1VFbhh1w
There are also docs on it:
https://coqui-tts.readthedocs.io/en/latest/docker_images.html#start-a-server
Once the image is downloaded, test it the way he does in the video. Notice that he has to issue one command to start a container from the image and drop into a shell inside it as the "root" user, and then, inside the container, a second command to start the server.
We don't need dat! We want an image that is built to run that server-start command automatically when it starts.
To make an image with a custom starting point, you first write your own Dockerfile (a file named Dockerfile, with no extension, in your working directory). It's an instruction file that tells Docker how to build your custom image when you run the build command in the next step.

Here's my Dockerfile:

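```dockerfile
FROM ghcr.io/coqui-ai/tts

# Set the working directory
# (left commented out: the relative path in ENTRYPOINT below relies on
# the base image's own working directory)
# WORKDIR /app

# Copy any necessary files (if needed)
# COPY . .

# Override the entrypoint to start the server with the VCTK VITS model
# (for GPU inference the coqui docs also pass "--use_cuda", "true")
ENTRYPOINT ["python3", "TTS/server/server.py", "--model_name", "tts_models/en/vctk/vits"]
```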

It could clearly do more, but all I needed was to name the image to start from and override the entry point so the server starts on its own.

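Build, baby, build! Enter "sudo docker build -t my-tts-server ." (don't forget the dot at the end: it tells Docker to use the current directory, where your Dockerfile lives, as the build context. I almost thought it was dirt on my screen.)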
Now start the server. To test, first run it without the --restart parameter: "sudo docker run --gpus all -it -p 5002:5002 my-tts-server". Open a browser in Windows and go to "localhost:5002". The interface should be there, with 100 or more voices: type in some text and hear it. On the server side, the log shows how many seconds each text-to-speech conversion takes. As usual, you can stop the running server any time with Ctrl-C in its terminal window.
Once that worked, I set mine to be up all the time with this command: "sudo docker run --gpus all -it -p 5002:5002 --restart unless-stopped my-tts-server"
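For the server to run automatically whenever you're on your computer, set up a task in the Windows Task Scheduler that starts Docker Desktop at system startup, or when you log on, etc. Because the container was started with --restart unless-stopped, Docker brings it back up by itself once Docker is running (unless you stopped the container yourself).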
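After Docker starts, the server can take a while before it answers (as far as I can tell, server.py loads the model before it starts listening), so a logon script or your own program may want to wait for it first. Here's a minimal sketch using only the Python standard library. It assumes the server is at http://localhost:5002 and that an HTTP 200 from the root page (the web interface) means it's ready; wait_for_tts_server is a hypothetical helper of mine, not part of coqui-ai TTS.

```python
import http.client
import time
import urllib.request


def wait_for_tts_server(base_url="http://localhost:5002", timeout=120.0, interval=2.0):
    """Poll base_url until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True if the server answered in time, False otherwise.
    (Hypothetical helper; assumes a 200 from the root page means "ready".)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The web interface lives at the root URL, so a 200 there means
            # the server process is up and serving requests.
            with urllib.request.urlopen(base_url + "/", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeouts and HTTP errors (URLError and
            # HTTPError are OSError subclasses): treat as "not ready yet".
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep between attempts, but never past the deadline.
        time.sleep(min(interval, remaining))
```

For example, call wait_for_tts_server() at the top of your own script and only make /api/tts requests once it returns True.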
If you want to call the server at localhost:5002 (or wherever you set it) from a Python program running in Windows (or, I believe, in the WSL environment, though I haven't tested that), here is the test code that the AI and I fiddled with until it worked. After some failures getting the parameters right, I looked at the JavaScript inside the HTML of the interface page the server puts up at that address, gave it to the AI to learn from and fix our Python call, and it worked after minor fiddling. Here's the final version of the little ttsrun.py we wrote to test, and finally successfully make, the TTS Docker server call:

```python
import requests

# Define the API endpoint
url = "http://localhost:5002/api/tts"

# Define the parameters for the GET request
params = {
    "text": "Hello, this is a test.",  # Text to be synthesized
    "speaker_id": "p263",  # A speaker ID from the drop-down in the HTML interface
    "style_wav": "",  # Optional: a style WAV, if needed
    "language_id": "EN",  # Optional: a language ID, likely only relevant for multilingual models
}

# Send a GET request to the TTS API
try:
    # A timeout keeps the script from hanging if the server stops responding
    response = requests.get(url, params=params, timeout=60)

    # Check if the request was successful
    if response.status_code == 200:
        # Save the audio content to a file
        with open("output.wav", "wb") as f:
            f.write(response.content)
        print("Audio file saved as output.wav")
    else:
        print(f"Failed to get audio. Status code: {response.status_code}")
        print("Response:", response.text)
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
```
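If you'd rather not depend on the requests package, or want something reusable, here's a minimal standard-library sketch of the same GET call wrapped in a function. It sends the same query parameters as the requests example above and assumes the server returns a plain PCM WAV, which Python's wave module can parse; parsing the header also catches the case where something other than audio comes back. synthesize and compare_speakers are hypothetical helper names of mine, not part of the coqui-ai TTS API. compare_speakers just writes one file per speaker ID so you can audition voices side by side.

```python
import io
import os
import urllib.parse
import urllib.request
import wave


def synthesize(text, speaker_id="p263", out_path="output.wav",
               base_url="http://localhost:5002",
               language_id="EN", style_wav="", timeout=60.0):
    """Request speech from the server's /api/tts endpoint and save it as a WAV file.

    Returns the clip length in seconds, read from the WAV header.
    (Hypothetical helper; parameters mirror the requests example.)
    """
    query = urllib.parse.urlencode({
        "text": text,
        "speaker_id": speaker_id,
        "style_wav": style_wav,
        "language_id": language_id,
    })
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(f"{base_url}/api/tts?{query}", timeout=timeout) as resp:
        audio = resp.read()
    # Parsing the header raises an exception (wave.Error or EOFError) if the
    # body isn't a PCM WAV, so nothing bogus gets written to disk.
    with wave.open(io.BytesIO(audio), "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    with open(out_path, "wb") as f:
        f.write(audio)
    return seconds


def compare_speakers(text, speaker_ids, out_dir=".", **kwargs):
    """Write one WAV per speaker (e.g. p263.wav) and return {speaker_id: seconds}."""
    return {
        sid: synthesize(text, sid, os.path.join(out_dir, f"{sid}.wav"), **kwargs)
        for sid in speaker_ids
    }
```

For example, compare_speakers("Hello, this is a test.", ["p263"]) writes p263.wav to the current directory; add more IDs copied from the interface's drop-down to hear them one after another.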

Enjoy! I'm loving hearing all the different trained voices. They aren't perfect, but you'll notice many have subtler qualities than the usual synthetic voices; they convey a little of the person, I'd say. And it's great that you can train your own! I'm going to wait to delve into that one, though, given all the hours I just burned, with the help of my sidekick Llama 3.1 70B, trying to hack our way through installing coqui-ai TTS directly on Windows (you-can't-get-there-from-here dependency conflicts, etc.) and various other permutations of hell-realms before getting it rolling and happy.
