
Similar Latencies in Text-to-Speech for Streaming and Standard Requests #42

Open
Kaanayden opened this issue May 26, 2024 · 2 comments

Kaanayden commented May 26, 2024

Hello ElevenLabs Team,

I am developing a text-to-speech Discord bot. To reduce latency, I tried streaming voice generation, but I am seeing nearly identical latencies with and without streaming.
I am using v0.5.0.

With streaming:

const audioStream = await elevenlabs.generate({
    stream: true,
    voice: "Josh",
    text: text,
    model_id: "eleven_multilingual_v2",
    optimize_streaming_latency: 2,
    voice_settings: {
        stability: 0.5,
        similarity_boost: 0.8,
        style: 0.0,
        use_speaker_boost: true,
    }
});
/*
Output times (in milliseconds): 
2143
2148
2142
3678
*/

Without streaming:

const audioStream = await elevenlabs.generate({
    stream: false,
    voice: "Josh",
    text: text,
    model_id: "eleven_multilingual_v2",
    optimize_streaming_latency: 2,
    voice_settings: {
        stability: 0.5,
        similarity_boost: 0.8,
        style: 0.0,
        use_speaker_boost: true,
    }
});
/*
Output times (in milliseconds):
2145
2222
2241
2268
*/

Is this normal, or is there an error in my code? Is streaming expected to have latencies similar or identical to non-streaming mode under these settings? I have also tried the direct streaming API endpoint (the POST request documented at https://elevenlabs.io/docs/api-reference/streaming) and got similar results.

I appreciate any guidance or insights you can provide!

@Kaanayden Kaanayden changed the title Streaming results similar response times with standard request Similar Latencies in Text-to-Speech for Streaming and Standard Requests May 26, 2024
@Sheldenshi

For streaming, you should measure the time until the first byte arrives.

@Sheldenshi

const before = Date.now();
let firstChunk = false;
for await (const chunk of audioStream) {
    if (!firstChunk) {
        console.log(Date.now() - before);
        firstChunk = true;
    }
}

something like this
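To make the difference concrete, here is a self-contained sketch of that measurement. The `simulatedAudioStream` generator and the `measureStream` helper are illustrative stand-ins (not part of the ElevenLabs SDK); in practice you would pass the stream returned by `elevenlabs.generate({ stream: true, ... })` to `measureStream`.

```javascript
// Simulated audio stream: first chunk after ~50 ms, remaining audio later.
// This mimics a streaming TTS response, where the first byte arrives well
// before generation finishes.
async function* simulatedAudioStream() {
  const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  await delay(50);             // time to first byte
  yield Buffer.from("chunk1");
  await delay(200);            // rest of the generation
  yield Buffer.from("chunk2");
}

// Measure both time-to-first-chunk and total time for any async-iterable
// stream. For streaming playback, the first number is the latency that
// matters; the total time will look similar to a non-streaming request.
async function measureStream(stream) {
  const start = Date.now();
  let firstChunkMs = null;
  let bytes = 0;
  for await (const chunk of stream) {
    if (firstChunkMs === null) firstChunkMs = Date.now() - start;
    bytes += chunk.length;
  }
  return { firstChunkMs, totalMs: Date.now() - start, bytes };
}

measureStream(simulatedAudioStream()).then((m) => {
  console.log(`first chunk: ${m.firstChunkMs} ms, total: ${m.totalMs} ms, ${m.bytes} bytes`);
});
```

With a real streaming response you can start playing audio in the Discord voice channel as soon as the first chunk arrives, rather than waiting for `totalMs`.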
