cogentapps / chat-with-gpt

An open-source ChatGPT app with a voice

Home Page: https://www.chatwithgpt.ai


Using Elevenlabs Streaming Endpoint for faster T2S responses...

pedroespecial101 opened this issue

I'd like to try Chat with GPT out with the new voice streaming feature from Elevenlabs. (https://docs.elevenlabs.io/api-reference/text-to-speech-stream).

I can see that at the moment it is just using the standard endpoint (https://api.elevenlabs.io/v1/text-to-speech/{voice_id}) rather than the streaming one (https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream).
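For reference, I think a call to the streaming endpoint would look roughly like this (untested - the header names and request body are just my reading of the Elevenlabs docs, and the voice ID and API key are placeholders):

```ts
// Rough sketch only - request body/headers are assumptions from the docs above.
async function streamSpeech(text: string, voiceId: string, apiKey: string): Promise<Blob> {
  const response = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream`,
    {
      method: 'POST',
      headers: {
        'xi-api-key': apiKey,
        'Content-Type': 'application/json',
        'Accept': 'audio/mpeg',
      },
      body: JSON.stringify({ text }),
    },
  );
  if (!response.ok || !response.body) {
    throw new Error(`TTS request failed: ${response.status}`);
  }

  // Read the audio as it arrives instead of waiting for the whole file.
  const reader = response.body.getReader();
  const chunks: Uint8Array[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    chunks.push(value);
    // A chunk has arrived - in principle playback could start here,
    // e.g. by appending it to a MediaSource buffer.
  }
  return new Blob(chunks, { type: 'audio/mpeg' });
}
```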

I appreciate this will be more complicated than just changing the endpoint, as we would presumably still want to save the voice snippet for playback. As you can tell, I'm not a seasoned programmer, but I do like to have a go - especially with GPT-4 assistance!

I suppose I'm wondering: a) has anyone already done this, in which case I won't need to! Or b) does anyone have any ideas or outlines for broadly how I might attempt it?

I'm also wondering whether it would be possible to create a virtual API endpoint that behaves like the non-streaming endpoint but internally uses the streaming endpoint to fetch the data. It would act as an adapter or proxy, gathering the streamed chunks and feeding them to the app in smaller pieces than it would normally receive.
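Very roughly, I imagine that proxy could look something like this (a sketch using Express and Node 18's built-in fetch; the route path, env var name and request shape are just my guesses, not anything from the app itself):

```ts
// Hypothetical proxy route - none of these names come from chat-with-gpt.
import express from 'express';

const app = express();
app.use(express.json());

app.post('/api/tts-proxy', async (req, res) => {
  const { text, voiceId } = req.body;
  const upstream = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream`,
    {
      method: 'POST',
      headers: {
        'xi-api-key': process.env.ELEVENLABS_API_KEY ?? '',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text }),
    },
  );
  if (!upstream.ok || !upstream.body) {
    res.status(502).send('Upstream TTS request failed');
    return;
  }

  // Forward audio chunks to the client as they arrive, so the app can start
  // buffering (or playing) before the full clip has been generated.
  res.setHeader('Content-Type', 'audio/mpeg');
  const reader = upstream.body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    res.write(Buffer.from(value));
  }
  res.end();
});

app.listen(3001);
```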

Does anyone have any thoughts on this or is also interested in speeding up the Elevenlabs response?

Pedro :-)