ShipBit / wingman-ai


Optimize response streaming for lower latency

Shackless opened this issue · comments

Quoted from Discord user meenie:

To help with the pretty large delay, it would be ideal to stream the reply from GPT-4 into TTS and then start streaming the TTS audio as it's being generated. I haven't jumped into the codebase yet to verify. If you aren't doing that, I think I might be able to help.
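The idea meenie describes — flushing the streamed reply to TTS sentence by sentence instead of waiting for the full completion — can be sketched roughly like this. This is a minimal sketch, not Wingman's actual code: the token stream and the TTS hand-off are stand-ins, and the sentence splitting is deliberately naive (it would mis-split abbreviations like "Mr.").

```python
import re

def sentence_chunks(token_stream):
    """Group streamed LLM tokens into sentence-sized chunks so each
    sentence can be sent to TTS before the full reply has arrived.

    `token_stream` stands in for the deltas of a streaming chat
    completion; in practice you'd iterate the API's stream object.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every complete sentence currently in the buffer.
        # Naive heuristic: a sentence ends at '.', '!' or '?'.
        while True:
            match = re.match(r"(.*?[.!?])\s*", buffer, re.DOTALL)
            if not match:
                break
            yield match.group(1)
            buffer = buffer[match.end():]
    # Whatever is left when the stream ends is the final chunk.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded chunk would then be passed to the TTS request, so audio for the first sentence can play while later sentences are still being generated.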

We are already utilising streams, but we should verify that we really do so everywhere we can. The biggest delay seems to come from TTS: there is a noticeable pause before the voice output starts. Double-check that we're doing this right.

I put some work into this last night and found that there is a bug with streaming TTS responses: the feature doesn't actually exist, lol. I've created an issue here to keep track of when OpenAI fixes it.

My finding is that once OpenAI fixes this bug, we need to update the AudioPlayer to accept a file path, then use buffering and threads to chunk the audio file into smaller parts and play them back with PyDub. I got the code working, but it's no faster than what we have right now because of the OpenAI issue mentioned above.
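The buffering-and-threads part of that plan might look something like the producer/consumer sketch below. This is an assumption about the shape of the fix, not Wingman's actual AudioPlayer: the producer reads the audio file in small chunks while a consumer thread drains a bounded queue. Real playback via PyDub is stubbed out here (`sink` just collects the bytes) so the pattern itself is visible.

```python
import queue
import threading

CHUNK_SIZE = 4096  # bytes per chunk; small enough to start playback early

def stream_file(path, audio_queue):
    """Producer: read the audio file in small chunks so the consumer
    can start playing before the whole file has been read."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            audio_queue.put(chunk)
    audio_queue.put(None)  # sentinel: no more audio

def play_chunks(audio_queue, sink):
    """Consumer: in the real AudioPlayer this would decode each chunk
    with PyDub and play it; here the playback is stubbed and the bytes
    are appended to `sink` instead."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        sink.append(chunk)

def buffered_playback(path):
    """Run producer and consumer concurrently over a bounded queue,
    which smooths jitter between read speed and playback speed."""
    audio_queue = queue.Queue(maxsize=8)
    received = []
    producer = threading.Thread(target=stream_file, args=(path, audio_queue))
    consumer = threading.Thread(target=play_chunks, args=(audio_queue, received))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return b"".join(received)
```

The bounded `maxsize` keeps memory flat for long replies: if decoding/playback falls behind, the producer blocks instead of buffering the whole file.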

Closed in favor of https://wingman-ai.canny.io/feature-requests/p/text-completion-streaming. We'll use Canny for (votable) feature requests and GitHub issues for bugs.