Why chunk text and audio?

Question

infinityp913 opened this issue 10 months ago

Hi @GRVYDEV, I was curious why you decided to send text from the TTT to the TTS in chunks, and in turn audio from the TTS to the browser client in chunks.
Why not pass the entire text from TTT --> TTS and the entire audio from TTS --> browser client? Is it to handle long texts that SATURDAY might need to synthesize, which could hit a bottleneck somewhere in the pipeline?

Or is it to minimize latency? I guess with chunked text and audio, SATURDAY can start speaking as soon as the TTT has generated some text, without waiting for the entire piece of text to be generated.

Thanks!

Ananth commented 8 months ago

Thank you

AT · Answer 1 · Sat Nov 18 2023 00:24:47 GMT+0800 (China Standard Time)

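@infinityp913 Chunking reduces the processing time you have to wait through and makes everything smoother: the TTS can start synthesizing the first chunk while the TTT is still generating the rest.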
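
To illustrate the latency point, here is a minimal Python sketch of a chunked pipeline. It is not SATURDAY's actual code: `fake_ttt`, `chunked`, and `fake_tts` are hypothetical stand-ins for the text generator, the chunker, and the speech synthesizer, and the "audio" is just a labelled string. It only shows that, with chunking, the first piece of audio is available before the rest of the text has been generated.

```python
from typing import Iterable, Iterator, List


def fake_ttt(text: str, log: List[str]) -> Iterator[str]:
    """Hypothetical text generator: emits one word at a time."""
    for word in text.split():
        log.append(f"ttt:{word}")
        yield word


def chunked(words: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group a stream of words into lists of at most `size` words."""
    buf: List[str] = []
    for word in words:
        buf.append(word)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:  # flush the final, possibly shorter, chunk
        yield buf


def fake_tts(chunks: Iterable[List[str]], log: List[str]) -> Iterator[str]:
    """Hypothetical synthesizer: turns each text chunk into one 'audio' chunk."""
    for chunk in chunks:
        sentence = " ".join(chunk)
        log.append(f"tts:{sentence}")
        yield f"<audio {sentence}>"


log: List[str] = []
audio = fake_tts(chunked(fake_ttt("a b c d", log), size=2), log)

# Pull only the first audio chunk, as a client would when it starts playback.
first_audio = next(audio)
log_after_first = list(log)  # snapshot: what had run by the time audio existed

# Drain the rest of the stream.
remaining_audio = list(audio)
```

At the point `first_audio` is produced, only the words `a` and `b` have been generated; `c` and `d` are generated afterwards, while the first chunk could already be playing. With a single chunk covering the whole text, nothing could be played until all four words were generated and synthesized.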