Grad-TTS in multispeaker setting

Question

ajinkyakulkarni14 opened this issue 3 years ago

Thank you for releasing the original implementation of Grad-TTS. I would like to know whether a multispeaker setting is available or planned for release.

I am implementing a multispeaker setting using this repo. Would the maintainers of this repo be interested in discussing or providing feedback on a multispeaker Grad-TTS implementation?

Regards
Ajinkya

Ivan Vovk · Answer 1 · Wed Aug 04 2021 23:41:23 GMT+0800 (China Standard Time)

@ajinkyakulkarni14 Hey! Sorry for the late response. Multispeaker Grad-TTS is possible, and we are discussing the possibility of releasing it as well (we verified it on LibriTTS). If you don't want to wait, you can modify the code yourself by adding an extra condition to the model: classical learnable speaker embeddings. The one thing to note is that the encoder also has a loss on the mel-spectrogram, so you should condition the encoder on the speaker embedding as well as the decoder. Conditioning can be done by simply broadcasting the speaker embedding along all timesteps and concatenating it channel-wise with the other input. In our solution, we conditioned both the encoder and the decoder.
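
For readers who want to see the shapes involved, here is a minimal sketch of the broadcast-and-concatenate conditioning described above. It is not the released Grad-TTS code: it uses NumPy only to illustrate the tensor manipulation, and the names `condition_on_speaker`, `speaker_table`, and the sizes (80 feature channels, 16-dimensional speaker embeddings, 4 speakers) are assumptions made up for the example. In a PyTorch model the table would typically be a learnable embedding layer, and the same two steps would likely be done with `expand` and `torch.cat`.

```python
import numpy as np


def condition_on_speaker(x, speaker_ids, speaker_table):
    """Append a speaker embedding to every timestep of x, channel-wise.

    x:             (batch, channels, time) input features
    speaker_ids:   (batch,) integer speaker indices
    speaker_table: (n_speakers, spk_dim) embedding table
                   (hypothetical; learnable in a real model)
    returns:       (batch, channels + spk_dim, time)
    """
    batch, _, time = x.shape
    spk = speaker_table[speaker_ids]  # (batch, spk_dim)
    # Broadcast the same embedding along all timesteps.
    spk = np.broadcast_to(spk[:, :, None], (batch, spk.shape[1], time))
    # Channel-wise concatenation with the other input.
    return np.concatenate([x, spk], axis=1)


rng = np.random.default_rng(0)
n_speakers, spk_dim = 4, 16  # illustrative sizes only
speaker_table = rng.standard_normal((n_speakers, spk_dim))
speaker_ids = np.array([0, 3])

# Stand-ins for the encoder input and the decoder input of one batch.
enc_in = rng.standard_normal((2, 80, 50))
dec_in = rng.standard_normal((2, 80, 50))

# Both the encoder and the decoder receive the same speaker condition.
enc_cond = condition_on_speaker(enc_in, speaker_ids, speaker_table)
dec_cond = condition_on_speaker(dec_in, speaker_ids, speaker_table)
print(enc_cond.shape, dec_cond.shape)  # (2, 96, 50) (2, 96, 50)
```

Note that concatenation increases the channel count, so whichever layer consumes the conditioned tensor has to accept `channels + spk_dim` input channels.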

Ajinkya Kulkarni · Answer 2 · Mon Aug 16 2021 23:52:37 GMT+0800 (China Standard Time)

Hello @ivanvovk. I have added the speaker encoder module in a similar way to the Glow-TTS implementation. Thank you for the suggestion.