huawei-noah / Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Grad-TTS in multispeaker setting

ajinkyakulkarni14 opened this issue

Thank you for releasing the original implementation of Grad-TTS. I would like to know whether a multispeaker setting is available or planned for release.

I am implementing a multispeaker setting using this repo. Would the maintainers of this repo be interested in discussing or providing feedback on a multispeaker Grad-TTS implementation?

Regards
Ajinkya

@ajinkyakulkarni14 Hey! Sorry for the late response. A multispeaker setting is possible for Grad-TTS, and we are discussing the possibility of releasing it as well (we verified it on Libri-TTS). If you don't want to wait, you can modify the code yourself by adding an extra condition to the model in the form of classical learnable speaker embeddings. The one thing to note is that the encoder also has a loss on the mel-spectrogram, so you should condition the encoder on the speaker embedding as well as the decoder. Conditioning can be done by simply broadcasting the speaker embedding along all timesteps and concatenating it channel-wise with the other input. In our solution, we conditioned both the encoder and the decoder.
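
For readers following along, the broadcast-and-concatenate conditioning described above can be sketched roughly as follows. This is only an illustration, not the maintainers' code: the function name `condition_on_speaker`, the table `speaker_table`, and all shapes (4 speakers, 16-dimensional embeddings, 80 channels, 50 frames) are assumptions made up for the example. NumPy is used to keep the sketch framework-agnostic; in an actual PyTorch model the lookup table would presumably be a learnable `torch.nn.Embedding`.

```python
import numpy as np


def condition_on_speaker(features, speaker_ids, speaker_table):
    """Concatenate a per-utterance speaker embedding to every timestep.

    features:      (batch, channels, time) input to the encoder or decoder
    speaker_ids:   (batch,) integer speaker index per utterance
    speaker_table: (n_speakers, spk_dim) embedding table (hypothetical;
                   learnable in a real model)
    returns:       (batch, channels + spk_dim, time)
    """
    batch, _, time = features.shape
    spk = speaker_table[speaker_ids]  # look up: (batch, spk_dim)
    # Repeat the same embedding along the time axis: (batch, spk_dim, time)
    spk = np.broadcast_to(spk[:, :, None], (batch, spk.shape[1], time))
    # Channel-wise concatenation with the other input
    return np.concatenate([features, spk], axis=1)


# Illustrative shapes only (assumptions, not values from the repo)
rng = np.random.default_rng(0)
speaker_table = rng.standard_normal((4, 16))
features = rng.standard_normal((2, 80, 50))
speaker_ids = np.array([3, 1])

conditioned = condition_on_speaker(features, speaker_ids, speaker_table)
print(conditioned.shape)  # (2, 96, 50)
```

The same helper would be applied to both the encoder input and the decoder input, since both are trained with a loss on the mel-spectrogram. The layer that consumes the result needs its input channel count increased by the embedding size.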

Hello @ivanvovk. I have added a speaker encoder module in a similar way to the Glow-TTS implementation. Thank you for the suggestion.