adelacvg / NS2VC

Unofficial implementation of NaturalSpeech2 for Voice Conversion and Text to Speech

Issues with preserving the speaker identity

justinjohn0306 opened this issue

Okay, so I've been testing out the demo Colab notebook and tried synthesizing a few characters, but it seems to have a hard time preserving the speaker identity. The resulting audio doesn't sound like my reference audio at all.

The pre-trained model was trained on the VCTK dataset, which is not large enough, so it may not work well on data in the wild. I am working on improving the model's generalization by modifying the network structure. For better results, you can fine-tune the model or train it yourself.

alright, gotcha :)

@adelacvg, do you have any thoughts on using Encodec's features rather than mel-spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.