Issues with preserving the speaker identity

justinjohn0306 opened this issue a year ago

Okay, so I've been testing out the demo Colab notebook and tried synthesizing a few characters, but it seems to have a hard time preserving the speaker identity. The resulting audio doesn't sound like my reference audio at all.

adelacvg · Answer 1 · Thu Aug 03 2023 16:30:28 GMT+0800 (China Standard Time)

The pre-trained model was trained on the VCTK dataset. That dataset is not large enough, so the model may not work well on in-the-wild data. I am working on improving the model's generalization by modifying the network structure. You can fine-tune or train the model yourself for better results.

Justin John · Answer 2 · Thu Aug 03 2023 18:25:34 GMT+0800 (China Standard Time)

alright, gotcha :)

Rishikesh (ऋषिकेश) · Answer 3 · Tue Apr 30 2024 16:08:56 GMT+0800 (China Standard Time)

@adelacvg, do you have any thoughts on using EnCodec features rather than mel spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.