about speaker

Question


assyoucan opened this issue 6 years ago

I would like to ask: if I train the network on speaker A's data and then feed it speaker B's voice after training, will the results still be good? Or do I need to retrain on speaker B's data?

Oytun Turk · Answer 1 · Tue Jan 08 2019 04:07:21 GMT+0800 (China Standard Time)

In my limited tests it works reasonably well, though quality and similarity to speaker B may be somewhat degraded. Speaker-independent training recipes might work better, or at least be more robust to differences between speakers A and B.

Tomoki Hayashi · Answer 2 · Mon Apr 08 2019 21:27:33 GMT+0800 (China Standard Time)

In the case of voice conversion, we usually train a multi-speaker model and then fine-tune it on a small amount of single-speaker data.
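To illustrate the pretrain-then-fine-tune pattern only, here is a toy sketch. It assumes a tiny linear model `y = w*x + b` as a stand-in for a real voice model, and each hypothetical "speaker" is just a different `(w, b)` pair. This is not the recipe used by any particular repository; `make_speaker`, `sgd`, and the speaker parameters are made up for illustration. The model is first trained on pooled data from several speakers, then fine-tuned with a smaller learning rate on only 10 samples from an unseen target speaker.

```python
import random

def make_speaker(w, b, n, seed):
    # Hypothetical "speaker": noiseless samples from y = w*x + b.
    rng = random.Random(seed)
    return [(x, w * x + b) for x in (rng.uniform(-1, 1) for _ in range(n))]

def mse(params, data):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sgd(params, data, lr, epochs, seed=0):
    # Plain per-sample SGD on squared error, shuffling each epoch.
    rng = random.Random(seed)
    w, b = params
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = w * x + b - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return (w, b)

# 1) Pretrain one model on pooled data from several training speakers.
train_speakers = [(1.0, 0.0), (1.2, 0.1), (0.9, -0.1), (1.1, 0.05)]
pooled = []
for i, (w, b) in enumerate(train_speakers):
    pooled += make_speaker(w, b, 200, seed=i)
pretrained = sgd((0.0, 0.0), pooled, lr=0.05, epochs=5)

# 2) Fine-tune on a small amount of data from an unseen target speaker.
target_w, target_b = 1.3, 0.3
target_small = make_speaker(target_w, target_b, 10, seed=100)
target_test = make_speaker(target_w, target_b, 100, seed=200)
finetuned = sgd(pretrained, target_small, lr=0.02, epochs=50)

print("pretrained error on target:", mse(pretrained, target_test))
print("fine-tuned error on target:", mse(finetuned, target_test))
```

In this toy setting, the pretrained model sits near the "average" speaker and fits the target only roughly. A few passes over 10 target samples then move it close to the target parameters. A real voice model behaves far less simply, so treat this only as a picture of the workflow.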