about speaker
assyoucan opened this issue · comments
assyoucan commented
I would like to ask: if I train the network on speaker A's data and then feed it speech from speaker B, will the result still be good? Or do I need to train again on B's data?
Oytun Turk commented
It works reasonably well according to my limited tests. Quality and similarity to speaker B might be a bit off. Speaker-independent training recipes might work better or could be more robust to speaker A/B differences.
Tomoki Hayashi commented
In the case of voice conversion, we usually train a multi-speaker model and then fine-tune it on a small amount of single-speaker data.
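As a rough illustration of that two-stage recipe, here is a toy sketch in plain Python. It is not this repository's training code: a one-variable linear model stands in for the network, and the "speakers" are synthetic slope/offset pairs. All names (`sgd_fit`, `make_speaker`, the learning rates and data sizes) are assumptions made up for the example.

```python
import random

def sgd_fit(params, data, lr, epochs):
    """Fit y = w * x + b with plain SGD on squared error; returns (w, b)."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def mse(params, data):
    """Mean squared error of the linear model on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def make_speaker(slope, offset, n, rng):
    """Hypothetical 'speaker': a linear mapping with its own slope and offset."""
    return [(x, slope * x + offset)
            for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]

rng = random.Random(0)

# Stage 1: train from scratch on pooled data from several speakers.
multi_speaker = []
for slope, offset in [(0.8, -0.2), (1.0, 0.0), (1.2, 0.2), (0.9, 0.1)]:
    multi_speaker += make_speaker(slope, offset, 200, rng)
rng.shuffle(multi_speaker)
pretrained = sgd_fit((0.0, 0.0), multi_speaker, lr=0.05, epochs=20)

# Stage 2: fine-tune on a small amount of data from the unseen target speaker,
# starting from the pretrained parameters and using a smaller learning rate.
target_train = make_speaker(1.5, 0.4, 10, rng)
target_test = make_speaker(1.5, 0.4, 100, rng)
finetuned = sgd_fit(pretrained, target_train, lr=0.01, epochs=50)

loss_before = mse(pretrained, target_test)
loss_after = mse(finetuned, target_test)
print(f"target-speaker MSE before fine-tuning: {loss_before:.4f}")
print(f"target-speaker MSE after fine-tuning:  {loss_after:.4f}")
```

In a real setup the same pattern would mean loading the multi-speaker checkpoint and continuing training on the target speaker's recordings, typically with a reduced learning rate and far fewer steps than the initial training.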