kuleshov / audio-super-res

Audio super resolution using neural networks


MULTI-SPEAKER HELP

Ashwin-Ramesh2607 opened this issue · comments

Hi,
I'd like to know some details about multi-speaker training. Specifically, what changes are needed in the architecture or data preparation for a single-speaker versus a multi-speaker dataset? Are multiple speakers merged into one audio clip, or not? Please explain the steps for training on a custom multi-speaker dataset.

@Ashwin-Ramesh2607

There are no architectural differences between training a multi-speaker model and a single-speaker one. The difference is purely in the dataset. For the single-speaker task, all of the audio patches we train on come from the same speaker; for the multi-speaker task, those patches come from many speakers.
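In other words, the only change is which recordings feed the patch-extraction pipeline. A minimal sketch of that idea (the `VCTK-Corpus/wav48` layout and speaker IDs like `p225` are assumptions based on the VCTK corpus, not part of this repo's API):

```python
from pathlib import Path

def list_wavs(root, speakers):
    # Pool wav files from the chosen speaker directories.
    # Single- vs multi-speaker training differs only in this list.
    root = Path(root)
    return sorted(p for s in speakers for p in (root / s).glob('*.wav'))

# single-speaker: patches drawn from one speaker's recordings
# files = list_wavs('VCTK-Corpus/wav48', ['p225'])
# multi-speaker: identical pipeline, just pool several speakers
# files = list_wavs('VCTK-Corpus/wav48', ['p225', 'p226', 'p227'])
```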

To train on a custom dataset, you'll need to create the h5 files that run.py expects. You can use prep_vctk.py as a guide.
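To make the file format concrete, here is a hedged sketch of writing such an h5 file with h5py. The dataset names `data` (low-res input) and `label` (high-res target), the patch sizes, and the naive subsample-and-repeat downsampling are illustrative assumptions — check prep_vctk.py and your copy of run.py for the exact names and preprocessing (the real script uses proper filtering/interpolation rather than this crude downsampling):

```python
import h5py
import numpy as np

def make_patches(signal, patch_size, stride):
    # Slice a 1-D waveform into overlapping fixed-length patches.
    n = (len(signal) - patch_size) // stride + 1
    return np.stack([signal[i * stride : i * stride + patch_size]
                     for i in range(n)])

def write_h5(path, hr_signals, scale=4, patch_size=8192, stride=4096):
    # Write aligned low-res / high-res patch pairs to an h5 file.
    # For a multi-speaker dataset, hr_signals simply contains
    # waveforms from many speakers instead of one.
    lr_patches, hr_patches = [], []
    for sig in hr_signals:
        hr = make_patches(sig, patch_size, stride)
        # Crude low-res version for illustration: subsample by
        # `scale`, then repeat samples back to the original length.
        lr = np.repeat(hr[:, ::scale], scale, axis=1)
        hr_patches.append(hr)
        lr_patches.append(lr)
    X = np.concatenate(lr_patches)[..., None].astype(np.float32)
    Y = np.concatenate(hr_patches)[..., None].astype(np.float32)
    with h5py.File(path, 'w') as f:
        f.create_dataset('data', data=X)    # assumed input name
        f.create_dataset('label', data=Y)   # assumed target name
    return X.shape, Y.shape
```

Both datasets end up with shape `(num_patches, patch_size, 1)`; the pairing between low-res input and high-res target is positional, so the two arrays must stay in the same order.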