kuleshov / audio-super-res

Audio super resolution using neural networks


MULTI-SPEAKER HELP

Ashwin-Ramesh2607 opened this issue · comments

Hi,
I'd like to know some details about multi-speaker training. Specifically, what changes are needed in the architecture or data preparation for a single-speaker versus a multi-speaker dataset? Are multiple speakers merged into one audio clip, or not? Please explain the steps for training on a custom multi-speaker dataset.

@Ashwin-Ramesh2607

There are no architectural differences between training a multi-speaker model and a single-speaker one. The difference is purely in the dataset. For the single-speaker task, all of the audio patches we train on come from the same speaker; for the multi-speaker task, those patches come from many speakers.
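In other words, the only change is which recordings feed the patch-extraction pipeline. A minimal sketch of that idea (the `VCTK-Corpus/wav48` layout and speaker IDs like `p225` are assumptions based on the VCTK corpus, not part of this repo's API):

```python
from pathlib import Path

def list_wavs(root, speakers):
    # Pool wav files from the chosen speaker directories.
    # Single- vs multi-speaker training differs only in this list.
    root = Path(root)
    return sorted(p for s in speakers for p in (root / s).glob('*.wav'))

# single-speaker: patches drawn from one speaker's recordings
# files = list_wavs('VCTK-Corpus/wav48', ['p225'])
# multi-speaker: identical pipeline, just pool several speakers
# files = list_wavs('VCTK-Corpus/wav48', ['p225', 'p226', 'p227'])
```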

To train on a custom dataset, you'll need to create the h5 files that run.py expects. You can use prep_vctk.py as a guide.
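To make the file format concrete, here is a hedged sketch of writing such an h5 file with h5py. The dataset names `data` (low-res input) and `label` (high-res target), the patch sizes, and the naive subsample-and-repeat downsampling are illustrative assumptions — check prep_vctk.py and your copy of run.py for the exact names and preprocessing (the real script uses proper filtering/interpolation rather than this crude downsampling):

```python
import h5py
import numpy as np

def make_patches(signal, patch_size, stride):
    # Slice a 1-D waveform into overlapping fixed-length patches.
    n = (len(signal) - patch_size) // stride + 1
    return np.stack([signal[i * stride : i * stride + patch_size]
                     for i in range(n)])

def write_h5(path, hr_signals, scale=4, patch_size=8192, stride=4096):
    # Write aligned low-res / high-res patch pairs to an h5 file.
    # For a multi-speaker dataset, hr_signals simply contains
    # waveforms from many speakers instead of one.
    lr_patches, hr_patches = [], []
    for sig in hr_signals:
        hr = make_patches(sig, patch_size, stride)
        # Crude low-res version for illustration: subsample by
        # `scale`, then repeat samples back to the original length.
        lr = np.repeat(hr[:, ::scale], scale, axis=1)
        hr_patches.append(hr)
        lr_patches.append(lr)
    X = np.concatenate(lr_patches)[..., None].astype(np.float32)
    Y = np.concatenate(hr_patches)[..., None].astype(np.float32)
    with h5py.File(path, 'w') as f:
        f.create_dataset('data', data=X)    # assumed input name
        f.create_dataset('label', data=Y)   # assumed target name
    return X.shape, Y.shape
```

Both datasets end up with shape `(num_patches, patch_size, 1)`; the pairing between low-res input and high-res target is positional, so the two arrays must stay in the same order.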