Cannot reproduce reported SDR & retrain the speaker embedding

Question

nnbtam99 opened this issue 2 years ago

Hello, I have two questions about the implementation.

I cannot reproduce the results reported in the README.
I have trained for more than 400k steps on the LibriSpeech 360h + 100h clean datasets, using the embedder provided in this repo.
However, the best SDR I can obtain is 5.5.
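
As a reference point for the SDR figure above, here is a minimal sketch of the plain energy-ratio definition of SDR in dB. This is an assumption for illustration only: the repo's evaluation may use a different variant (for example the BSS Eval definition), so numbers from this sketch are not guaranteed to match the README. `simple_sdr` is a hypothetical helper, not a function from the repo.

```python
import math


def simple_sdr(target, estimate, eps=1e-10):
    """Plain SDR in dB: 10 * log10(||target||^2 / ||target - estimate||^2).

    Hypothetical helper for illustration; not the BSS Eval SDR.
    `target` and `estimate` are equal-length sequences of float samples.
    """
    if len(target) != len(estimate):
        raise ValueError("target and estimate must have the same length")
    # Energy of the clean reference signal.
    signal_energy = sum(t * t for t in target)
    # Energy of the residual between reference and estimate.
    distortion_energy = sum((t - e) ** 2 for t, e in zip(target, estimate))
    # eps keeps the ratio finite for silent or perfectly matched signals.
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))
```

Under this definition, an estimate whose residual carries 1% of the target energy scores about 20 dB, so 5.5 dB corresponds to a residual carrying roughly 28% of the target energy.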

To build the training data from LibriSpeech 360h + 100h, I generate the mixed audio for the 360h and 100h sets separately, then combine the outputs in a single folder. Is this the right approach when I want to use more data to train the voice filter module?
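
To make the merging step concrete, here is a minimal sketch of how two separately generated output folders could be combined. It assumes that both generator runs may produce files with identical names (for example numbered from zero), so each file gets a per-source prefix to avoid silent overwrites. The function name `merge_mixture_dirs` and the labels are hypothetical, and the sketch assumes the data loader matches files by suffix rather than by exact name.

```python
import shutil
from pathlib import Path


def merge_mixture_dirs(sources, dest):
    """Copy files from several generated folders into one folder.

    `sources` maps a short label (hypothetical, e.g. "360h") to a folder.
    Each copied file is renamed to "<label>-<original name>" so that files
    with the same name in different folders do not overwrite each other.
    Returns the number of files copied.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for label, src in sources.items():
        for path in sorted(Path(src).iterdir()):
            if not path.is_file():
                continue  # skip subfolders; only flat folders are handled
            target = dest / f"{label}-{path.name}"
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.copy2(path, target)
            copied += 1
    return copied
```

A call such as `merge_mixture_dirs({"360h": "out_360/train", "100h": "out_100/train"}, "merged/train")` (paths are placeholders) keeps related files of one example grouped under the same prefix, since only the start of each name changes.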

I got worse results after retraining the speaker embedding.
I retrained the embedder on three datasets (LibriSpeech, VoxCeleb1, and VoxCeleb2) using the following repo: Speaker verification.

In theory, I expected the voice filter module to benefit from an embedder trained on more data, but the results got even worse. Can you share how you trained this embedder?

Thank you in advance!