[VSim ECAPA] What $MODEL_PATH should be used when using the ECAPA model for speaker similarity evaluation?

Question

Poeroz opened this issue 7 months ago

Hi, thanks for your great work! I would like to use VSim for speaker similarity evaluation. The documentation says to use the "wavlm_large_fintune.pth" model when "model_type=valle". Which model path should I use when I want to use "model_type=ecapa"? Thanks!

David Dale · Answer 1 · Mon Jan 08 2024 18:48:16 GMT+0800 (China Standard Time)

The recommended setting is model_type=valle with wavlm_large_fintune.pth.
This is the setting we ended up using in the Seamless paper.
The ECAPA architecture is supported only so that some of our unpublished preliminary experiments can be reproduced (or for the unlikely case that you train your own ECAPA speech encoder).
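For context on what the chosen encoder is used for: speaker-similarity metrics of this kind typically embed the reference speech and the generated speech with a speaker encoder, then score the pair by cosine similarity. The sketch below assumes VSim follows that general pattern. It does not call VSim or load wavlm_large_fintune.pth. The embeddings are hand-written toy values standing in for real encoder output, and `cosine_similarity` is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings. In a real evaluation these would be produced by
# the speaker encoder (e.g. the WavLM-based model selected by model_type=valle).
source_emb = [0.2, 0.9, 0.4]
target_emb = [0.25, 0.85, 0.35]
print(f"speaker similarity: {cosine_similarity(source_emb, target_emb):.3f}")
```

Because the score depends entirely on the embedding space, results from the WavLM-based and ECAPA encoders are not directly comparable. This is one reason to stick with the single recommended setting.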

Qingkai Fang · Answer 2 · Mon Jan 08 2024 18:53:07 GMT+0800 (China Standard Time)

Thanks for your reply!