For ERes2NetV2 performance on short-duration wavs

Question

JiJiJiang opened this issue 4 months ago · comments

Thank you for designing the ERes2Net model so well and for making it open source.

As you mentioned, the V2 version of ERes2Net improves ERes2Net's ability to extract features from short-duration speech.
Are there any experimental results that support this conclusion?

If so, the V2 model could be a better fit for the diarization task in a traditional clustering-based system. In that setup, we usually extract speaker embeddings with a short sliding window, e.g., 1.5 s.
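To make the use case concrete, here is a minimal sketch of the sliding-window segmentation I have in mind. The 1.5 s window comes from the setup above; the 16 kHz sample rate, the 0.75 s shift, and the function name `sliding_windows` are my own assumptions for illustration and are not taken from 3D-Speaker.

```python
def sliding_windows(num_samples, sample_rate=16000, win_s=1.5, shift_s=0.75):
    """Return (start, end) sample indices of fixed-length windows over a waveform.

    sample_rate and shift_s are illustrative assumptions; only the 1.5 s
    window length comes from the discussion above.
    """
    win = int(round(win_s * sample_rate))
    shift = int(round(shift_s * sample_rate))
    if win <= 0 or shift <= 0:
        raise ValueError("window and shift must be positive")
    if num_samples <= 0:
        return []
    # A recording shorter than one window is kept as a single segment.
    if num_samples <= win:
        return [(0, num_samples)]

    segments = []
    start = 0
    while start + win <= num_samples:
        segments.append((start, start + win))
        start += shift
    # Add one last full-length window so the tail of the audio is covered.
    if segments[-1][1] < num_samples:
        segments.append((num_samples - win, num_samples))
    return segments


# Each segment would then go through the speaker model, for example:
#   embeddings = [extract_embedding(wav[s:e]) for s, e in segments]
# where extract_embedding is a hypothetical wrapper around the model.
```

Every embedding here comes from only 1.5 s of audio, which is why short-duration performance matters so much for the clustering step.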

Chen Yafeng · Answer 1 · Wed May 22 2024 16:08:21 GMT+0800 (China Standard Time)

Our experiments have thoroughly validated this conclusion, and the paper will be released in June. Thank you for your interest. You are invited to join the 3D-Speaker technical sharing session tonight at 8 pm. You can access the meeting through this link: https://mp.weixin.qq.com/s/uwvVUIDb0eaAHlfWiuwEoQ.

Hongji Wang · Answer 2 · Wed May 22 2024 16:13:08 GMT+0800 (China Standard Time)

OK, I see. Thank you for your answer. Looking forward to your talk and paper.