mossFormer2 & sepTDA models

Question

jeromew opened this issue 3 months ago · comments

🚀 Feature

I suggest adding the MossFormer2 and SepTDA models.

Motivation

The two models appear to improve on the state of the art for the speaker separation task.
See https://paperswithcode.com/sota/speech-separation-on-wsj0-2mix

sepTDA :

mossformer2:

What you'd like

An implementation of the models in Asteroid, with a working pretrained model for inference.

Alternatives

I managed to get MossFormer2 inference working via https://modelscope.cn/models/iic/speech_mossformer2_separation_temporal_8k/summary
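For anyone who wants to try the same route, here is a minimal sketch of what that inference might look like. It assumes the `modelscope` package is installed, that the model is loaded through its `pipeline` entry point with the `speech_separation` task, and that the result is a dict with one raw 16-bit PCM buffer per speaker under `output_pcm_list`. The helper names `save_pcm_as_wav` and `separate_with_modelscope` are made up for illustration, and the 8 kHz rate is taken from the model name.

```python
import wave


def save_pcm_as_wav(pcm_bytes, path, sample_rate=8000):
    """Write raw 16-bit mono PCM bytes to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = int16
        wav_file.setframerate(sample_rate)  # 8 kHz assumed from the model name
        wav_file.writeframes(pcm_bytes)


def separate_with_modelscope(
    mix_path, model_id="iic/speech_mossformer2_separation_temporal_8k"
):
    """Run the ModelScope separation pipeline and save one WAV per source."""
    # Imported lazily so the helper above works without modelscope installed.
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    separation = pipeline(Tasks.speech_separation, model=model_id)
    result = separation(mix_path)

    # Assumption: "output_pcm_list" holds one int16 PCM buffer per source.
    paths = []
    for i, pcm in enumerate(result["output_pcm_list"]):
        out_path = f"output_spk{i}.wav"
        save_pcm_as_wav(bytes(pcm), out_path)
        paths.append(out_path)
    return paths
```

Calling `separate_with_modelscope("mix.wav")` should write `output_spk0.wav`, `output_spk1.wav`, and so on. The input is expected to be 8 kHz mono, and the number of output files depends on how many sources the pretrained model was trained to separate.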

Additional context

I am trying to separate sources with an unknown number of speakers on a difficult audio track (opera music plus many speakers with a lot of overlap).

Pariente Manuel · Answer 1 · Mon Apr 01 2024 03:05:12 GMT+0800 (China Standard Time)

Hello,

Thank you for the issue. Do you want to contribute these models? We'll welcome them for sure!

jeromew · Answer 2 · Fri Apr 19 2024 22:48:01 GMT+0800 (China Standard Time)

Hello, thanks for your response.

I am afraid I am too far from this field at the moment to be able to contribute models. I was just playing around with source separation models to try to solve a CTF puzzle involving a difficult-to-parse audio mix. I will join the Slack channel if things change.

I am closing this issue, as I am sure you are not short of models to integrate into Asteroid and that those two will reappear if they are key to the field. In the meantime, you will have one less issue on GitHub!