Separation of music and speech
0xBEEEF opened this issue · comments
I have a question about the model. So far the focus has mainly been on separating speech and environmental sounds.
Could the model also be used to separate speech and music? Or is there any experience with how it performs on the MUSDB18 dataset, for example? I am not interested in separating all four components; it would be enough to separate the vocals from the rest.
I am curious to learn more about this. And a follow-up question:
What about the training duration? Let's assume I have an ordinary consumer graphics card, e.g. a GeForce 2080. Could I use it for training, and if so, how long would that take? Is there any experience with this?
Of course, it can be used for any separation task. If you want to extract the voice from a song, just use two sources as training targets: the singing voice and the rest.
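A minimal sketch of that two-source setup, assuming the MUSDB18 stems are already loaded as equal-length NumPy waveforms (the function name and the dict layout are illustrative, not part of the repo):

```python
import numpy as np

def two_source_targets(stems):
    """Collapse multi-stem audio into two training targets:
    the singing voice and the sum of every remaining stem.

    stems: dict mapping stem name -> 1-D waveform array,
           e.g. {"vocals": ..., "drums": ..., "bass": ..., "other": ...}
    Returns (mixture, targets) where targets has shape (2, n_samples):
    row 0 is the vocals, row 1 is the accompaniment.
    """
    vocals = stems["vocals"]
    # Sum all non-vocal stems into a single "rest" target.
    rest = sum(w for name, w in stems.items() if name != "vocals")
    mixture = vocals + rest
    return mixture, np.stack([vocals, rest])
```

The returned pair can then be fed to the model exactly like any other two-source separation task: the mixture is the input, and the stacked array provides the two supervision targets.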
The training duration of what? Sudo rm -rf is pretty fast; you can fork the repo and try it to see for yourself. With a single GPU you will have very decent results in a few hours.
Btw, this is not an issue. If you want to ask about things like this, please reach me via e-mail instead of opening an issue ticket.