etzinis / sudo_rm_rf

Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features, which enables a more efficient way of separating sources from mixtures.


Separation of music and speech

0xBEEEF opened this issue

Here is a question about the model mentioned. Up to now it was mainly about the separation of speech and environmental sounds.

Could the model also be used to separate speech and music? Or is there any experience with how it works with the MUSDB18 dataset, for example? I am not interested in separating all 4 components. It would be enough to separate speech and the rest.

I am curious to learn more about it. I also have a follow-up question:

What about the training duration? Let's assume I have a normal consumer graphics card, e.g. the GeForce 2080. Could I use it to train, and if so, how long would that take? Is there any experience here?

Of course, it can be used for any separation task. If you want to extract the voice from a song, just use two sources as training targets: the singing voice and the rest.
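To make the two-source setup concrete, here is a minimal sketch of how the targets could be built from MUSDB18-style stems (vocals, drums, bass, other). It is not code from this repo: the function name `make_two_source_targets` and the dict-of-arrays input are assumptions chosen for illustration, and the random arrays stand in for real audio.

```python
import numpy as np


def make_two_source_targets(stems):
    """Collapse named stems into (mixture, targets) for a two-source setup.

    stems: dict mapping a stem name to a 1-D float array, all of equal length.
    Returns the mixture (sum of all stems) and a (2, n_samples) array whose
    rows are [vocals, everything else]. Hypothetical helper, not a repo API.
    """
    vocals = np.asarray(stems["vocals"], dtype=np.float32)
    rest = np.zeros_like(vocals)
    for name, audio in stems.items():
        if name != "vocals":
            # Everything that is not the voice goes into one "rest" source
            rest += np.asarray(audio, dtype=np.float32)
    targets = np.stack([vocals, rest])
    mixture = targets.sum(axis=0)
    return mixture, targets


# Toy example: random signals as stand-ins for the four stems
rng = np.random.default_rng(0)
stems = {
    name: rng.standard_normal(8000).astype(np.float32)
    for name in ("vocals", "drums", "bass", "other")
}
mixture, targets = make_two_source_targets(stems)
print(mixture.shape, targets.shape)
```

The mixture is the model input and the two rows of `targets` are what the network would be trained to estimate; how these arrays are fed into the repo's training scripts is left open here.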

The training duration of what? Sudo rm -rf is pretty fast; you can fork the repo and try it to see for yourself. In a few hours on a single GPU you will have very decent results.

Btw this is not an issue. If you want to ask about things like that please reach me via e-mail and do not open an issue ticket.