etzinis / sudo_rm_rf

Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features which enables a more efficient way of separating sources from mixtures.

Notebooks: sudormrf_extract_sep_perf_metrics_example.ipynb

ONEISALL-h opened this issue

Your work is very good. I have run some practical application tests, and I have to say that the sudo rm -rf model is very effective. However, I would like to ask whether there is an error in the input_sdr, input_sar, and other input_XX values printed by pprint(results_dic) in the notebook sudormrf_extract_sep_perf_metrics_example.ipynb.

Hey, thanks for the kind words! Indeed, sudo rm -rf does what it has promised to do. Could you share the exact output that you think is wrong and the corresponding lines of code? These metrics are extracted automatically by the asteroid metric library, so I doubt that they have a bug.

When I tested on the WHAM! dataset, I found that there may be a problem with the values of input_sdr, input_sar, and the other input_XX metrics, although there is no such problem in sudormrf_how_to_use.ipynb. After checking, I found that line 8 of the second cell in sudormrf_extract_sep_perf_metrics_example.ipynb:
whamrexcl_test_file_names = [os.path.join(whamr_test_folder_path, 'mix_both_reverb',name) for name in wsj02mix_test_file_names]
should be changed to
whamrexcl_test_file_names = [os.path.join(whamr_test_folder_path, 'mix_both',name) for name in whamrexcl_test_file_names]
Although the current line does not raise an error, it means the input .wav files no longer correspond to each other when computing metrics such as SDR.
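To illustrate why a mixture/source mismatch shows up in the input_XX metrics without raising any error, here is a minimal sketch. It uses a simplified scale-invariant SDR written from scratch (an assumption for illustration, not the asteroid implementation) and synthetic random signals in place of real .wav files; all names in it are hypothetical.

```python
import math
import random


def si_sdr(reference, estimate, eps=1e-12):
    """Simplified scale-invariant SDR (in dB) of `estimate` w.r.t. `reference`."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)          # optimal scaling of the reference
    target = [alpha * r for r in reference]   # part of the estimate explained by the reference
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


rng = random.Random(0)
n = 4000
s1 = [rng.gauss(0.0, 1.0) for _ in range(n)]         # stand-in for speaker 1
s2 = [rng.gauss(0.0, 1.0) for _ in range(n)]         # stand-in for speaker 2
unrelated = [rng.gauss(0.0, 1.0) for _ in range(n)]  # source from a different utterance
mixture = [a + b for a, b in zip(s1, s2)]

# "Input" metric: score the unprocessed mixture against a ground-truth source.
matched = si_sdr(s1, mixture)            # mixture paired with its own source
mismatched = si_sdr(unrelated, mixture)  # mixture paired with the wrong file

print(f"input SI-SDR, matched pair:    {matched:.1f} dB")
print(f"input SI-SDR, mismatched pair: {mismatched:.1f} dB")
```

Both calls run without complaint, which is why a pairing problem of this kind only becomes visible as implausibly low input values.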

Oh, sorry about that, but the pre-trained models are for WHAMR! and WSJ0-2mix, so if you try them on WHAM! they will fail. Now that I think of it, I can train a couple of models for WHAM! just for fun 😊

The WHAMR! pre-trained models are trained on reverberant data, so it is almost certain that you will see a performance drop if you test them on anechoic data.

I don't think what you are saying is correct: whamrexcl_test_file_names should contain the files for WHAMR!, which are noisy AND reverberant. If you run ls under whamr/wav8k/min/tt:

you will see that there are these files: mix_both_anechoic mix_both_reverb mix_clean_anechoic mix_clean_reverb mix_single_anechoic mix_single_reverb noise s1_anechoic s1_reverb s2_anechoic s2_reverb

mix_clean_anechoic => WSJ0-2mix
mix_both_anechoic => WHAM!
mix_both_reverb => WHAMR!

You can also check WHAMR!'s README for the exact folder names and their meaning:

6. mix_single_anechoic: for speech enhancement, contains mixture of s1_anechoic and noise

7. mix_clean_anechoic: clean speech separation for two speakers, contains mixture of s1_anechoic and s2_anechoic.  The relative levels between speakers should match the original wsj0-2mix dataset, but the overall level of the mix will be different.

8. mix_both_anechoic: contains mixtures of s1_anechoic, s2_anechoic, and noise

9. mix_single_reverb: for speech enhancement, contains mixture of s1_reverb and noise

10. mix_clean_reverb: clean speech separation for two reverberant speakers, contains a mixture of s1_reverb and s2_reverb.  The relative levels between speakers should match the original wsj0-2mix dataset, but the overall level of the mix will be different.

11. mix_both_reverb: contains mixtures of s1_reverb, s2_reverb, and noise
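The folder descriptions in the README excerpt above can be captured in a small lookup table, which makes it harder to pair a mixture folder with the wrong source folders. The following is a rough sketch only: the helper `paired_paths` and the file name `example.wav` are hypothetical, and it assumes that a mixture and its components share the same file name across folders.

```python
import os

# Components of each mixture folder, as listed in the README excerpt.
MIXTURE_CONTENTS = {
    "mix_single_anechoic": ("s1_anechoic", "noise"),
    "mix_clean_anechoic": ("s1_anechoic", "s2_anechoic"),
    "mix_both_anechoic": ("s1_anechoic", "s2_anechoic", "noise"),
    "mix_single_reverb": ("s1_reverb", "noise"),
    "mix_clean_reverb": ("s1_reverb", "s2_reverb"),
    "mix_both_reverb": ("s1_reverb", "s2_reverb", "noise"),
}


def paired_paths(root, mixture_folder, file_names):
    """Build mixture paths and matching component paths from ONE list of names.

    Assumption (not verified here): every component folder stores a file
    under the same name as the corresponding mixture.
    """
    components = MIXTURE_CONTENTS[mixture_folder]
    pairs = []
    for name in file_names:
        pairs.append({
            "mixture": os.path.join(root, mixture_folder, name),
            "components": {c: os.path.join(root, c, name) for c in components},
        })
    return pairs


# Hypothetical usage with the test split path mentioned in the thread.
pairs = paired_paths("whamr/wav8k/min/tt", "mix_both_reverb", ["example.wav"])
for pair in pairs:
    print(pair["mixture"])
    for component, path in pair["components"].items():
        print("  ", component, "->", path)
```

Deriving both the mixture and the source paths from a single list of names keeps them aligned by construction, so the reverberant mixtures are always scored against the reverberant sources.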