etzinis / sudo_rm_rf

Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features which enables a more efficient way of separating sources from mixtures.

Separation audio reversed in assignment

under-funk opened this issue · comments

While working with the WHAM dataset, in two separate experiments the results are reversed for the speakers, meaning Estimate Audio 1 is assigned to True Audio 2 while Estimate Audio 2 is assigned to True Audio 1. I attached an image so you can see the waveforms and how they are switched. I will run another experiment in the next few days with other parameters and report whether the issue persists.

Command line: python run_improved_sudormrf.py --train WHAM --val WHAM --test WHAM --train_val WHAM --separation_task sep_clean --n_train 20000 --n_test 3000 --n_val 3000 --n_train_val 3000 --out_channels 256 --num_blocks 16 -cad 0 1 --n_jobs 8 --divide_lr_by 3. --upsampling_depth 5 --patience 49 -fs 8000 -tags sudo_rm_rf_16 --project_name sudormrf_wham --zero_pad --clip_grad_norm 5.0 --model_type relu --n_epochs 100 -bs 7

[Image: Screen Shot 2022-05-09 at 08 52 09]

Pardon my brevity by the way, I think this is a great implementation and really good work! Thank you very much for making this repo

Hey, thanks for the kind words. In source separation there is permutation invariance at the output, meaning that speaker 1 could be assigned to the first or the second slot without any change in our metrics. You are right, I did not compute the best permutation to assign the estimated sources back to their corresponding reference signals, but this does not matter in a source separation setup. You can do it easily by finding which permutation gives you the maximum SI-SDR.
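For anyone landing here later, a minimal sketch of that permutation search might look like the following. It is not code from this repo: `si_sdr` and `best_permutation` are hypothetical helper names, it assumes the estimates and references are NumPy arrays of shape (n_sources, n_samples), and it simply tries every ordering, which is fine for two or three speakers.

```python
import itertools

import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals (hypothetical helper)."""
    # Remove the mean so a DC offset does not affect the metric.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def best_permutation(estimates, references):
    """Return (perm, mean SI-SDR) where estimates[perm[i]] is matched to references[i]."""
    n_sources = len(references)
    best_perm, best_score = None, -np.inf
    # Exhaustive search over all orderings of the estimated sources.
    for perm in itertools.permutations(range(n_sources)):
        score = np.mean(
            [si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)]
        )
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score
```

Once you have the permutation, `estimates[list(perm)]` reorders the estimated sources so that each one lines up with its reference before plotting or listening.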

Thanks for the response etzinis. I was wondering whether this would be a problem in an enh_single setup. I am guessing the maximum SI-SDR comes from the correct estimate, so it will be chosen automatically? I am interested in source separation for enhancing one speaker, but for 'in the wild' recordings there is a selection problem: if there are several speakers, it is not easy to determine which one is the 'target' speaker. I haven't started with the enh_single setup; I was toying around with different parameters first to get familiar with your code, and I just wanted to point it out.