Separation audio reversed in assignment

Question


under-funk opened this issue 2 years ago

While working with the WHAM dataset, the speaker assignments came out reversed in two separate experiments: Estimate Audio 1 is assigned to True Audio 2, while Estimate Audio 2 is assigned to True Audio 1. I attached an image so you can see the waveforms and how they are switched. I will run another experiment with other parameters in the next few days and report whether the issue persists.

Command line: python run_improved_sudormrf.py --train WHAM --val WHAM --test WHAM --train_val WHAM --separation_task sep_clean --n_train 20000 --n_test 3000 --n_val 3000 --n_train_val 3000 --out_channels 256 --num_blocks 16 -cad 0 1 --n_jobs 8 --divide_lr_by 3. --upsampling_depth 5 --patience 49 -fs 8000 -tags sudo_rm_rf_16 --project_name sudormrf_wham --zero_pad --clip_grad_norm 5.0 --model_type relu --n_epochs 100 -bs 7

under-funk · Answer 1 · Mon May 09 2022 15:00:34 GMT+0800 (China Standard Time)

Pardon my brevity, by the way. I think this is a great implementation and really good work! Thank you very much for making this repo.

Efthymios Tzinis · Answer 2 · Wed May 11 2022 07:51:04 GMT+0800 (China Standard Time)

Hey, thanks for the kind words. In source separation, the output is permutation invariant: speaker 1 could be assigned to either the first or the second output slot without any change in our metrics. You are right that I did not compute the best permutation to map the estimated sources back to their corresponding reference signals, but this does not matter in a source separation setup. You can do it easily by finding the permutation that gives you the maximum SI-SDR.
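For illustration, here is a minimal sketch of that reordering step, assuming the estimates and references are NumPy arrays of shape (n_sources, n_samples). The helpers `si_sdr` and `best_permutation` are hypothetical names written for this example, not functions from the sudo_rm_rf repository. The sketch tries every permutation and keeps the one with the highest mean SI-SDR.

```python
import itertools

import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between one 1-D estimate and one 1-D reference."""
    # Remove the DC offset so constant shifts do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))


def best_permutation(estimates: np.ndarray, references: np.ndarray):
    """Reorder estimates so that estimates[i] lines up with references[i].

    Brute force over all permutations of the output slots, keeping the one
    with the highest mean SI-SDR. Returns (reordered, permutation, score).
    """
    n_src = references.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n_src)):
        # perm[i] is the estimate index assigned to reference i.
        score = np.mean([si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)])
        if score > best_score:
            best_perm, best_score = perm, score
    return estimates[list(best_perm)], best_perm, best_score


# Usage: simulate the swap reported above (estimate 1 resembles speaker 2, and vice versa).
rng = np.random.default_rng(0)
refs = rng.standard_normal((2, 8000))
ests = refs[::-1] + 0.1 * rng.standard_normal((2, 8000))
reordered, perm, score = best_permutation(ests, refs)
print(perm)  # (1, 0)
```

Brute force is cheap for two or three speakers but grows factorially with the number of sources; for larger mixtures, an assignment solver such as SciPy's `linear_sum_assignment` on a pairwise SI-SDR matrix finds the same maximum-sum pairing.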

under-funk · Answer 3 · Wed May 11 2022 13:06:40 GMT+0800 (China Standard Time)

Thanks for the response, etzinis. I was wondering whether this would be a problem in an enh_single setup. I am guessing the maximum SI-SDR comes from the correct estimate, so it will be chosen automatically? I am interested in source separation for enhancing one speaker, but 'in the wild' recordings have a selection problem: if there are several speakers, it is not easy to determine which one is the 'target' speaker. I haven't started with the enh_single setup yet; I was toying around with different parameters first to get familiar with your code, and I just wanted to point it out.