etzinis / sudo_rm_rf

Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features which enables a more efficient way of separating sources from mixtures.

Why is there so much noise?

AIHHU opened this issue · comments

I tried the improved SuDoRm-Rf on WHAM clean speech separation, and the separated output contained a lot of noise even though its SI-SDR was 15.0.

I don't know which version you tried, but I am sure you are doing something wrong (maybe you are not normalizing the waveform?). As specified in the README, you can see how to use the pre-trained models here: https://github.com/etzinis/sudo_rm_rf/blob/master/sudo_rm_rf/notebooks/sudormrf_how_to_use.ipynb

Moreover, an SI-SDR of 15 dB corresponds to almost no audible noise, so please check your code again.
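To make the 15 dB claim concrete, here is a minimal sketch of the standard SI-SDR computation on synthetic data (the function name `si_sdr` and the signals are illustrative, not from the repo). An estimate whose residual error carries roughly 1/32 of the signal energy scores about 15 dB, which is a mild, barely audible distortion:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-9):
    """Scale-invariant SDR in dB between 1-D estimate and reference signals."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to find the optimal scaling.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((target ** 2).sum() / ((noise ** 2).sum() + eps))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
# Error energy ~10^-1.5 of the signal energy yields roughly 15 dB SI-SDR.
estimate = clean + rng.standard_normal(16000) * (1 / np.sqrt(10 ** 1.5))
print(round(si_sdr(estimate, clean), 1))  # roughly 15 dB
```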

Thank you, sir. I ran the WHAM clean separation experiment as described in your README and did the separation as in your notebook. I also tried normalization.

Maybe you are picking up the wrong file or something; in any case, it's almost impossible to get a really noisy file at 15 dB SI-SDR. Except for the case where the noisy file is your reference signal :P

The only difference is that I use sf.read instead of torchaudio.load, and keepdims instead of keepdim.
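That swap is worth double-checking: beyond the keyword spelling (NumPy's `keepdims` vs. PyTorch's `keepdim`), `np.ndarray.std` defaults to the population standard deviation (ddof=0) while `torch.std` defaults to the unbiased sample estimate (ddof=1), so normalizing with one and rescaling with the other introduces a small gain mismatch. Also note that `sf.read` returns audio as (samples, channels) while `torchaudio.load` returns (channels, samples). A short demonstration of the std discrepancy:

```python
import numpy as np
import torch

x = np.array([1.0, 2.0, 3.0, 4.0])

# NumPy: population std (ddof=0) -> sqrt(1.25) ~= 1.118
np_std = x.std(-1, keepdims=True)
# PyTorch: unbiased sample std (ddof=1) -> sqrt(5/3) ~= 1.291
torch_std = torch.from_numpy(x).std(-1, keepdim=True)

print(float(np_std[0]), float(torch_std[0]))

# To reproduce torch's default from NumPy, pass ddof=1 explicitly.
assert np.allclose(x.std(-1, ddof=1), torch_std.numpy())
```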

It's impossible to debug your code without looking at it. Please use what I have in my notebook; I am sure it will run smoothly.

Thanks, I'll try it soon. I can assure you that I did not change your experiment code and that the dataset is the WHAM dataset. That's true.

That's my inference code, sir. It's the same as yours. Why is there still so much noise? The returned test SI-SDR is 15.0. The model was trained on 16 kHz.
esti_utt, _ = torchaudio.load(os.path.join(mix_file_path, file_id))

# Normalize the mixture to zero mean and unit variance per channel.
input_mix_std = esti_utt.std(-1, keepdim=True)
input_mix_mean = esti_utt.mean(-1, keepdim=True)
input_mix = (esti_utt - input_mix_mean) / (input_mix_std + 1e-9)

rec_sources_wavs = model_separation(input_mix.unsqueeze(1))

# Undo the normalization to restore the mixture's original gain.
#rec_sources_wavs = (rec_sources_wavs * input_mix_std) + input_mix_mean

est_waveform_1 = rec_sources_wavs[0, 0].detach().numpy()
est_waveform_2 = rec_sources_wavs[0, 1].detach().numpy()

sf.write(os.path.join(esti_file_path, '1' + file_id), est_waveform_1, args.fs)
sf.write(os.path.join(esti_file_path, '2' + file_id), est_waveform_2, args.fs)

The # on the line
#rec_sources_wavs = (rec_sources_wavs * input_mix_std) + input_mix_mean
is removed when testing, so the rescaling is applied.

You are still giving me incomplete code snippets - tbh I don't see the reason for me to debug your code when there is a bug-free notebook. Are you pointing to files that are 8 kHz? Rescaling is important for capturing the appropriate gain of the sources.

No, sir. I have a question: if I use the 16 kHz dataset for training and change --fs on the command line, will it train on the 16 kHz data, or will your code resample it to 8 kHz?

Those are 8 kHz models; you have to downsample your files to 8 kHz first and then process them.

Sorry to hear that, lol. A 16 kHz model is what I need.

You mean it is required for your application? I was thinking that maybe it would also be a good idea to run a couple of 16 kHz models.