facebookresearch / denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and its generalization abilities.

I cannot reproduce the results reported in the paper

KhanhNguyen4999 opened this issue

I trained the model using the script launch_valentini.sh on the Valentini 2017 dataset, with spk287 and spk286 in the test set (resampled from 48 kHz to 16 kHz using sox). However, I got pesq=2.62 and stoi=0.92 after 400 epochs, which is much lower than the results reported in the paper.

The paper reports pesq=1.97 and stoi=91.5 for the noisy data, but I recalculated these scores using the code in this repo by modifying line 94 of denoiser/evaluate.py (running on CPU). Specifically, to calculate the scores of the clean data, I replace "estimate" with "clean" on line 94; to calculate the scores of the noisy data, I replace it with "noisy":

  • noisy data: pesq=1.5, stoi=0.84
  • clean data: pesq=4.64, stoi=1
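
The substitution described above (scoring the noisy or the clean signal in place of the model estimate) can be sketched as a small standalone harness. This is only an illustration, not the code in denoiser/evaluate.py: the names `snr_db` and `score_conditions` are hypothetical, and a simple SNR is used as a stand-in metric so the sketch runs without extra dependencies. Assuming the `pesq` and `pystoi` packages are installed, their functions could be wrapped and passed in as the metric instead. Note also that the paper's stoi=91.5 appears to be on a percentage scale, so it would compare against 84 rather than 0.84.

```python
import math
from typing import Callable, Dict, Sequence

Signal = Sequence[float]
# A metric takes (reference, degraded, sample_rate) and returns a score.
Metric = Callable[[Signal, Signal, int], float]


def snr_db(reference: Signal, degraded: Signal, sample_rate: int) -> float:
    """Stand-in metric (signal-to-noise ratio in dB). This is NOT PESQ or STOI.

    Assumption: with the `pesq` and `pystoi` packages and NumPy arrays,
    wrappers such as
        lambda ref, deg, sr: pesq(sr, ref, deg, "wb")
        lambda ref, deg, sr: stoi(ref, deg, sr, extended=False)
    could be passed to score_conditions instead of this function.
    sample_rate is unused here and kept only for a uniform signature.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if error_energy == 0.0:
        return math.inf  # degraded signal is identical to the reference
    return 10.0 * math.log10(signal_energy / error_energy)


def score_conditions(
    clean: Sequence[Signal],
    noisy: Sequence[Signal],
    estimate: Sequence[Signal],
    metric: Metric,
    sample_rate: int = 16000,
) -> Dict[str, float]:
    """Average the metric over utterances for three choices of degraded signal.

    The clean signal is always the reference; only the second argument
    changes, mirroring the "estimate" -> "noisy" / "clean" swap.
    """

    def mean_score(degraded_set: Sequence[Signal]) -> float:
        scores = [
            metric(ref, deg, sample_rate)
            for ref, deg in zip(clean, degraded_set)
        ]
        return sum(scores) / len(scores)

    return {
        "noisy": mean_score(noisy),        # baseline before enhancement
        "enhanced": mean_score(estimate),  # model output
        "clean": mean_score(clean),        # upper bound of the metric
    }
```

Scoring all three conditions with the same function and the same files makes it easier to tell whether a gap to the paper comes from the data (the noisy baseline already differs) or from the model (only the enhanced score differs).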

What is wrong with this result? How can I reproduce the paper's numbers correctly? Please help me!

Hope to hear from you soon! I am still stuck here.

Have you tried using the model pretrained on Valentini and computing the PESQ and STOI on your dataset? This would show whether there is a mismatch between your data and ours.

Yes, I did, but PESQ and STOI with the dns48 pretrained model were very bad: pesq=2.12 and stoi=0.89.

Can you try with the Valentini pretrained model?

Yes, I have tried the master64 pretrained model and got pesq=2.69, stoi=0.928. Is there anything wrong?

Can you try with the model --valentini_nc? This one is trained only on Valentini.

I also tried the valentini_nc pretrained model, but PESQ and STOI barely changed:

  • master64: pesq=2.6966, stoi=0.9281
  • valentini_nc: pesq=2.6977, stoi=0.9287

Given the way I calculated PESQ and STOI for the noisy audio in my first comment, could you please let me know how you calculate PESQ and STOI for noisy audio? I see a gap here.

Hi @KhanhNguyen4999,
This is strange. One reason for the gap might be a change in the Valentini dataset: I saw there is a newer version of the VCTK dataset, which is the basis of Valentini.
However, the drop in performance should not be big and should only be observed with the pretrained model (I got 0.94 STOI and 2.91 PESQ). When training from scratch I got STOI 95 and PESQ 2.95.