facebookresearch / denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.

Problem using denoiser as a preprocessing step

lfgogogo opened this issue

Hi, thanks for the great work; denoiser really does denoise.
But when I use it as a preprocessing step for ASR, the results get worse. I use denoiser as below:

from denoiser.demucs import Demucs
import torch
import time
import torchaudio
from scipy.io.wavfile import write

ROOT = "https://dl.fbaipublicfiles.com/adiyoss/denoiser/"
DNS_48_URL = ROOT + "dns48-11decc9d8e3f0998.th"
DNS_64_URL = ROOT + "dns64-a7761ff99a7d5bb6.th"
MASTER_64_URL = ROOT + "master64-8a5dfb4bb92753dd.th"

def _demucs(pretrained, url, **kwargs):
    model = Demucs(**kwargs, sample_rate=16_000)
    if pretrained:
        state_dict = torch.hub.load_state_dict_from_url(url, map_location='cpu')
        model.load_state_dict(state_dict)
    return model

def dns48(pretrained=True):
    return _demucs(pretrained, DNS_48_URL, hidden=48)

def dns64(pretrained=True):
    return _demucs(pretrained, DNS_64_URL, hidden=64)

def master64(pretrained=True):
    return _demucs(pretrained, MASTER_64_URL, hidden=64)

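if __name__ == '__main__':
    # Load the pretrained model and switch it to inference mode.
    model = master64().cuda().eval()
    # torchaudio.load returns the waveform tensor and the file's sample rate.
    x, _ = torchaudio.load('noise.wav')
    # Run inference without tracking gradients.
    with torch.no_grad():
        out = model(x.cuda())
    # ASR
    # Save the first output waveform as a 16 kHz WAV file.
    write('denoise.wav', 16000, out[0][0].cpu().numpy())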

Am I missing an important step? Has anyone else had a similar problem? Hoping for an answer.

Can you make sure that your wav files are indeed sampled at 16000 Hz? That could be one issue.
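A quick way to verify this is to read the sample rate from the file header before passing the audio to the model. The sketch below is only an illustration: the helper names `wav_sample_rate` and `check_sample_rate` are hypothetical, and it uses Python's standard `wave` module, which handles plain PCM WAV files only.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path, expected=16_000):
    """Raise ValueError if the file is not sampled at the expected rate."""
    actual = wav_sample_rate(path)
    if actual != expected:
        raise ValueError(
            f"{path} is sampled at {actual} Hz, expected {expected} Hz; "
            "resample it before running the denoiser"
        )
    return actual
```

Calling `check_sample_rate('noise.wav')` before inference would fail loudly on a file that is not sampled at 16000 Hz, rather than silently feeding it to the model.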

Of course, they are 16000 Hz.

If the noise level is not very high, it is possible that the artefacts from the separation are actually hurting the ASR performance. For noisy samples, though, I would expect it to work. It also depends on whether your ASR model was trained on noisy data or not.
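If the artefacts are the cause, one thing that could be tried (it is not suggested anywhere in this thread, so treat it as an assumption) is to blend part of the original signal back into the denoised output before running ASR. The sketch below uses a hypothetical helper `mix_dry_wet`; because it relies only on basic arithmetic, it should work on torch tensors or NumPy arrays of matching shape as well as plain floats.

```python
def mix_dry_wet(noisy, enhanced, dry=0.1):
    """Blend the original (dry) signal with the denoised (wet) signal.

    dry=0.0 returns only the denoised signal, dry=1.0 only the original.
    `noisy` and `enhanced` must have the same shape.
    """
    if not 0.0 <= dry <= 1.0:
        raise ValueError("dry must be between 0 and 1")
    return dry * noisy + (1.0 - dry) * enhanced
```

In the script above, this would be applied as `mix_dry_wet(x.cuda(), out[0], dry=0.1)` before writing the file. The value of `dry` is a tuning knob with no recommended setting here; it would have to be chosen by measuring ASR accuracy on held-out data.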