Problem using denoiser as a preprocessing step
lfgogogo opened this issue · comments
Hi, thanks for the great work, denoiser can really denoise.
But when I use it as a preprocessing step for ASR, the result is worse. I use denoiser as below:
from denoiser.demucs import Demucs
import torch
import time
import torchaudio
from scipy.io.wavfile import write

ROOT = "https://dl.fbaipublicfiles.com/adiyoss/denoiser/"
DNS_48_URL = ROOT + "dns48-11decc9d8e3f0998.th"
DNS_64_URL = ROOT + "dns64-a7761ff99a7d5bb6.th"
MASTER_64_URL = ROOT + "master64-8a5dfb4bb92753dd.th"

def _demucs(pretrained, url, **kwargs):
    model = Demucs(**kwargs, sample_rate=16_000)
    if pretrained:
        state_dict = torch.hub.load_state_dict_from_url(url, map_location='cpu')
        model.load_state_dict(state_dict)
    return model

def dns48(pretrained=True):
    return _demucs(pretrained, DNS_48_URL, hidden=48)

def dns64(pretrained=True):
    return _demucs(pretrained, DNS_64_URL, hidden=64)

def master64(pretrained=True):
    return _demucs(pretrained, MASTER_64_URL, hidden=64)

if __name__ == '__main__':
    model = master64().cuda().eval()
    if model is None:
        print('model is none')
    x, _ = torchaudio.load(r'noise.wav')
    out = model(x.cuda())
    # ASR
    write(r'denoise.wav', 16000, out[0][0].cpu().detach().numpy())
Am I missing an important step? Or has anyone had a similar problem? Hoping for an answer.
Can you make sure that your wav files are indeed sampled at 16000 Hz? That could be one issue.
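One quick way to run that check is sketched below. It reads the rate from the file header with Python's standard-library `wave` module, so it assumes a plain PCM WAV file; the helper names `wav_sample_rate` and `is_expected_rate` are made up for illustration and are not part of denoiser.

```python
import wave

# The script above builds the models with sample_rate=16_000.
EXPECTED_RATE = 16_000


def wav_sample_rate(path):
    """Return the sample rate stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_expected_rate(path, expected=EXPECTED_RATE):
    """Return True if the header reports the expected sample rate."""
    return wav_sample_rate(path) == expected
```

For example, `is_expected_rate("noise.wav")` should return True for the input file used in the script above. The second value returned by `torchaudio.load`, which that script discards as `_`, carries the same information.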
Of course, they are 16000 Hz.
If the noise level is not very high, it is possible that the artefacts from the separation are actually hurting the ASR performance. For noisy samples, though, I would expect it to work. It also depends on whether your ASR model was trained on noisy data or not.
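If artefacts are the cause, one thing that might be worth trying is blending part of the original signal back into the denoised output before running ASR. The sketch below is only an illustration of that idea: the function name `mix_dry_wet` and the default `dry=0.1` are assumptions, and whether it helps depends on the ASR model.

```python
def mix_dry_wet(noisy, enhanced, dry=0.1):
    """Blend the original (dry) signal into the denoised (wet) signal.

    dry=0.0 keeps only the denoised signal, dry=1.0 keeps only the input.
    Works on anything that supports arithmetic with matching shapes
    (torch tensors, NumPy arrays, plain floats).
    """
    if not 0.0 <= dry <= 1.0:
        raise ValueError("dry must be between 0 and 1")
    return dry * noisy + (1.0 - dry) * enhanced
```

With the script above, this would be called as something like `mix_dry_wet(x.cuda(), out[0], dry=0.1)`, assuming the two tensors have the same shape. Comparing ASR results for a few values of `dry` would show whether the artefacts or the remaining noise hurt recognition more.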