sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation

Multichannel input does not work

splinter21 opened this issue

Traceback (most recent call last):
  File "enhancement.py", line 74, in <module>
    write(join(target_dir, filename), x_hat.cpu().numpy(), 16000)
  File "/mnt/storage00/liujing04/anaconda3/envs/t12/lib/python3.6/site-packages/soundfile.py", line 315, in write
    subtype, endian, format, closefd) as f:
  File "/mnt/storage00/liujing04/anaconda3/envs/t12/lib/python3.6/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/mnt/storage00/liujing04/anaconda3/envs/t12/lib/python3.6/site-packages/soundfile.py", line 1184, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/mnt/storage00/liujing04/anaconda3/envs/t12/lib/python3.6/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
*** RuntimeError: Error opening 'xxx.wav': Format not recognised.

(Pdb) x_hat.shape
torch.Size([2, 1067315])

With soundfile, I can only write the audio file when the array has the shape (1, xxxxx).

Could you try transposing the array given to soundfile.write, i.e. x_hat.cpu().numpy().T?
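For context, soundfile.write expects two-dimensional data laid out as (frames, channels), so an array of shape (2, 1067315) is most likely read as 2 frames with 1067315 channels, which a WAV file cannot hold. Below is a minimal sketch of the transpose, assuming x_hat is channels-first as the shape above suggests. The helper names to_soundfile_layout and write_multichannel are hypothetical and not part of this repo, and a zero-filled NumPy array stands in for x_hat.cpu().numpy().

```python
import numpy as np


def to_soundfile_layout(audio):
    """Return audio as (frames, channels), the layout soundfile.write expects.

    Assumption: 2-D input is channels-first, i.e. (channels, frames).
    1-D (mono) input is returned unchanged.
    """
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio
    if audio.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got {audio.ndim}-D")
    return audio.T


def write_multichannel(path, audio, samplerate=16000):
    """Hypothetical wrapper: fix the layout, then write with soundfile."""
    import soundfile as sf  # imported here so the layout helper works without it

    sf.write(path, to_soundfile_layout(audio), samplerate)


# Stand-in for x_hat.cpu().numpy(): 2 channels, 1 second at 16 kHz.
x_hat = np.zeros((2, 16000), dtype=np.float32)
data = to_soundfile_layout(x_hat)
print(data.shape)  # (16000, 2)
```

In enhancement.py this would amount to passing x_hat.cpu().numpy().T to write, as suggested above.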

I’m guessing you’re running your own training and evaluation experiments. If not, please note that this work, the repo, and especially the pretrained checkpoints are designed for single-channel audio only.