MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

In our paper, we proposed MP-SENet: a TF-domain monaural SE model with parallel magnitude and phase spectra denoising.
We provide our implementation as open source in this repository.

Abstract: This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder aims to encode time-frequency representations from the input noisy magnitude and phase spectra. The decoder is composed of parallel magnitude mask decoder and phase decoder, directly recovering clean magnitude spectra and clean-wrapped phase spectra by incorporating learnable sigmoid activation and parallel phase estimation architecture, respectively. Multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms are used to train the MP-SENet model jointly. Experimental results show that our proposed MP-SENet achieves a PESQ of 3.50 on the public VoiceBank+DEMAND dataset and outperforms existing advanced SE methods.

A long-version MP-SENet is available on arxiv now. Audio samples can be found here.

This source code is only for the MP-SENet accepted by Interspeech 2023.

Pre-requisites

Python >= 3.6.
Clone this repository.
Install python requirements. Please refer requirements.txt.
Download and extract the VoiceBank+DEMAND dataset. Resample all wav files to 16kHz, and move the clean and noisy wavs to VoiceBank+DEMAND/wavs_clean and VoiceBank+DEMAND/wavs_noisy, respectively. You can also directly download the downsampled 16kHz dataset here.

Training

CUDA_VISIBLE_DEVICES=0,1 python train.py --config config.json

Checkpoints and copy of the configuration file are saved in the cp_mpsenet directory by default.
You can change the path by adding --checkpoint_path option.

Inference

python inference.py --checkpoint_file [generator checkpoint file path]

You can also use the pretrained best checkpoint file we provide in best_ckpt/g_best.
Generated wav files are saved in generated_files by default.
You can change the path by adding --output_dir option.

Model Structure

Comparison with other SE models

Acknowledgements

We referred to HiFiGAN, NSPP and CMGAN to implement this.

Citation

@inproceedings{lu2023mp,
  title={{MP-SENet}: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra},
  author={Lu, Ye-Xin and Ai, Yang and Ling, Zhen-Hua},
  booktitle={Proc. Interspeech},
  pages={3834--3838},
  year={2023}
}

yxlu-0102 / MP-SENet