wonyangcho / DNN-based-Speech-Enhancement-in-the-frequency-domain

DNN-based SE in the frequency domain using PyTorch. You can test several state-of-the-art networks using the T-F masking or spectral mapping method.

DNN-based Speech Enhancement in the frequency domain

This repository lets you perform DNN-based speech enhancement (SE) in the frequency domain using various methods.
First, create noisy data by mixing clean speech and noise; this dataset is used to train the network.
You can then adjust the network type and configuration in various ways, as shown below.
The results can be evaluated with several objective metrics (PESQ, STOI, CSIG, CBAK, COVL).

You can change
  1. Networks
  2. Learning methods
  3. Loss functions

Requirements

This repository is tested on Ubuntu 20.04 with:

  • Python 3.7
  • CUDA 11.1
  • cuDNN 8.0.5
  • PyTorch 1.9.0

Getting Started

  1. Install the necessary libraries
  2. Make a dataset for train and validation
    # The shape of the dataset
    [data_num, 2 (inputs and targets), sampling_frequency * data_length]   
    
    # For example, for 1,000 three-second utterances sampled at 16 kHz, the shape is
    [1000, 2, 48000]
  3. Set dataloader.py
    self.input_path = "DATASET_FILE_PATH"
  4. Set config.py
    # If you need to adjust any settings, simply change this file.   
    # When you run this project for the first time, you need to set the path where the model and logs will be saved. 
  5. Run train_interface.py
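Step 2 above can be sketched as follows. This is a minimal illustration, not the repository's exact pipeline: the `mix_at_snr` helper, the 5 dB SNR, and the random placeholder signals are all assumptions — in practice you would load real clean speech and noise waveforms.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the clean/noise power ratio matches snr_db, then mix."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

fs, seconds, n_utts = 16000, 3, 1000
length = fs * seconds  # 48000 samples per utterance

# Placeholder signals; replace with real clean speech and noise.
rng = np.random.default_rng(0)
dataset = np.empty((n_utts, 2, length), dtype=np.float32)
for i in range(n_utts):
    clean = rng.standard_normal(length).astype(np.float32)
    noise = rng.standard_normal(length).astype(np.float32)
    dataset[i, 0] = mix_at_snr(clean, noise, snr_db=5.0)  # input (noisy)
    dataset[i, 1] = clean                                 # target (clean)

print(dataset.shape)  # (1000, 2, 48000) = [data_num, 2, fs * data_length]
```

The resulting array matches the `[data_num, 2, sampling_frequency * data_length]` layout described above, with the noisy mixture at index 0 and the clean target at index 1.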

Tutorials

'SE_tutorials.ipynb' provides a tutorial.
You can train the CRN directly in Colab with this notebook, without any preparation.

Networks

You can find the list of adjustable options in config.py; the available networks are:

  • Real network
    • convolutional recurrent network (CRN)
      a real-valued version of DCCRN
    • FullSubNet [1]
  • Complex network
    • deep complex convolutional recurrent network (DCCRN) [2]

Learning Methods

  • T-F masking
  • Spectral mapping
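The two methods differ in what the network predicts in the STFT domain: a multiplicative mask applied to the noisy spectrum, or the clean spectrum itself. The sketch below illustrates the distinction with toy NumPy magnitude spectrograms, using the ideal ratio mask (IRM) as the masking target; the variable names and the additive magnitude model are illustrative, not the repository's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy magnitude spectrograms: frequency bins x time frames.
clean_mag = np.abs(rng.standard_normal((257, 100)))
noise_mag = np.abs(rng.standard_normal((257, 100)))
noisy_mag = clean_mag + noise_mag  # crude additive model in the magnitude domain

# T-F masking: the network predicts a mask in [0, 1] that is applied to
# the noisy input. Here the "prediction" is the ideal ratio mask itself,
# which is also the training target for this method.
irm = clean_mag / (clean_mag + noise_mag + 1e-8)
enhanced = irm * noisy_mag  # recovers (approximately) clean_mag

# Spectral mapping: the network regresses the clean magnitude directly,
# so the training target is simply clean_mag and no mask is formed.
mapping_target = clean_mag
```

In both cases the enhanced magnitude is typically combined with the noisy phase and inverted back to a waveform via the inverse STFT.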

Loss Functions

  • MSE
  • SDR
  • SI-SNR
  • SI-SDR

You can also combine these loss functions with a perceptual loss:

  • LMS
  • PMSQE
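As an illustration of one of the losses above, SI-SNR projects the estimate onto the target signal and measures the residual energy, making the score invariant to the estimate's scale. This is a minimal NumPy sketch; the repository's PyTorch implementation may differ in details such as batching and the exact epsilon handling.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (higher is better)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target direction.
    s_target = (estimate @ target) / (target @ target + eps) * target
    e_noise = estimate - s_target
    return 10 * np.log10((s_target @ s_target) / (e_noise @ e_noise + eps))

rng = np.random.default_rng(2)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)

print(si_snr(noisy, clean))        # moderate value for a noisy estimate
print(si_snr(0.5 * clean, clean))  # very large: rescaling is not penalized
```

When used as a training objective, the negative of this value is minimized so that higher SI-SNR corresponds to lower loss.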

Tensorboard

You can monitor in real time whether the network is training well through 'write_on_tensorboard.py', which logs:

  • loss
  • pesq, stoi
  • spectrogram

Reference

[1] FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
    Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
    [arXiv] [code]

[2] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
    Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie
    [arXiv] [code]
Other tools
https://github.com/usimarit/semetrics
https://ecs.utdallas.edu/loizou/speech/software.htm

About


License: MIT License

