hwang9u / simple-speech-enhancement

Simple Convolutional Auto-Encoder for variable-length speech enhancement

Simple CAE(Convolutional Auto-Encoder) for Variable-Length Speech Denoising

Check the notebook and listen to the examples here.

In this toy project, I built the simplest possible CAE (Convolutional Auto-Encoder) architecture for speech denoising.

Model

Encoder

A stack of 3 encoder blocks.
Since the encoder does not flatten its output, variable-length speech input can be used.
Encoder block: Conv2d -> BatchNorm2d -> LeakyReLU

Decoder

A stack of 3 decoder blocks.
Decoder block: ConvTranspose2d -> LeakyReLU
Last decoder block contains only ConvTranspose2d.
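
The block structure above can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the code from this repository: the channel widths, kernel size, stride, LeakyReLU slope, the name `SimpleCAE`, and the `(batch, 1, freq, time)` spectrogram layout are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


def encoder_block(in_ch, out_ch):
    # Conv2d -> BatchNorm2d -> LeakyReLU (kernel/stride values are assumed)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )


def decoder_block(in_ch, out_ch, last=False):
    # ConvTranspose2d -> LeakyReLU; the last block keeps only ConvTranspose2d
    layers = [
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                           padding=1, output_padding=1)
    ]
    if not last:
        layers.append(nn.LeakyReLU(0.2))
    return nn.Sequential(*layers)


class SimpleCAE(nn.Module):
    """Hypothetical 3-block encoder / 3-block decoder, fully convolutional."""

    def __init__(self, channels=(1, 16, 32, 64)):
        super().__init__()
        c = channels
        self.encoder = nn.Sequential(
            encoder_block(c[0], c[1]),
            encoder_block(c[1], c[2]),
            encoder_block(c[2], c[3]),
        )
        self.decoder = nn.Sequential(
            decoder_block(c[3], c[2]),
            decoder_block(c[2], c[1]),
            decoder_block(c[1], c[0], last=True),
        )

    def forward(self, x):
        # No flatten or linear layer, so the time axis can have any length
        return self.decoder(self.encoder(x))


model = SimpleCAE().eval()
with torch.no_grad():
    short = model(torch.randn(2, 1, 64, 120))  # 120 time frames
    long = model(torch.randn(1, 1, 64, 200))   # 200 time frames
```

With these assumed strides, each encoder block halves and each decoder block doubles the frequency and time axes, so the output matches the input shape only when both are multiples of 8; other lengths would need padding or cropping.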

+) Loss function

MaskedMSELoss: ignores the padded area when computing the MSE loss.
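
A masked MSE of this kind might look like the sketch below. It is not the repository's `MaskedMSELoss`: the function name `masked_mse_loss`, the `lengths` argument (valid time frames per example), and the `(batch, channel, freq, time)` layout are assumptions made for illustration.

```python
import torch


def masked_mse_loss(pred, target, lengths):
    """MSE averaged over valid (non-padded) time frames only.

    pred, target: (batch, channel, freq, time), padded along time
    lengths:      (batch,) number of valid frames per example (assumed input)
    """
    time = pred.size(-1)
    # mask[b, t] is True where frame t belongs to the real signal
    frame_idx = torch.arange(time, device=pred.device)
    mask = frame_idx[None, :] < lengths.to(pred.device)[:, None]
    mask = mask[:, None, None, :].to(pred.dtype)  # (batch, 1, 1, time)

    squared_error = (pred - target) ** 2 * mask
    # Number of valid elements: valid frames x channels x frequency bins
    n_valid = mask.sum() * pred.size(1) * pred.size(2)
    return squared_error.sum() / n_valid


# The padded region of the first example holds large values that must be ignored
pred = torch.zeros(2, 1, 4, 5)
target = torch.ones(2, 1, 4, 5)
target[0, :, :, 3:] = 100.0
loss = masked_mse_loss(pred, target, lengths=torch.tensor([3, 5]))
```

Dividing by the count of valid elements, rather than by the full padded tensor size, keeps the loss scale independent of how much padding a batch happens to contain.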

Dataset

Clean speech dataset: "YesNo" dataset (torchaudio.datasets.yesno)
Noise signal: "noisesB" dataset (Libri Speech Noise Dataset)
Noisy signal: clean speech signal + noise signal (with a specified SNR) <-- noisy.py
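
Mixing at a specified SNR can be sketched as below. This is an illustrative version and not the contents of noisy.py: the function name `mix_at_snr`, the choice to tile or crop the noise to the clean length, and the use of mean power over the whole utterance are assumptions.

```python
import torch


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a 1-D clean waveform at the requested SNR in dB."""
    n = clean.numel()
    # Tile the noise if it is shorter than the speech, then crop to length
    if noise.numel() < n:
        repeats = -(-n // noise.numel())  # ceiling division
        noise = noise.repeat(repeats)
    noise = noise[:n]

    clean_power = clean.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp_min(1e-12)  # avoid divide-by-zero
    # Scale so that 10 * log10(clean_power / scaled_noise_power) == snr_db
    scale = torch.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


torch.manual_seed(0)
clean = torch.randn(16000)
noise = torch.randn(4000)
noisy = mix_at_snr(clean, noise, snr_db=5.0)

# Recover the SNR actually achieved from the added component
added = noisy - clean
measured_snr = 10 * torch.log10(clean.pow(2).mean() / added.pow(2).mean())
```

A lower `snr_db` gives a louder noise component; 0 dB means the speech and the scaled noise have equal average power.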

Examples (on validation dataset)

MIT License