In this toy project, I built the simplest CAE (Convolutional Auto-Encoder) architecture for speech denoising.
- A stack of 3 encoder blocks.
- Since the encoder does not flatten its output, variable-length speech input can be used.
- Encoder block: `Conv2d` -> `BatchNorm2d` -> `LeakyReLU`
- A stack of 3 decoder blocks.
- Decoder block: `ConvTranspose2d` -> `LeakyReLU`
- The last decoder block contains only `ConvTranspose2d`.
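
The architecture described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the channel widths, kernel size, stride, padding, `LeakyReLU` slope, and the `(batch, 1, freq, time)` spectrogram layout are all assumptions, and the class names `EncoderBlock`, `DecoderBlock`, and `ConvAutoEncoder` are hypothetical.

```python
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """Conv2d -> BatchNorm2d -> LeakyReLU (hyperparameters are assumed)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            # stride 2 halves the frequency and time axes
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.net(x)


class DecoderBlock(nn.Module):
    """ConvTranspose2d -> LeakyReLU; the last block skips the activation."""

    def __init__(self, in_ch, out_ch, last=False):
        super().__init__()
        layers = [
            # with these settings the output is exactly twice the input size
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                               padding=1, output_padding=1)
        ]
        if not last:
            layers.append(nn.LeakyReLU(0.2))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class ConvAutoEncoder(nn.Module):
    """3 encoder blocks followed by 3 decoder blocks, with no flattening."""

    def __init__(self, channels=(1, 16, 32, 64)):  # assumed channel widths
        super().__init__()
        c0, c1, c2, c3 = channels
        self.encoder = nn.Sequential(
            EncoderBlock(c0, c1), EncoderBlock(c1, c2), EncoderBlock(c2, c3)
        )
        self.decoder = nn.Sequential(
            DecoderBlock(c3, c2), DecoderBlock(c2, c1),
            DecoderBlock(c1, c0, last=True),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

Because the model is fully convolutional, the same weights accept inputs with different numbers of time frames. With the assumed stride of 2 in each of the three blocks, the output matches the input size when both the frequency and time dimensions are multiples of 8.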
- `MaskedMSELoss`: ignores the padded area when computing the MSE loss.
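
One way a masked MSE loss of this kind might look is sketched below. The signature is an assumption: it takes a hypothetical `lengths` tensor holding the number of valid (unpadded) time frames per example and assumes inputs shaped `(batch, channels, freq, time)` with padding at the end of the time axis. The project's own `MaskedMSELoss` may build its mask differently.

```python
import torch
import torch.nn as nn


class MaskedMSELoss(nn.Module):
    """MSE averaged over valid (unpadded) positions only. Assumed interface."""

    def forward(self, pred, target, lengths):
        # pred, target: (batch, channels, freq, time)
        # lengths: (batch,) number of valid time frames per example (assumed)
        n_frames = pred.size(-1)
        frame_idx = torch.arange(n_frames, device=pred.device)
        # mask[b, t] is True where frame t of example b is real data
        mask = frame_idx[None, :] < lengths[:, None]
        # reshape to (batch, 1, 1, time) so it broadcasts over channels and freq
        mask = mask[:, None, None, :].to(pred.dtype)

        squared_error = (pred - target) ** 2 * mask
        # count valid elements; clamp avoids division by zero on empty batches
        n_valid = mask.expand_as(pred).sum().clamp(min=1)
        return squared_error.sum() / n_valid
```

Dividing by the number of valid elements rather than the full tensor size keeps the loss scale independent of how much padding a batch happens to contain.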
- Clean speech dataset: "YesNo" dataset (`torchaudio.datasets.yesno`)
- Noise signal: "noisesB" dataset (Libri Speech Noise Dataset)
- Noisy signal: clean speech signal + noise signal (with a specified SNR), generated by `noisy.py`
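
Mixing at a specified SNR can be illustrated with the sketch below. It is not the contents of `noisy.py`: the function name `mix_at_snr`, the tiling/cropping of the noise to match the clean signal's length, and the use of mean power over the whole signal are assumptions. The noise is scaled so that `10 * log10(P_clean / P_noise)` equals the requested SNR in dB.

```python
import torch


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech at the given SNR in dB (illustrative helper)."""
    n = clean.size(-1)
    # Tile the noise if it is shorter than the speech, then crop to length.
    if noise.size(-1) < n:
        reps = -(-n // noise.size(-1))  # ceiling division
        noise = noise.repeat(*([1] * (noise.dim() - 1)), reps)
    noise = noise[..., :n]

    clean_power = clean.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp(min=1e-10)  # guard against silence

    # Choose scale so that clean_power / (scale**2 * noise_power) = 10**(snr_db/10)
    scale = torch.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

A lower `snr_db` gives a noisier mixture; at 0 dB the scaled noise has the same average power as the clean speech.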