This is an implementation of "Looking to Listen at the Cocktail Party" in Python 3 and Chainer. This deep learning technique can be applied to noise reduction, removal of background music, and speech separation.
The original paper is here (arxiv.org/abs/1804.03619). Note that this implementation is inspired by crystal-method (MIT).
We show a demonstration of noise reduction using a pretrained model.
- First, you need to build the docker containers.

  ```
  $ docker-compose build
  ```

- Put the noisy audio file(s) into `./data/noise`.

- Run the following command.

  - GPU

    ```
    $ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise
    ```

  - CPU (comment out `_set_gpu()` in `network/src/env.py`; see the sketch after this list)

    Intel CPU (Fast)

    ```
    $ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise -ideep
    ```

    Other CPU (Slow)

    ```
    $ docker-compose run network python3 quick_start_audio_only.py /data/model/0f_1sclean_noise.npz /data/noise
    ```

- We can get the clean audio in `./data/results`.
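The contents of `network/src/env.py` are not reproduced here; as a rough sketch of what "comment out `_set_gpu()`" amounts to, assuming the module calls a GPU setup helper at import time (everything other than the `_set_gpu` name is an assumption):

```python
# network/src/env.py -- illustrative sketch, not the actual file
import chainer

def _set_gpu():
    # Assumed helper: select CUDA device 0 for all computation.
    chainer.backends.cuda.get_device_from_id(0).use()

# To run on CPU, comment out this call so no CUDA device is initialized:
# _set_gpu()
```

For the `-ideep` variant, the flag presumably enables Chainer's iDeep backend, which Chainer normally activates via `chainer.using_config('use_ideep', 'auto')` together with `model.to_intel64()`.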
Please refer to the following section for additional information such as speech separation and audio-visual processing. Each container can be entered with an interactive shell:

```
$ docker-compose run preprocess bash
$ docker-compose run dataset bash
$ docker-compose run network bash
```
The original paper uses a large FC layer. However, there is not enough GPU memory to hold that network. In this implementation, the size of the FC layer is reduced so that the network fits on a single GPU.
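As a minimal sketch of what this reduction looks like in Chainer (the class name, layer sizes, and surrounding structure are assumptions, not the code actually used here):

```python
import chainer
import chainer.functions as F
import chainer.links as L

class FusionBlock(chainer.Chain):
    """Illustrative audio-visual fusion block: the two streams are
    concatenated and fed to one fully connected layer whose output
    size is kept small enough for the whole model to fit on one GPU."""

    def __init__(self, fc_size=400):  # assumed size; the paper's FC layer is much larger
        super().__init__()
        with self.init_scope():
            # in_size=None lets Chainer infer the input dimension on first call
            self.fc = L.Linear(None, fc_size)

    def __call__(self, audio_feat, visual_feat):
        h = F.concat((audio_feat, visual_feat), axis=1)
        return F.relu(self.fc(h))
```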
We use the following external libraries in `preprocess/src/libs`:

- Facenet (MIT)