This document is a detailed guide to RNNoise, a noise suppression library built on a recurrent neural network. RNNoise delivers real-time noise reduction light enough for mobile devices, removing background noise and echo from audio communication. While the official implementation and its derivative models yield impressive results, they still struggle to suppress echoes and various types of noise while preserving speech quality. We trained and fine-tuned RNNoise to address these challenges.
Install WSL on Windows. Note that files under /mnt/wsl may be removed between sessions, so work somewhere persistent such as /mnt/c or /mnt/d.
cd /mnt/wsl
git clone https://github.com/xiph/rnnoise.git
sudo apt install autoconf
sudo apt-get install libtool
./autogen.sh
./configure
make
explorer.exe .
In the command below, replace farend.wav with your noisy speech file; rnnoise_farend.wav is the denoised output.
./examples/rnnoise_demo farend.wav rnnoise_farend.wav
The output is a 16-bit raw PCM file
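Because the demo writes headerless PCM, most players will not open the output directly. Below is a minimal sketch for wrapping it into a WAV container using Python's standard library; the file names are illustrative, and we assume mono 48 kHz audio, which is what RNNoise operates on.

```python
import wave

def raw_pcm_to_wav(raw_path, wav_path, sample_rate=48000, channels=1):
    # Read the headerless 16-bit PCM bytes produced by rnnoise_demo
    with open(raw_path, "rb") as f:
        pcm = f.read()
    # Wrap them in a WAV container so ordinary players can open the file
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)           # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
```

For example, `raw_pcm_to_wav("rnnoise_farend.raw", "rnnoise_farend.wav")` would produce a playable file.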
For training we need a clean-speech file and a noise-only file.
cd ..
cd src
./compile.sh
./denoise_training ../combined_clean_segments.raw ../combined_echo_segments.raw 500000 > training.f32
The third argument (500000) is the number of frames to extract; it must match the frame count passed to bin2hdf5.py below.
cd ..
cd training
python3 ./bin2hdf5.py ../src/training.f32 500000 87 training.h5
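As a rough sketch of what this conversion step does, the .f32 file can be read as a flat little-endian float32 stream and reshaped into a frames-by-87 matrix (bin2hdf5.py then stores such a matrix in HDF5 for rnn_train.py). Per our reading of the RNNoise training code, the 87 columns are 42 input features, 22 target gains, 22 noise band energies, and 1 VAD flag; treat that breakdown as an assumption, and note `bin2matrix` is our own illustrative helper.

```python
import numpy as np

def bin2matrix(f32_path, n_frames=500000, n_cols=87):
    # Interpret the raw float32 stream written by denoise_training
    data = np.fromfile(f32_path, dtype=np.float32)
    # Keep at most n_frames rows and reshape into (frames, features)
    return data[:n_frames * n_cols].reshape(-1, n_cols)
```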
python3 ./rnn_train.py
We used our own dataset, gathered from phones. As an example dataset, you can use Microsoft's MS-SNSD [https://github.com/microsoft/MS-SNSD].
python3 ./dump_rnn.py weights.hdf5 ../src/rnn_data.c ../src/rnn_data.h orig
If you find that the .h files have been modified, revert that change before proceeding.
cd ..
make
Results
During inference we found that the Hogwash variant suppresses echo and noise better than the vanilla model, so we used the pre-trained Hogwash model as the starting point for training on our dataset. You can use any of the pre-trained models at [https://github.com/GregorR/rnnoise-models]
Dataset and Pre-processing
We gathered a dataset of speech and echo recordings in separate folders; the data in each folder is combined into a single speech file and a single echo file. Next we divided them into 30-second segments, as required by RNNoise for feature extraction and training. Finally we exported everything as 16-bit, 48000 Hz raw PCM files, clean.raw and echo.raw.
Within the pre-processing folder, the "clean" folder contains the speech data, while the "echo" folder contains the echo data. To run the pre-processing, first execute the "p1.py" script, which combines all the files and saves them in the output folder. Next, execute the "p2.py" script, which splits the combined file into 30-second segments. Then execute the "p3.py" script to combine all the segmented files. Finally, use Audacity to convert these files into the raw format.
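The segmentation step (what p2.py is described as doing) can be sketched as follows, assuming mono 16-bit WAV input; the helper name and file naming scheme are illustrative, not the actual script.

```python
import wave

def split_into_segments(wav_path, out_prefix, seg_seconds=30):
    """Split a WAV file into fixed-length segments; returns segment count."""
    with wave.open(wav_path, "rb") as w:
        params = w.getparams()
        frames_per_seg = w.getframerate() * seg_seconds
        i = 0
        while True:
            chunk = w.readframes(frames_per_seg)
            if not chunk:
                break
            # The wave module patches the frame count in the header on close
            with wave.open(f"{out_prefix}_{i:03d}.wav", "wb") as out:
                out.setparams(params)
                out.writeframes(chunk)
            i += 1
    return i
```

The final segment may be shorter than 30 seconds; depending on how the training tool handles short segments, you may want to drop it.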
To enhance our dataset, we performed data augmentation by collecting impulse responses from various environments and convolving them with our echo and noise data.
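This augmentation can be sketched as a direct convolution; `apply_impulse_response` is a hypothetical helper, and a real pipeline would likely use FFT-based convolution for long impulse responses.

```python
import numpy as np

def apply_impulse_response(signal, ir):
    # Convolve the dry signal with the room's impulse response,
    # truncated back to the original length
    out = np.convolve(signal, ir)[:len(signal)]
    # Normalize to peak 1.0 to avoid clipping when written as 16-bit PCM
    peak = np.max(np.abs(out))
    if peak > 0:
        out = out / peak
    return out.astype(np.float32)
```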
Knowledge distillation is the process of transferring knowledge from a large teacher model to a smaller student model, letting the student learn from the teacher's outputs. The model architecture is defined in rnn_train, which can be modified to train a model from scratch or from a loaded pre-trained model. We trained three teacher models for noise, echo, and reverb suppression, and used them to train our student model. Upon extensive testing we found that the student performed well for noise and echo suppression but did not perform well for reverb suppression.
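One common way to express the distillation objective is a blend of a teacher-imitation term and a ground-truth term. This numpy sketch is illustrative rather than the exact training code; `distillation_loss` and `alpha` are our own names, and the targets here stand in for the per-band gains RNNoise predicts.

```python
import numpy as np

def distillation_loss(student_gains, teacher_gains, true_gains, alpha=0.5):
    # Imitation term: match the teacher's predicted gains
    teacher_term = np.mean((student_gains - teacher_gains) ** 2)
    # Supervised term: match the ground-truth gains from feature extraction
    label_term = np.mean((student_gains - true_gains) ** 2)
    # alpha balances imitating the teacher against fitting the labels
    return alpha * teacher_term + (1 - alpha) * label_term
```

With three teachers, one option is to apply each teacher's loss on the portion of the batch drawn from its specialty (noise, echo, or reverb).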