This document is a detailed guide to RNNoise, a noise suppression library built on a recurrent neural network. RNNoise delivers real-time noise reduction light enough for mobile devices, removing background noise and echo from audio communication. While the official implementation and its derivative models yield impressive results, they still struggle to suppress echoes and various types of noise while preserving speech quality. We trained and fine-tuned RNNoise to address these challenges.
Install WSL on Windows. Note that files under /mnt/wsl may be removed between sessions, so work somewhere persistent such as /mnt/c or /mnt/d.
cd /mnt/wsl
git clone https://github.com/xiph/rnnoise.git
sudo apt install autoconf
sudo apt-get install libtool
./autogen.sh
./configure
make
explorer.exe .
In the command below, replace farend.wav with your noisy speech file; rnnoise_farend.wav is the denoised output.
./examples/rnnoise_demo farend.wav rnnoise_farend.wav
The output is a 16-bit raw PCM file
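Because the demo writes headerless PCM, most players will not open the output directly. Below is a minimal sketch for wrapping it into a WAV container using Python's standard library; the file names are illustrative, and we assume mono 48 kHz audio, which is what RNNoise operates on.

```python
import wave

def raw_pcm_to_wav(raw_path, wav_path, sample_rate=48000, channels=1):
    # Read the headerless 16-bit PCM bytes produced by rnnoise_demo
    with open(raw_path, "rb") as f:
        pcm = f.read()
    # Wrap them in a WAV container so ordinary players can open the file
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)           # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
```

For example, `raw_pcm_to_wav("rnnoise_farend.raw", "rnnoise_farend.wav")` would produce a playable file.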
For training we need a clean-speech file and a noise-only file.
cd ..
cd src
./compile.sh
./denoise_training ../combined_clean_segments.raw ../combined_echo_segments.raw 500000 > training.f32
The third argument (500000) is the number of frames to extract; it must match the frame count passed to bin2hdf5.py below.
cd ..
cd training
python3 ./bin2hdf5.py ../src/training.f32 500000 87 training.h5
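As a rough sketch of what this conversion step does, the .f32 file can be read as a flat little-endian float32 stream and reshaped into a frames-by-87 matrix (bin2hdf5.py then stores such a matrix in HDF5 for rnn_train.py). Per our reading of the RNNoise training code, the 87 columns are 42 input features, 22 target gains, 22 noise band energies, and 1 VAD flag; treat that breakdown as an assumption, and note `bin2matrix` is our own illustrative helper.

```python
import numpy as np

def bin2matrix(f32_path, n_frames=500000, n_cols=87):
    # Interpret the raw float32 stream written by denoise_training
    data = np.fromfile(f32_path, dtype=np.float32)
    # Keep at most n_frames rows and reshape into (frames, features)
    return data[:n_frames * n_cols].reshape(-1, n_cols)
```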
python3 ./rnn_train.py
We used our own dataset, gathered from phones. As an example dataset, you can use Microsoft's MS-SNSD [https://github.com/microsoft/MS-SNSD].
python3 ./dump_rnn.py weights.hdf5 ../src/rnn_data.c ../src/rnn_data.h orig
If you find that the .h files have been modified, revert that change before proceeding.
cd ..
make
Results
During inference we found that the Hogwash variant suppresses echo and noise better than the vanilla model, so we used the pre-trained Hogwash model as the starting point for training on our dataset. You can use any of the pre-trained models at [https://github.com/GregorR/rnnoise-models]
Dataset and Pre-processing
We gathered a dataset of speech and echo recordings in separate folders; the data in each folder is combined into a single speech file and a single echo file. Next we divided them into 30-second segments, as required by RNNoise for feature extraction and training. Finally we exported everything as 16-bit, 48000 Hz raw PCM files, clean.raw and echo.raw.
Within the pre-processing folder, the "clean" folder contains the speech data, while the "echo" folder contains the echo data. To run the pre-processing, first execute the "p1.py" script, which combines all the files and saves them in the output folder. Next, execute the "p2.py" script, which splits the combined file into 30-second segments. Then execute the "p3.py" script to combine all the segmented files. Finally, use Audacity to convert these files into the raw format.
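The segmentation step (what p2.py is described as doing) can be sketched as follows, assuming mono 16-bit WAV input; the helper name and file naming scheme are illustrative, not the actual script.

```python
import wave

def split_into_segments(wav_path, out_prefix, seg_seconds=30):
    """Split a WAV file into fixed-length segments; returns segment count."""
    with wave.open(wav_path, "rb") as w:
        params = w.getparams()
        frames_per_seg = w.getframerate() * seg_seconds
        i = 0
        while True:
            chunk = w.readframes(frames_per_seg)
            if not chunk:
                break
            # The wave module patches the frame count in the header on close
            with wave.open(f"{out_prefix}_{i:03d}.wav", "wb") as out:
                out.setparams(params)
                out.writeframes(chunk)
            i += 1
    return i
```

The final segment may be shorter than 30 seconds; depending on how the training tool handles short segments, you may want to drop it.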
To enhance our dataset, we performed data augmentation by collecting impulse responses from various environments and convolving them with our echo and noise data.
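This augmentation can be sketched as a direct convolution; `apply_impulse_response` is a hypothetical helper, and a real pipeline would likely use FFT-based convolution for long impulse responses.

```python
import numpy as np

def apply_impulse_response(signal, ir):
    # Convolve the dry signal with the room's impulse response,
    # truncated back to the original length
    out = np.convolve(signal, ir)[:len(signal)]
    # Normalize to peak 1.0 to avoid clipping when written as 16-bit PCM
    peak = np.max(np.abs(out))
    if peak > 0:
        out = out / peak
    return out.astype(np.float32)
```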
Knowledge distillation is the process of transferring knowledge from a large teacher model to a smaller student model, letting the student learn from the teacher's outputs. The model architecture is defined in rnn_train, which can be modified to train a model from scratch or from a loaded pre-trained model. We trained three teacher models for noise, echo, and reverb suppression, and used them to train our student model. Upon extensive testing we found that the student performed well for noise and echo suppression but did not perform well for reverb suppression.
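One common way to express the distillation objective is a blend of a teacher-imitation term and a ground-truth term. This numpy sketch is illustrative rather than the exact training code; `distillation_loss` and `alpha` are our own names, and the targets here stand in for the per-band gains RNNoise predicts.

```python
import numpy as np

def distillation_loss(student_gains, teacher_gains, true_gains, alpha=0.5):
    # Imitation term: match the teacher's predicted gains
    teacher_term = np.mean((student_gains - teacher_gains) ** 2)
    # Supervised term: match the ground-truth gains from feature extraction
    label_term = np.mean((student_gains - true_gains) ** 2)
    # alpha balances imitating the teacher against fitting the labels
    return alpha * teacher_term + (1 - alpha) * label_term
```

With three teachers, one option is to apply each teacher's loss on the portion of the batch drawn from its specialty (noise, echo, or reverb).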