fhahaha / Percepnet-Keras

PercepNet implemented using Keras; still needs to be optimized and tuned.

PercepNet (still needs to be tuned)

Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

https://www.researchgate.net/publication/343568932_A_Perceptually-Motivated_Approach_for_Low-Complexity_Real-Time_Enhancement_of_Fullband_Speech

Compared with https://github.com/jzi040941/PercepNet, this version is implemented using Keras.

----------------------------------------------------------
Because GitHub's file size limit is 100 MB, rnn_data.c is compressed as rnn_data.c.tgz.
This file needs to be extracted before compiling:
% cd src
% tar -xzvf rnn_data.c.tgz
% cd ..


To compile, just type:
% ./autogen.sh
% ./configure
% make

A simple command-line tool is
provided as an example. It operates on RAW 16-bit (machine endian) mono
PCM files sampled at 48 kHz. It can be used as:

./examples/rnnoise_demo <noisy speech> <output denoised>

The output is also a 16-bit raw PCM file.
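If your test clips are WAV files, they first have to be converted to headerless 16-bit PCM. Below is a minimal sketch (not part of the repository) using numpy and scipy; the file names are placeholders and it assumes the WAV is already mono, 16-bit, and sampled at 48 kHz:

import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read('noisy_speech.wav')   # assumed mono, 16-bit, 48 kHz
assert rate == 48000 and samples.dtype == np.int16
samples.tofile('noisy_speech.raw')                 # headerless machine-endian PCM for rnnoise_demo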

------------------------------------------------------------

How to train:
(change to the src subdirectory; this assumes the clean and noise file directories are under ~/DNS-Challenge/datasets/rnnoise3/)
cd ~/percepnet/src
./denoise_training ~/DNS-Challenge/datasets/rnnoise3/clean  ~/DNS-Challenge/datasets/rnnoise3/noise 80000000 training.f32
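Before converting, it can help to sanity-check the size and layout of training.f32. The sketch below (not part of the repository) memory-maps the file so the float32 data, roughly 44 GB if the full 80000000 frames of 138 features are written, is never loaded into RAM:

import numpy as np

FEATURES = 138                                   # per-frame feature count passed to bin2hdf5.py
x = np.memmap('training.f32', dtype=np.float32, mode='r')
assert x.size % FEATURES == 0
print('frames written:', x.size // FEATURES)
print(x.reshape(-1, FEATURES)[:2])               # first two feature vectors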

(change to the training subdirectory)
cd ../training
python bin2hdf5.py ../src/training.f32 80000000 138 training.h5
python rnn_train.py
python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig
cp rnn_data.c ../src/   

(change to the percepnet directory)
cd ~/percepnet/
make clean
make

(change to the examples subdirectory)
cd examples 
./rnnoise_demo test2.raw test2_denoised.raw
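To listen to the result, the raw output can be wrapped back into a WAV container. A small sketch using numpy and scipy (again, not part of the repository; file names match the command above):

import numpy as np
from scipy.io import wavfile

denoised = np.fromfile('test2_denoised.raw', dtype=np.int16)   # machine-endian 16-bit PCM
wavfile.write('test2_denoised.wav', 48000, denoised)           # mono, 48 kHz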

----------------------------------------------------------------
More:
The performance of this version needs further optimization and tuning.
In some cases it is worse than RNNoise.
Any comments on how to optimize/tune are welcome.

test_gr in src/ is used to test the classical processing path: it applies the computed g and r values directly (not taken from the deep learning model) to check whether g and r work or not.

The overall framework is based on https://github.com/xiph/rnnoise,
and the speech signal processing code is from https://github.com/jzi040941/PercepNet.
Compared with RNNoise, the training data are standardized, and float (not quantized) weights are used when converting to rnn_data.c.
During training, clip_norm is set to 0.1; otherwise the loss becomes NaN.
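As an illustration of those two points (this is a minimal sketch, not the repository's actual rnn_train.py), the features can be standardized column-wise and the gradient norm clipped; the dataset name 'data' inside training.h5 is an assumption, and in Keras optimizers the argument is spelled clipnorm:

import h5py
import numpy as np
from tensorflow import keras

with h5py.File('training.h5', 'r') as f:
    data = f['data'][:].astype(np.float32)       # assumed dataset name, shape (frames, 138)

# standardize each feature column (zero mean, unit variance)
data = (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-8)

# clip the gradient norm to 0.1; without this the loss tends to go to NaN
opt = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=0.1)
# model.compile(optimizer=opt, loss='mse')       # model definition omitted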

The training data are from:
https://github.com/microsoft/DNS-Challenge

The WAV file processing code (wav.h, wav.c) is from https://faculty.fiu.edu/~wgillam/wavfiles.html

License: BSD 3-Clause "New" or "Revised" License