Tricks to improve SEGAN performance. Everything is re-implemented in Keras with a TensorFlow backend.
A supporting document with evaluation results and other details can be found here.
Deepak Baby, iSEGAN: Improved Speech Enhancement Generative Adversarial Networks, arXiv preprint, 2020.
- Install TensorFlow and Keras
- Install tqdm for displaying the training progress
- The experiments are conducted on a dataset from Valentini et al., which can be downloaded here. The following script downloads the dataset; it requires sox for converting the audio to 16 kHz.
$ ./download_dataset.sh
- Prepare the data for training and testing the various models. Edit the folder path if you keep the database in a different location. This script needs to be executed only once, and all the models read from the same location.
$ python prepare_data.py
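For reference, preparing SEGAN-style training data typically means slicing each waveform into fixed-length, half-overlapping windows (the original SEGAN uses 16384-sample windows). The exact windowing done by `prepare_data.py` may differ, so the helper name and the window/stride values below are assumptions; this is only a minimal numpy sketch of the framing step:

```python
import numpy as np

def slice_signal(signal, window=16384, stride=8192):
    """Cut a 1-D waveform into fixed-length, half-overlapping windows.

    Hypothetical helper: the window/stride values are assumptions and
    not necessarily what prepare_data.py uses.
    """
    n_frames = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride : i * stride + window]
                     for i in range(n_frames)])

# Example: a 32768-sample buffer yields 3 half-overlapping windows
x = np.zeros(32768, dtype=np.float32)
frames = slice_signal(x)
```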
- Running the models. The training and evaluation of the various SEGAN models are implemented in `run_isegan.py`, which offers several cGAN configurations. Edit the `opts` variable to choose a configuration. The results are automatically saved to separate folders; the folder name is generated by `files_ops.py` and automatically encodes the chosen configuration options.
The options are:
- Different normalizations
- Instance Normalization
- Batch Normalization
- Batch Renormalization
- Group Normalization
- Spectral Normalization
- One-Sided Label Smoothing: Encouraging the discriminator to estimate soft probabilities (e.g., 0.8 or 0.9) on the real samples instead of hard 1.0 targets.
- Trainable Auditory Filterbank Layer: The first layer is initialized with a gammatone filterbank and kept trainable.
- Pre-emphasis Layer: Incorporating the pre-emphasis operation as a trainable layer.
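As a point of reference for the last option: pre-emphasis is the first-order high-pass filter y[n] = x[n] − α·x[n−1], commonly with α = 0.95. In this repo the coefficient is additionally made trainable; the sketch below shows only the fixed-coefficient operation itself, with the α value an assumption:

```python
import numpy as np

def pre_emphasis(x, coeff=0.95):
    # y[n] = x[n] - coeff * x[n-1]; the first sample is passed through.
    # In the trainable-layer variant, coeff would be a learned parameter;
    # here it is fixed for illustration.
    return np.concatenate(([x[0]], x[1:] - coeff * x[:-1]))

# A constant (DC) signal is strongly attenuated, as expected of a high-pass filter
y = pre_emphasis(np.ones(4, dtype=np.float64))
```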
- Evaluation on the test set is performed together with training. Set `TEST_SEGAN = False` to disable testing.
- This code loads all the data into memory to speed up training. If you do not have enough memory, the mini-batches can instead be read from disk via HDF5. In `run_<xxx>.py`, change the lines

  ```python
  clean_train_data = np.array(fclean['feat_data'])
  noisy_train_data = np.array(fnoisy['feat_data'])
  ```

  to

  ```python
  clean_train_data = fclean['feat_data']
  noisy_train_data = fnoisy['feat_data']
  ```

  Note that this can slow training down by about 20 times (on the test machine), since the mini-batches are then read from disk over several epochs.
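To make the eager-versus-lazy HDF5 distinction above concrete, the following self-contained sketch builds a small HDF5 file and reads it both ways. The file path and array sizes here are made up for illustration; the `feat_data` key and the `fclean` handle name follow the snippet above.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in feature file (sizes are illustrative only)
path = os.path.join(tempfile.mkdtemp(), "feats.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("feat_data",
                     data=np.zeros((100, 16384), dtype=np.float32))

fclean = h5py.File(path, "r")

# Eager: copies the whole dataset into RAM once (fast mini-batches, high memory)
clean_in_memory = np.array(fclean["feat_data"])

# Lazy: keeps an HDF5 dataset handle; each slice triggers a disk read
clean_on_disk = fclean["feat_data"]
batch = clean_on_disk[0:8]  # reads only these 8 windows from disk
```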
[1] S. Pascual, A. Bonafonte, and J. Serrà, "SEGAN: Speech enhancement generative adversarial network," in Proc. INTERSPEECH, ISCA, Aug. 2017, pp. 3642–3646.
The Keras implementation of cGAN is based on the following repos: