GANs-for-Image-enhancement

DESCRIPTION

This project aims to train a GAN-based model for image enhancement (super-resolution, image restoration, contrast enhancement, etc.).

Two pre-trained generators with different loss functions (MSE and Feature Loss) are used in this project, and the results are evaluated for comparison.

DATASET

The dataset used for training was generated based on the Flickr-Image-Dataset, which is a dataset of 31.8k jpeg images of varying size, quality, and content.

Although this dataset is not designed for GAN training, its moderate number of images and rich content make it a good base for generating a GAN training set through a process called "Crappify".

Crappify

Crappify is the process of generating low-quality training images from the original image dataset by applying degradations such as reducing resolution, adding dithering, randomly varying contrast, and adding random text.
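The degradations listed above can be sketched with Pillow roughly as follows. This is an illustrative sketch, not the code from the notebook: the function name `crappify`, the `scale` parameter, the contrast and JPEG quality ranges, and the text drawn are all assumptions. Dithering is left out, and the low-quality JPEG round trip is an added assumption borrowed from common crappify recipes.

```python
import io
import random

from PIL import Image, ImageDraw, ImageEnhance


def crappify(img, scale=4, rng=None):
    """Return a degraded copy of `img` with the same size as the original."""
    rng = rng or random.Random()
    img = img.convert("RGB")
    w, h = img.size

    # 1. Reduce resolution: shrink, then scale back up so fine detail is lost.
    small = img.resize((max(1, w // scale), max(1, h // scale)), Image.BILINEAR)
    out = small.resize((w, h), Image.BILINEAR)

    # 2. Randomly vary contrast (a factor of 1.0 leaves the image unchanged).
    out = ImageEnhance.Contrast(out).enhance(rng.uniform(0.6, 1.4))

    # 3. Stamp a random two-digit number at a random position.
    label = str(rng.randint(10, 99))
    pos = (rng.randint(0, w - 1), rng.randint(0, h - 1))
    ImageDraw.Draw(out).text(pos, label, fill=(255, 255, 255))

    # 4. Assumption: round-trip through a low-quality JPEG to add artifacts.
    buf = io.BytesIO()
    out.save(buf, format="JPEG", quality=rng.randint(10, 70))
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which helps when the same low-quality set needs to be regenerated.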

The dataset of generated images along with their high-resolution counterparts is available here^[1].

Notebook for Crappify : crappify-imgs.ipynb

[1] Note that since many operations in the Crappify process are random, the generated training set is different each time, so you are better off generating your own training set.

MODEL

Generator

The generator is a U-Net with a pre-trained ResNet-34 as its backbone. It is built with fastai's unet_learner, which constructs a dynamic U-Net around the backbone, with weight normalization enabled.

Discriminator

The discriminator is built with gan_critic(), also available in the fastai library, which has spectral normalization built in. This is usually sufficient for most DCGAN-style setups. It is left with its default hyperparameters.

Training

Pre-training

Generator

MSE

The first generator uses Mean Squared Error (MSE) as its loss. First, the U-Net is trained with the pre-trained ResNet-34 part frozen. Then the whole model is unfrozen and fine-tuned with a smaller learning rate. Training starts at an image size of 128×128; the size is then increased to 256×256 and the model is trained again in the same way.

Notebook for pre-training with the MSE loss function: pretrain-gan-mse.ipynb

Feature Loss

The other generator uses the same training process, but its loss function is the sum of Mean Absolute Error (MAE, or l1_loss) and a feature loss based on the VGG-16 model, as in the well-known paper on neural style transfer, https://arxiv.org/abs/1508.06576.

Notebook for pre-training with the feature loss function: pretrain-gan-feature-loss.ipynb
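The idea of combining a pixel-level MAE term with a feature-level term can be sketched in PyTorch as shown here. This is a minimal illustration rather than the notebook's implementation: the class name `FeatureLoss`, the `layer_weights` values, and the small randomly initialized `blocks` are assumptions. In a real run the blocks would be slices of a pre-trained VGG-16, and the actual loss may include further terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureLoss(nn.Module):
    """Pixel L1 loss plus weighted L1 distances between feature maps."""

    def __init__(self, feature_layers, layer_weights):
        super().__init__()
        # Layers are applied in sequence; each output is one feature map
        # that gets compared between prediction and target.
        self.feature_layers = nn.ModuleList(feature_layers)
        self.layer_weights = layer_weights
        # The feature network stays fixed; only the generator is trained.
        for p in self.feature_layers.parameters():
            p.requires_grad_(False)

    def _features(self, x):
        feats = []
        for layer in self.feature_layers:
            x = layer(x)
            feats.append(x)
        return feats

    def forward(self, pred, target):
        loss = F.l1_loss(pred, target)  # pixel-level MAE term
        with torch.no_grad():
            target_feats = self._features(target)
        pred_feats = self._features(pred)
        for w, fp, ft in zip(self.layer_weights, pred_feats, target_feats):
            loss = loss + w * F.l1_loss(fp, ft)  # feature-level term
        return loss


# Hypothetical stand-in extractor with random weights (NOT a real VGG-16).
blocks = [
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(8, 16, 3, padding=1), nn.ReLU()),
]
criterion = FeatureLoss(blocks, layer_weights=[1.0, 0.5])
```

Comparing feature maps rather than only pixels penalizes differences in texture and structure that a plain pixel loss tends to average away.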

Discriminator

Images generated by the pre-trained generator are saved to disk. These images, together with the original high-resolution images, are then used to pre-train the discriminator.

GAN training

The discriminator and generator are then combined into a GAN and trained together. Training switches adaptively from the discriminator to the generator whenever the discriminator loss drops below a certain threshold. The learning rate starts at 0.0002, a common choice for GANs, and is reduced to 0.0001 after training for some time.

Notebook for model pre-trained with MSE: train-gan-mse.ipynb

Notebook for model pre-trained with feature loss: train-gan-feature-loss.ipynb
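The adaptive switching rule can be illustrated in plain Python. This is a sketch of the logic only, not the library's code (fastai v1 provides an AdaptiveGANSwitcher callback for this purpose). The class name `AdaptiveSwitcher`, the threshold value 0.65, and the choice to give the generator a single batch per turn are assumptions.

```python
class AdaptiveSwitcher:
    """Decide which network to train next from the latest discriminator loss."""

    def __init__(self, critic_thresh=0.65):
        self.critic_thresh = critic_thresh  # assumed value, tune per project
        self.train_generator = False  # start by training the discriminator

    def step(self, critic_loss=None):
        """Call after each batch; returns the network to train on the next one.

        `critic_loss` is the loss of the batch just trained when that batch
        was a discriminator batch, and None after a generator batch.
        """
        if self.train_generator:
            # One generator batch done: hand control back to the discriminator.
            self.train_generator = False
        elif critic_loss is not None and critic_loss < self.critic_thresh:
            # The discriminator is good enough: give the generator a turn.
            self.train_generator = True
        return "generator" if self.train_generator else "discriminator"
```

With this rule the discriminator trains for as many batches as it needs to get below the threshold, so the generator always receives feedback from a reasonably competent discriminator.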

Evaluate

The pre-trained models and the GAN-trained models are evaluated using MSE, PSNR, and SSIM. These metrics can be computed either on a sample set (faster) or on the entire dataset (more accurate).

Notebook for evaluation: evaluate.ipynb
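For reference, MSE and PSNR between an enhanced image and its ground truth can be computed with NumPy as sketched below. The function names and the 8-bit default `max_val=255.0` are assumptions, not taken from the notebook. SSIM is omitted here because it requires windowed statistics; scikit-image offers it as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two images of identical shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))
```

Lower MSE and higher PSNR indicate a closer match to the ground truth. When evaluating on a sample set, the per-image values are typically averaged.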

RESULT

Training Environment

python 3.6.6

pytorch==1.0.1.post2
torchvision==0.2.2
fastai==1.0.51
pycuda==2018.1.1
cupy-cuda100==5.4.0
pandas==0.23.4
numpy==1.16.3

Training Time

All models were trained on Kaggle using a single Nvidia K80 GPU, with pre-training taking about 4 hours and GAN training taking about 9 hours.

Evaluation

Since the model trained with the MSE loss function is significantly less effective than the model using feature loss, only the evaluation of the feature loss model is shown here:

POSSIBLE IMPROVEMENTS

WGAN could be used instead of the standard GAN loss (which optimizes the JS divergence).
Rather than using a hard-coded loss function based on features from a pre-trained model, those features could be concatenated to the input of the discriminator.
Add self-attention to the generator.
Use a bigger network (e.g., ResNet-50 as backbone) and a bigger dataset.

REFERENCES

GANs-for-Image-enhancement

A Neural Algorithm of Artistic Style

isep-EoT-12 / GAN-Image-Enhancement
