This GAN was built as part of an internship hiring problem for rephrase.ai. The task: given a set of images degraded by an unknown function, build a GAN that restores them as close to the ground truth as possible.
To tackle this problem, I used a ResNet-like architecture for the generator and a VGG-like architecture for the discriminator. I trained the model on a compound loss function consisting of a pixelwise loss, a feature loss obtained using a pretrained VGG16 model, and an adversarial loss.
The generator has a ResNet-like architecture with skip connections of length 3. Each residual block consists of 3 convolution blocks activated using LeakyReLU.
The discriminator network has a VGG-like architecture built from convolutions and batch normalization, activated with LeakyReLU.
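
A minimal sketch of one such residual block, assuming PyTorch (the framework is not named in this README). The channel count, the 3x3 kernel size, and the LeakyReLU slope are illustrative assumptions, not the values used in the actual model.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Three conv + LeakyReLU blocks wrapped by one identity skip connection."""

    def __init__(self, channels: int = 64, negative_slope: float = 0.2):
        super().__init__()
        layers = []
        for _ in range(3):
            # Assumed 3x3 kernel; padding=1 keeps the spatial size unchanged
            layers.append(nn.Conv2d(channels, channels, kernel_size=3, padding=1))
            layers.append(nn.LeakyReLU(negative_slope))
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection spans all three convolution blocks
        return x + self.body(x)


if __name__ == "__main__":
    block = ResidualBlock(channels=64)
    patch = torch.randn(1, 64, 128, 128)  # one 128 x 128 feature map
    print(block(patch).shape)
```

Because every convolution preserves the channel count and spatial size, the input can be added to the output directly, without a projection layer on the skip path.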
The generator's loss function is a linear combination of 3 losses:
- Pixelwise MSE with respect to the ground truth
- L1 perceptual loss on features obtained from the last convolution of a VGG16 model
- Adversarial loss of the generator
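
The combination above could be sketched roughly as follows, again assuming PyTorch. The function name, the default weights of 1.0, and the non-saturating binary cross-entropy form of the adversarial term are all assumptions made for illustration. The VGG16 features are expected to be computed beforehand and passed in.

```python
import torch
import torch.nn.functional as F


def generator_loss(fake, real, fake_feats, real_feats, disc_logits_on_fake,
                   w_pixel=1.0, w_feat=1.0, w_adv=1.0):
    """Weighted sum of pixel, perceptual, and adversarial losses.

    The weights are placeholders, not the values used for training.
    """
    # Pixelwise MSE against the ground-truth image
    pixel = F.mse_loss(fake, real)
    # L1 distance between precomputed VGG16 feature maps
    feat = F.l1_loss(fake_feats, real_feats)
    # Assumed adversarial term: the generator wants the discriminator
    # to label its outputs as real (target = 1)
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake)
    )
    return w_pixel * pixel + w_feat * feat + w_adv * adv


if __name__ == "__main__":
    fake = torch.rand(2, 3, 128, 128)
    real = torch.rand(2, 3, 128, 128)
    fake_feats = torch.rand(2, 512, 8, 8)
    real_feats = torch.rand(2, 512, 8, 8)
    logits = torch.randn(2, 1)
    print(generator_loss(fake, real, fake_feats, real_feats, logits).item())
```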
Evaluation metric used: Peak Signal-to-Noise Ratio (PSNR)
With the following set of hyperparameters, a mean PSNR of 36.166 was achieved by the end of 10,000 steps.
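
For reference, PSNR is defined as `10 * log10(MAX^2 / MSE)`, where `MAX` is the largest possible pixel value. A small pure-Python sketch is given below. The `max_value` default of 255 is an assumption, since the pixel range used in this project is not stated; use 1.0 for images scaled to [0, 1].

```python
import math


def psnr(restored, target, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(restored) != len(target) or len(target) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels
    mse = sum((r - t) ** 2 for r, t in zip(restored, target)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(pairs, max_value=255.0):
    """Average PSNR over (restored, target) pairs."""
    scores = [psnr(r, t, max_value) for r, t in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    restored = [52, 55, 61, 59]
    target = [50, 55, 60, 60]
    print(psnr(restored, target))
```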
Parameters | Value |
---|---|
Image Patch Size | 128 x 128 |
0.1 | |
0.0001 | |
0.0004 | |
0.7 |