beresandras / gan-flavours-keras

Implementation of GAN losses, kernel inception distance and adaptive discriminator augmentation using Keras.

GAN Flavours: Comparison of GAN Losses and Architectures with Keras

This repository contains:

  1. A simple DCGAN model with a flexible, configurable architecture, along with several losses (in losses.py), including the non-saturating, least squares, hinge and Wasserstein GAN losses compared below.
  2. An implementation of Adaptive Discriminator Augmentation for training GANs with limited amounts of data. It was used for ablations and hyperparameter optimization for the corresponding Keras code example, but was turned off for the experiments below.
  3. An implementation of Kernel Inception Distance (KID), a GAN performance metric with a simple unbiased estimator. This makes it more suitable than the Fréchet Inception Distance (FID) for small numbers of images, and it is also computationally cheaper to measure. Implementation details (all easy to tweak):
    • The InceptionV3 network's pretrained weights are loaded from Keras Applications.
    • For computational efficiency, the images are evaluated at the minimal possible resolution (75x75 instead of 299x299), so the exact values might not be comparable with other implementations.
    • For computational efficiency, the metric is only measured on the validation splits of the datasets.
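KID is the squared maximum mean discrepancy (MMD) between Inception features of real and generated images, computed with a cubic polynomial kernel. A minimal NumPy sketch of the unbiased estimator follows, assuming the InceptionV3 features (for example pooled activations of 75x75 images) have already been extracted; the function names are hypothetical and this is not the repository's exact code.

```python
import numpy as np


def polynomial_kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3 used by KID."""
    d = a.shape[1]  # feature dimensionality
    return (a @ b.T / d + 1.0) ** 3


def kernel_inception_distance(real: np.ndarray, fake: np.ndarray) -> float:
    """Unbiased estimate of the squared MMD between two feature sets.

    real: (m, d) features of real images; fake: (n, d) features of generated
    images. In practice both would come from a pretrained InceptionV3.
    """
    m, n = real.shape[0], fake.shape[0]
    k_rr = polynomial_kernel(real, real)
    k_ff = polynomial_kernel(fake, fake)
    k_rf = polynomial_kernel(real, fake)
    # Drop the diagonal (i == j) terms of the within-set kernels; keeping
    # them is what would make the estimator biased.
    mean_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    mean_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    mean_rf = k_rf.mean()
    return float(mean_rr + mean_ff - 2.0 * mean_rf)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real_features = rng.normal(size=(200, 64))
    same_dist = rng.normal(size=(200, 64))          # same distribution
    shifted = rng.normal(loc=1.0, size=(200, 64))   # shifted distribution
    print(kernel_inception_distance(real_features, same_dist))
    print(kernel_inception_distance(real_features, shifted))
```

With features drawn from the same distribution the estimate should be close to zero (and can be slightly negative, because the estimator is unbiased), while shifting the generated features gives a clearly larger value.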

List of GAN training tips and tricks based on experience with this repository.

Try it out in a Colab notebook (good results take around 2 hours of training).

Cherry-picked 256x256 generated flowers (augmentation + residual connections + no transposed convolutions): [image: high-quality generated flowers]

Caltech Birds 2011 (CUB-200)

  • 6000 training images
  • 400 epoch training
  • 64x64 resolution, cropped to the bounding boxes

KID results (the lower the better):

Loss / Architecture | Vanilla DCGAN | Spectral Norm. | Residual | Residual + Spectral Norm.
Non-saturating GAN | 0.087 | 0.184 | 0.479 | 0.533
Least Squares GAN | 0.114 | 0.153 | 0.312 | 0.361
Hinge GAN | 0.123 | 0.238 | 0.165 | 0.304
Wasserstein GAN | * | 1.066 | * | 0.679

Images generated by a vanilla DCGAN + non-saturating loss: [image: generated birds]

Oxford Flowers 102

  • 6000 training images (70% of every split)
  • 500 epoch training
  • 64x64 resolution, center cropped

KID results (the lower the better):

Loss / Architecture | Vanilla DCGAN | Spectral Norm. | Residual | Residual + Spectral Norm.
Non-saturating GAN | 0.080 | 0.083 | 0.094 | 0.139
Least Squares GAN | 0.104 | 0.092 | 0.110 | 0.131
Hinge GAN | 0.090 | 0.087 | 0.101 | 0.099
Wasserstein GAN | * | 0.165 | * | 0.107

Images generated by a vanilla DCGAN + non-saturating loss: [image: generated flowers]

CelebFaces Attributes (CelebA)

  • 160,000 training images
  • 25 epoch training
  • 64x64 resolution, center cropped

KID results (the lower the better):

Loss / Architecture | Vanilla DCGAN | Spectral Norm. | Residual | Residual + Spectral Norm.
Non-saturating GAN | 0.015 | 0.044 | 0.016 | 0.036
Least Squares GAN | 0.017 | 0.036 | 0.014 | 0.036
Hinge GAN | 0.015 | 0.041 | 0.020 | 0.036
Wasserstein GAN | * | 0.058 | * | 0.061

Images generated by a vanilla DCGAN + non-saturating loss: [image: generated faces]

CIFAR-10

  • 50,000 training images
  • 100 epoch training
  • 32x32 resolution

KID results (the lower the better):

Loss / Architecture | Vanilla DCGAN | Spectral Norm. | Residual | Residual + Spectral Norm.
Non-saturating GAN | 0.081 | 0.117 | 0.088 | 0.127
Least Squares GAN | 0.081 | 0.107 | 0.089 | 0.109
Hinge GAN | 0.084 | 0.109 | 0.092 | 0.121
Wasserstein GAN | * | 0.213 | * | 0.213

Images generated by a vanilla DCGAN + non-saturating loss: [image: generated CIFAR-10 images]

*In theory, Wasserstein GANs require a Lipschitz-constrained discriminator, so they were only evaluated with architectures that use spectral normalization in the discriminator.
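Spectral normalization enforces this constraint by dividing each weight matrix by its largest singular value, which is estimated cheaply with power iteration. The NumPy sketch below illustrates the idea on a single 2-D matrix; the function name, the iteration count and the random initialization are illustrative assumptions, not the layer used in this repository.

```python
import numpy as np


def spectral_normalize(w: np.ndarray, n_iters: int = 20, seed: int = 0) -> np.ndarray:
    """Divide a 2-D weight matrix by a power-iteration estimate of its
    largest singular value, so that the result has spectral norm ~1."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=w.shape[0])  # random start for the left singular vector
    for _ in range(n_iters):
        # Alternate between right and left singular vector estimates.
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v  # estimated largest singular value (positive)
    return w / sigma
```

Library layers usually keep the vector u between training steps and run a single iteration per step, and for convolutional kernels they typically reshape the kernel into a 2-D matrix first. With 1-Lipschitz activations such as (leaky) ReLU, normalizing every layer this way keeps the whole discriminator approximately 1-Lipschitz.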

Findings

Having compared GAN losses across architectures and datasets, I find my results in line with those of the "Are GANs Created Equal?" study: no loss consistently outperforms the non-saturating loss. The training dynamics show similar stability, and the generation quality is also similar across losses. Wasserstein GANs seem to underperform compared to the others, though the hyperparameters used might be suboptimal for them, as they mostly follow the DCGAN paper. I recommend using the non-saturating loss as a default.
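For reference, the four losses compared above can be written in terms of the discriminator's raw outputs (logits) on real and generated images. The NumPy sketch below uses common formulations from the literature; the function names, the 1/0 targets for the least squares loss and the absence of extra scaling constants are assumptions, and losses.py may differ in such details.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def gan_losses(kind, real_logits, fake_logits):
    """Return (discriminator_loss, generator_loss) for one batch.

    real_logits / fake_logits: raw, unbounded discriminator outputs for
    real and generated images respectively.
    """
    r = np.asarray(real_logits, dtype=float)
    f = np.asarray(fake_logits, dtype=float)
    if kind == "non_saturating":
        # Binary cross-entropy on logits; the generator maximizes log D(G(z))
        # instead of minimizing log(1 - D(G(z))).
        d_loss = softplus(-r).mean() + softplus(f).mean()
        g_loss = softplus(-f).mean()
    elif kind == "least_squares":
        # Regress real outputs to 1 and generated outputs to 0.
        d_loss = ((r - 1.0) ** 2).mean() + (f ** 2).mean()
        g_loss = ((f - 1.0) ** 2).mean()
    elif kind == "hinge":
        # Only outputs inside the margin of 1 contribute to the loss.
        d_loss = np.maximum(0.0, 1.0 - r).mean() + np.maximum(0.0, 1.0 + f).mean()
        g_loss = -f.mean()
    elif kind == "wasserstein":
        # Critic loss; requires a Lipschitz-constrained discriminator.
        d_loss = f.mean() - r.mean()
        g_loss = -f.mean()
    else:
        raise ValueError(f"unknown loss: {kind}")
    return float(d_loss), float(g_loss)
```

For example, gan_losses("hinge", [2.0], [-2.0]) returns a discriminator loss of 0.0, because both outputs lie beyond the hinge margin, and a generator loss of 2.0.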

Image augmentations

In this implementation, RandomFlip, RandomTranslation, RandomRotation and RandomZoom are used for image augmentation when applying Adaptive Discriminator Augmentation, because the ADA paper shows these "pixel blitting" and geometric augmentations to be the most useful (see figures 4a and 4b). Other augmentations can be added as well, for example using the custom Keras augmentation layers of this repository, which implement color jitter, additive Gaussian noise and random resized crop, among others, in a differentiable and GPU-compatible manner.
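The adaptive part of ADA is a controller that raises or lowers the augmentation probability p during training based on an overfitting heuristic. Below is a minimal NumPy sketch, assuming the ADA paper's heuristic r_t = E[sign(D(x_real))] with its target of 0.6 and a fixed, hypothetical step size. The class and parameter names are illustrative, a horizontal flip stands in for the full augmentation pipeline, and this repository's Keras implementation may use a different overfitting heuristic.

```python
import numpy as np


class AdaptiveAugmenter:
    """Hypothetical sketch of ADA's augmentation-probability controller."""

    def __init__(self, target=0.6, step=0.01, seed=0):
        self.p = 0.0            # current augmentation probability
        self.target = target    # desired value of the overfitting heuristic
        self.step = step        # assumed fixed adjustment per update
        self.rng = np.random.default_rng(seed)

    def update(self, real_logits):
        # r_t close to 1 means the discriminator is confident on all real
        # images, a sign of overfitting, so p is increased; otherwise lowered.
        r_t = float(np.mean(np.sign(real_logits)))
        self.p += self.step * np.sign(r_t - self.target)
        self.p = float(np.clip(self.p, 0.0, 1.0))
        return r_t

    def augment(self, images):
        # images: (batch, height, width, channels). Each image is augmented
        # (here: flipped horizontally) independently with probability p.
        mask = self.rng.random(images.shape[0]) < self.p
        flipped = images[:, :, ::-1, :]
        return np.where(mask[:, None, None, None], flipped, images)
```

In ADA, augment() would be applied to both real and generated images before they reach the discriminator, including during the generator's update, while update() would be called periodically with the discriminator's outputs on real images.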

For a similar implementation of denoising diffusion models, check out this repository.

License: MIT License

