GAN Flavours: Comparison of GAN Losses and Architectures with Keras

This repository contains:

A simple DCGAN model with a flexible configurable architecture along with the following avaliable losses (in losses.py):
An implementation of Adaptive Discriminator Augmentation for training GANs with limited amounts of data. It was used for ablations and hyperparameter optimization for the corresponding Keras code example, but was turned off for the experiments below.
An implementation of Kernel Inception Distance (KID), which is a GAN performance metric with a simple unbiased estimator, that is more suitable for limited amounts of images, and is also computationally cheaper to measure compared to the Frechet Inception Distance (FID). Implementation details include (all being easy to tweak):
- The Inceptionv3 network's pretrained weights are loaded from Keras applications.
- For computational efficiency, the images are evaluated at the minimal possible resolution (75x75 instead of 299x299), therefore the exact values might not be comparable with other implementations.
- For computational efficiency, the metric is only measured on the validation splits of the datasets.

List of GAN training tips and tricks based on experience with this repository.

Try it out in a Colab Notebook (good results take around 2 hours of training):

Cherry-picked 256x256 flowers (augmentation + residual connections + no transposed convolutions):

Caltech Birds 2011 (CUB-200)

6000 training images
400 epoch training
64x64 resolution, cropped on bounding boxes

KID results (the lower the better):

Loss / Architecture	Vanilla DCGAN	Spectral Norm.	Residual	Residual + Spectral Norm.
Non-saturating GAN	0.087	0.184	0.479	0.533
Least Squares GAN	0.114	0.153	0.312	0.361
Hinge GAN	0.123	0.238	0.165	0.304
Wasserstein GAN	*	1.066	*	0.679

Images generated by a vanilla DCGAN + non-saturating loss:

Oxford Flowers 102

6000 training images (70% of every split)
500 epoch training
64x64 resolution, center cropped

KID results (the lower the better):

Loss / Architecture	Vanilla DCGAN	Spectral Norm.	Residual	Residual + Spectral Norm.
Non-saturating GAN	0.080	0.083	0.094	0.139
Least Squares GAN	0.104	0.092	0.110	0.131
Hinge GAN	0.090	0.087	0.101	0.099
Wasserstein GAN	*	0.165	*	0.107

Images generated by a vanilla DCGAN + non-saturating loss:

CelebFaces Attributes (CelebA)

160.000 training images
25 epoch training
64x64 resolution, center cropped

KID results (the lower the better):

Loss / Architecture	Vanilla DCGAN	Spectral Norm.	Residual	Residual + Spectral Norm.
Non-saturating GAN	0.015	0.044	0.016	0.036
Least Squares GAN	0.017	0.036	0.014	0.036
Hinge GAN	0.015	0.041	0.020	0.036
Wasserstein GAN	*	0.058	*	0.061

Images generated by a vanilla DCGAN + non-saturating loss:

CIFAR-10

50.000 training images
100 epoch training
32x32 resolution

KID results (the lower the better):

Loss / Architecture	Vanilla DCGAN	Spectral Norm.	Residual	Residual + Spectral Norm.
Non-saturating GAN	0.081	0.117	0.088	0.127
Least Squares GAN	0.081	0.107	0.089	0.109
Hinge GAN	0.084	0.109	0.092	0.121
Wasserstein GAN	*	0.213	*	0.213

Images generated by a vanilla DCGAN + non-saturating loss:

*Based on theory, Wasserstein GANs require Lipschitz-constrained discriminators, and therefore they are only evaluated with architectures using spectral normalization in their discriminators.

Findings

After comparing GAN losses across architectures and datasets, my findings are in line with the findings of the Are GANs Created Equal? study: no loss outperforms the non-saturating loss consistently. The training dynamics show similar stability, and the generation quality is also similar across the losses. Wasserstein GANs seem to be underperforming in comparison to the others, though the hyperparameters used might be suboptimal, as they follow mostly the DCGAN paper. I recommend using the non-saturating loss as a default.

Image augmentations

In this implementation, RandomFlip, RandomTranslation, RandomRotation and RandomZoom are used for image augmentation when applying Adaptive Discriminator Augmentation, because in the paper these "pixel blitting" and geometric image augmentations are shown to be the most useful (see figures 4a and 4b). One can add other augmentations as well, using the custom Keras augmentations layers of this repository for example, which implements color jitter, additive gaussian noise and random resized crop among others in a differentiable and GPU-compatible manner.

For a similar implementation of denoising diffusion models, check out this repository.

About

Implementation of GAN losses, kernel inception distance and adaptive discriminator augmentation using Keras.

MIT License

Languages

Language:Python 81.7%Language:Jupyter Notebook 18.3%