gozsoy / conditional-vae

Tensorflow implementation of 'Conditional Variational Autoencoder' concept

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

conditional-vae

Tensorflow implementations of (Conditional) Variational Autoencoder concepts

Usage

cd {simple-vae|conditional-vae}/src
python main.py

VAE settings (β and latent dimension) can easily be modified inside main.py.

Simple VAE Experiments

In experiments below, latent space visualization is obtained by TSNE on encoder outputted means for each sample in the training set.

KL Loss vs. Reconstruction Loss

  • β = 0.0 (optimising only reconstruction loss): Latent space idea is not used because encoder can put each sample in separate places with punctual variations. Test image reconstruction quality is high even if they are not used in training, but generation ability is very low.
2d latent space reconstructed test set images Inewly generated images
  • β = 200.0 (optimising only kl loss): Without reconstruction pressure, all samples will have unit gaussian parameters, thus in the latent space no label(or similarity)-based clustering will be observed. Test image reconstruction quality, and generation ability are very low.
2d latent space reconstructed test set images Inewly generated images
  • β = 12.0 (optimising both losses): Both clustering nature of reconstruction loss and dense packing nature of kl loss observed.
2d latent space digit manifold when latent dimension = 2

Conditional VAE Experiments

Instead of having one normal distribution where each digit tries to find a place for itself, with the label conditioning, now each digit has its own Gaussian distribution. Thus, 2d latent space concept is no longer valid, but this is compansated by the option of producing desired digit. For the following experiments, even though I tried with various architectures, enough variation is not present. I observed the same problem for different implementations such as in this.

β = 1.0 β = 1e-11

Model

  • simple-vae: Both encoder and decoder consist of two fully connected hidden layers.
  • conditional-vae: Encoder consists of two convolutional layers. One-hot label vector concatenated on the flattened output of these. For decoder, after sampling, one hot vector concatenation applied. Decoder consists of 3 transposed convolution layers, where the final single feature map is decoded image.

Implementation Details

Various ways of VAE implementation is possible in TF, but I computed both losses after forward pass, which means model provides both encoder and decoder outputs.

Training is done on MNIST training set, using Adam optimizer with learning rate 1e-3 for maximum of 15 epochs. Depending on latent channel capacity, overfitting is possible, so early stopping via visual inspection is advised.

I experimented with different formulations of re-parametrization trick and found that z = μ + σ ⊙ ε is less stable than z = μ + log(1 + exp(ρ)) ⊙ ε, although both produce nice outcomes.

About

Tensorflow implementation of 'Conditional Variational Autoencoder' concept


Languages

Language:Python 100.0%