gboduljak / vae

Implementations of various VAEs.

VAEs

A repository of different implementations of variational autoencoders (VAEs) in PyTorch.

Architecture

The architecture is inspired by U-Net, an encoder-decoder architecture. The encoder path consists of ConvBlock and Downsample modules that progressively reduce the spatial dimensions while increasing feature channels. At the bottleneck, ResidualBlock modules refine the encoded features. The decoder path mirrors the encoder, using ConvBlock and Upsample modules to restore the original spatial dimensions. The network begins with an input projection layer and ends with an output projection layer, ensuring the output matches the input's spatial dimensions.
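The modules named above can be sketched as follows. This is a minimal PyTorch sketch, not the repository's exact implementation: the module names (`ConvBlock`, `Downsample`, `Upsample`, `ResidualBlock`) come from the description above, but their internals, channel counts, and the two-level depth are assumptions. Note that, unlike a true U-Net, there are no skip connections here, since the encoder must compress everything into the latent vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """Convolution + normalization + activation (internals are assumed)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.GroupNorm(8, out_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        return self.net(x)


class Downsample(nn.Module):
    """Strided convolution that halves the spatial dimensions."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)


class Upsample(nn.Module):
    """Nearest-neighbour upsampling followed by a convolution."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return self.conv(F.interpolate(x, scale_factor=2, mode="nearest"))


class ResidualBlock(nn.Module):
    """Refines features at the bottleneck without changing the shape."""
    def __init__(self, ch):
        super().__init__()
        self.block = ConvBlock(ch, ch)

    def forward(self, x):
        return x + self.block(x)


class VAE(nn.Module):
    def __init__(self, in_ch=1, base=32, latent_dim=2, image_size=28):
        super().__init__()
        bottleneck = image_size // 4               # two downsampling steps
        flat = 2 * base * bottleneck ** 2
        self.input_proj = nn.Conv2d(in_ch, base, 3, padding=1)
        self.encoder = nn.Sequential(
            ConvBlock(base, base), Downsample(base),
            ConvBlock(base, 2 * base), Downsample(2 * base),
            ResidualBlock(2 * base),
        )
        self.fc_mu = nn.Linear(flat, latent_dim)
        self.fc_logvar = nn.Linear(flat, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, flat)
        self.decoder = nn.Sequential(
            ResidualBlock(2 * base),
            Upsample(2 * base), ConvBlock(2 * base, base),
            Upsample(base), ConvBlock(base, base),
        )
        self.output_proj = nn.Conv2d(base, in_ch, 3, padding=1)
        self._dec_shape = (2 * base, bottleneck, bottleneck)

    def encode(self, x):
        h = self.encoder(self.input_proj(x)).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        h = self.fc_dec(z).view(z.shape[0], *self._dec_shape)
        return self.output_proj(self.decoder(h))

    def forward(self, x):
        mu, logvar = self.encode(x)
        # reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decode(z), mu, logvar
```

A forward pass on a batch of MNIST-sized inputs returns a reconstruction of the same shape together with the posterior parameters `mu` and `logvar`.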

Objective

The objective is the evidence lower bound (ELBO), taken directly from Auto-Encoding Variational Bayes (Kingma & Welling, 2013).
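Concretely, for a datapoint $x$ with approximate posterior $q_\phi(z \mid x) = \mathcal{N}(\mu, \sigma^2 I)$ and prior $p(z) = \mathcal{N}(0, I)$, the paper's per-example objective to be maximized is

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
```

where the KL term has the closed form

```latex
D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
  = -\tfrac{1}{2} \sum_{j=1}^{J} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

for a $J$-dimensional diagonal-Gaussian posterior.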

Results

MNIST

MNIST_latent
Figure 1: Latent space. Each MNIST image is compressed to a two-dimensional latent vector. The plot shows this latent space stratified by label (digit), computed on the test set. Dashed lines show the contours of a 2D unit Gaussian.
MNIST_reconstructions
Figure 2: Reconstructions. The grid consists of input images and their reconstructions generated by the model. For each pair, the left image is the real input, while the right image is the corresponding reconstruction generated by the model.
MNIST_interpolations
Figure 3: Interpolations. The leftmost and rightmost images are the interpolation endpoints (inputs), while the middle images are interpolations of these two, generated by the model.
MNIST_samples
Figure 4: Samples.
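The interpolations and samples above can be reproduced with a few lines. This is a hedged sketch, not the repository's notebook code: it assumes a model exposing `encode` (returning the posterior mean and log-variance) and `decode` (mapping latents to images) — those method names are assumptions. Interpolation is linear between the posterior means of the two endpoints; sampling draws from the $\mathcal{N}(0, I)$ prior and decodes.

```python
import torch


def interpolate(model, x_left, x_right, steps=8):
    """Decode `steps` points on the line between the two endpoint latents."""
    mu_left, _ = model.encode(x_left)    # use the posterior mean (deterministic)
    mu_right, _ = model.encode(x_right)
    ws = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    z = (1.0 - ws) * mu_left + ws * mu_right   # linear interpolation in latent space
    return model.decode(z)


def sample(model, n, latent_dim=2):
    """Draw n latents from the N(0, I) prior and decode them to images."""
    z = torch.randn(n, latent_dim)
    return model.decode(z)
```

Spherical (slerp) interpolation is another common choice for Gaussian latents, but simple linear interpolation matches the figure description above.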

CelebA

CelebA_reconstructions
Figure 5: Reconstructions. The grid consists of input images and their reconstructions generated by the model. For each pair, the left image is the real input, while the right image is the corresponding reconstruction generated by the model.
CelebA_interpolations
Figure 6: Interpolations. The leftmost and rightmost images are the interpolation endpoints (inputs), while the middle images are interpolations of these two, generated by the model.
CelebA_samples
Figure 7: Samples.
CelebA_latent_space
Figure 8: Latent space distribution. A histogram is computed for each latent-space dimension on the test set. The histogram is colored blue, while the probability density function of a unit Gaussian is colored orange.
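The per-dimension comparison in that figure can be computed as follows. This is a sketch using NumPy only (the actual plotting, e.g. with matplotlib, is omitted); the function name and signature are assumptions, not the repository's API.

```python
import numpy as np


def latent_histograms(latents, bins=50):
    """For each latent dimension, return (bin_centers, empirical_density, unit_gaussian_pdf).

    `latents` is an (N, D) array of encoded test-set latent vectors.
    """
    results = []
    for j in range(latents.shape[1]):
        # density=True normalizes the histogram to integrate to 1 over its range
        density, edges = np.histogram(latents[:, j], bins=bins, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        # standard normal pdf evaluated at the bin centers, for the overlay
        pdf = np.exp(-0.5 * centers ** 2) / np.sqrt(2.0 * np.pi)
        results.append((centers, density, pdf))
    return results
```

If the aggregate posterior matches the prior, the blue empirical density should track the orange unit-Gaussian pdf in every dimension.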

Reproduction

To reproduce these results, download the appropriate checkpoint from the HuggingFace directory and run the corresponding notebook.


License: MIT


Languages

Jupyter Notebook 100.0%, Python 0.0%