In this section, we train generative models to generate images from random vectors sampled from a latent space, employing Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models on the MNIST dataset.
The VAE is built only from linear layers, since the simplicity of the dataset meant we did not need to make the model more complex with convolutional or pooling layers.
In our architecture, an encoder learns the mean and variance of a latent variable, and a decoder reconstructs the images from samples of that latent variable. We used the ReLU and sigmoid activation functions in the layers.
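A minimal sketch of this encoder–decoder structure is shown below, assuming PyTorch, a flattened 784-dimensional MNIST input, and an illustrative hidden width; the exact layer sizes we used may differ.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Linear-layer VAE: the encoder outputs a mean and log-variance, the decoder reconstructs."""
    def __init__(self, latent_dim=2):                 # latent_dim is illustrative (e.g. 2, 4, 8)
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)       # mean of q(z|x)
        self.fc_logvar = nn.Linear(256, latent_dim)   # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),        # sigmoid keeps pixel values in [0, 1]
        )

    def forward(self, x):
        h = self.encoder(x.view(-1, 784))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)
        return self.decoder(z), mu, logvar
```

The reparameterization trick keeps the sampling step differentiable, so the encoder can be trained end-to-end by backpropagation.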
The GAN is trained to generate realistic MNIST digit images. The generator learns to produce images from random noise, while the discriminator learns to distinguish between real and fake images. The training loop alternates between training the discriminator and the generator.
The generator aims to minimize the discriminator's ability to distinguish fake images from real ones. The discriminator aims to maximize its ability to distinguish between real and fake images.
We used ReLU in the layers of the generator, and ReLU and sigmoid activation functions in the layers of the discriminator. We also chose to use the Adam optimizer, since it generally works well with a lower learning rate and converges faster.
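A sketch of this alternating scheme is given below, assuming PyTorch, a binary cross-entropy objective, and small fully connected networks; the noise dimension, layer widths, and learning rate are illustrative rather than the exact values used.

```python
import torch
import torch.nn as nn

latent_dim = 64
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Sigmoid())   # generator: noise -> image (sigmoid output assumed)
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())     # discriminator: image -> probability of "real"

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):                                  # real: (batch, 784) MNIST images in [0, 1]
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) Discriminator step: improve separation of real from fake.
    fake = G(torch.randn(batch, latent_dim)).detach()
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) Generator step: fool the discriminator into labeling fakes as real.
    fake = G(torch.randn(batch, latent_dim))
    loss_G = bce(D(fake), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

Detaching the fake images in the discriminator step prevents generator gradients from flowing during that update.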
The following images show reconstructions of random latent points for latent sizes 2, 4, and 8. The following plot also depicts how the generator's training loss decreases over epochs for each latent size.
We implement a diffusion model for the MNIST dataset. Diffusion models are used for image generation and manipulation tasks. This model applies forward and reverse diffusion processes to images, allowing for the generation of high-quality samples.
SiLU (Sigmoid-weighted Linear Unit) is used as the activation function.
We also chose to use the Adam optimizer, since it generally works well with a lower learning rate and converges faster.
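A minimal DDPM-style sketch of the forward noising step and the training objective is shown below, assuming PyTorch, a linear beta schedule, and a small MLP noise predictor standing in for the actual network; the number of steps and schedule values are illustrative.

```python
import torch
import torch.nn as nn

T = 200                                               # number of diffusion steps (illustrative)
betas = torch.linspace(1e-4, 0.02, T)                 # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# Noise-prediction network: a small MLP with SiLU activations as a placeholder
# for the actual model; it takes the noisy image and the (normalized) timestep.
model = nn.Sequential(nn.Linear(784 + 1, 256), nn.SiLU(),
                      nn.Linear(256, 784))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x0):                                   # x0: (batch, 784) clean images
    batch = x0.size(0)
    t = torch.randint(0, T, (batch,))                 # random timestep per sample
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].unsqueeze(1)
    # Forward process: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # The model is trained to predict the injected noise (the reverse-process target).
    pred = model(torch.cat([x_t, t.float().unsqueeze(1) / T], dim=1))
    loss = ((pred - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Sampling (the reverse process) then starts from pure noise and repeatedly removes the predicted noise, stepping from t = T - 1 down to 0.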
The following are the results obtained from the diffusion model at multiple sampling steps.
We focus on image-to-image translation, aiming to convert source images into target images by altering specific visual properties while preserving others. This is accomplished by employing two variants of Generative Adversarial Networks (GANs) on the CelebA dataset.
We implement a CycleGAN (Cycle-Consistent Generative Adversarial Network) for image-to-image translation. CycleGANs are used to learn mappings between two different domains without requiring paired data.
The activation functions used are:
- Leaky ReLU: Used in both the generator and discriminator.
- ReLU: Used in the generator for non-linearity.
- Sigmoid: Used in the discriminator to output probabilities.
We also chose to use the Adam optimizer, since it generally works well with a lower learning rate and converges faster.
Binary Cross-Entropy (BCE) loss is used to train the generators and discriminators by minimizing the difference between real and fake predictions.
Mean Absolute Error (L1 loss) is used to enforce cycle consistency between the original and reconstructed images, ensuring that the image after translation and back-translation is close to the original image.
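The sketch below illustrates how these two losses might be combined for the generators, assuming PyTorch and generators and discriminators G_AB, G_BA, D_A, D_B for the two domains; the cycle-consistency weight is an illustrative hyperparameter, not the exact value used.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()     # adversarial loss on discriminator outputs
l1 = nn.L1Loss()       # cycle-consistency loss
lambda_cyc = 10.0      # illustrative weight on the cycle term

def generator_loss(G_AB, G_BA, D_A, D_B, real_A, real_B):
    """Combined CycleGAN generator objective for one batch of unpaired images."""
    fake_B = G_AB(real_A)                  # translate A -> B
    fake_A = G_BA(real_B)                  # translate B -> A

    # Adversarial terms: each generator tries to make its fakes look real to the discriminator.
    pred_B, pred_A = D_B(fake_B), D_A(fake_A)
    adv = bce(pred_B, torch.ones_like(pred_B)) + bce(pred_A, torch.ones_like(pred_A))

    # Cycle-consistency terms: A -> B -> A and B -> A -> B should recover the inputs.
    cyc = l1(G_BA(fake_B), real_A) + l1(G_AB(fake_A), real_B)

    return adv + lambda_cyc * cyc
```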
We implement three neural network models: a Generative Adversarial Network (GAN) Generator, a GAN Discriminator, and a CNN Encoder.
The activation functions used are:
- Leaky ReLU: Used in both the discriminator and encoder models.
- ReLU: Used in the generator for non-linearity.
- Sigmoid: Used in the final layer of the discriminator to output probabilities.
The loss function used for training is Binary Cross-Entropy (BCE) loss.
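The sketch below shows how these three modules might be defined with the activations listed above, assuming PyTorch and 16x16 RGB inputs for brevity; the layer widths, latent size, and the generator's output activation are assumptions rather than the exact configuration used.

```python
import torch
import torch.nn as nn

latent_dim = 100          # illustrative latent size
img_channels = 3          # CelebA RGB images

# Generator: latent vector -> image, ReLU in the hidden layers.
generator = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.ReLU(),   # 1x1 -> 4x4
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),           # 4x4 -> 8x8
    nn.ConvTranspose2d(64, img_channels, 4, 2, 1),             # 8x8 -> 16x16
    nn.Sigmoid(),          # output activation assumed, to keep pixel values in [0, 1]
)

# Discriminator: image -> real/fake probability, Leaky ReLU hidden layers, sigmoid output.
discriminator = nn.Sequential(
    nn.Conv2d(img_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),   # 16x16 -> 8x8
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),            # 8x8 -> 4x4
    nn.Conv2d(128, 1, 4, 1, 0), nn.Sigmoid(),                  # 4x4 -> 1x1 probability map
    nn.Flatten(),
)

# CNN encoder: image -> latent vector, Leaky ReLU in the hidden layers.
encoder = nn.Sequential(
    nn.Conv2d(img_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),   # 16x16 -> 8x8
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),            # 8x8 -> 4x4
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, latent_dim),
)

# Example usage: generate a batch of images and encode them back to latent vectors.
imgs = generator(torch.randn(8, latent_dim, 1, 1))   # (8, 3, 16, 16)
probs = discriminator(imgs)                          # (8, 1)
codes = encoder(imgs)                                # (8, latent_dim)
```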