pavas23 / Generative-AI

This project involves building generative architectures, including a few variants of GANs, Variational Autoencoders and Diffusion Models.


Image Generation and Image-to-Image Translation

Part A - Image Generation

In this part we train generative models to produce images from random vectors sampled from the latent space, using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models on the MNIST dataset.

VAE

The VAE is built only from linear layers; the simplicity of the dataset meant we did not need to make the model more complex with convolutional or pooling layers.

In our architecture, the encoder learns the mean and variance of a latent variable, and the decoder reconstructs the images from samples drawn from that distribution. We use ReLU and sigmoid activation functions in the layers.
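
A minimal sketch of this setup is shown below. The layer widths, latent size, and the BCE + KL training loss are illustrative assumptions, not necessarily the exact configuration used in this repo.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=8):
        super().__init__()
        # Encoder: maps a flattened image to the mean and log-variance of the latent variable.
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: reconstructs the image from a latent sample; sigmoid keeps pixels in [0, 1].
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps so gradients can flow through the sampling step.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x):
        h = self.enc(x.view(x.size(0), -1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term (BCE over pixels) + KL divergence to the standard normal prior.
    bce = nn.functional.binary_cross_entropy(recon, x.view(x.size(0), -1), reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```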

GAN

The GAN is trained to generate realistic MNIST digit images. The generator learns to generate images from random noise, while the discriminator learns to distinguish between real and fake images. The training loop alternates between training the discriminator and the generator.

The generator aims to minimize the discriminator's ability to distinguish fake images from real ones. The discriminator aims to maximize its ability to distinguish between real and fake images.

We use ReLU in the generator layers, and ReLU and sigmoid activation functions in the discriminator layers. We also chose the Adam optimizer, since it works well with a lower learning rate and converges faster.
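
The alternating training loop described above can be sketched as follows. The tiny generator/discriminator definitions, the sigmoid output on the generator, and the learning rate are placeholders for illustration, not the exact architectures used in this repo.

```python
import torch
import torch.nn as nn

latent_dim = 8
# Simplified placeholder networks: ReLU hidden layers, sigmoid outputs.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid())
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the discriminator: distinguish real images from generated ones.
    opt_D.zero_grad()
    d_real = criterion(D(real_images.view(batch, -1)), real_labels)
    z = torch.randn(batch, latent_dim)
    fake = G(z)
    d_fake = criterion(D(fake.detach()), fake_labels)
    (d_real + d_fake).backward()
    opt_D.step()

    # 2) Train the generator: make the discriminator classify generated images as real.
    opt_G.zero_grad()
    g_loss = criterion(D(fake), real_labels)
    g_loss.backward()
    opt_G.step()
```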

The following are images generated from random latent points for latent sizes 2, 4 and 8.

fake_2_50 fake_4_50 fake_8_50

The following plot also shows how the generator's training loss decreases over epochs for each latent size.

Generator

Diffusion Models

We implement a diffusion model for the MNIST dataset. Diffusion models are used for image generation and manipulation tasks: the model gradually adds noise to images in a forward process and learns the reverse denoising process, allowing high-quality samples to be generated.

SiLU (Sigmoid-weighted Linear Unit) is used as the activation function.

As before, we use the Adam optimizer for its lower learning-rate requirements and faster convergence.
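
A minimal sketch of the forward (noising) process and a noise-prediction training step is shown below. The linear beta schedule, the number of steps, and the tiny SiLU network are assumptions for illustration; the actual model in this repo is presumably more elaborate.

```python
import torch
import torch.nn as nn

# Assumed linear beta schedule over T timesteps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def q_sample(x0, t, noise):
    # Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise.
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

class TinyNoisePredictor(nn.Module):
    # Placeholder noise-prediction network using SiLU activations.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x, t):
        # A full model would also condition on the timestep t (e.g. via embeddings); omitted here.
        return self.net(x)

model = TinyNoisePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

def train_step(x0):
    # Pick a random timestep per image, noise the image, and train the model to predict that noise.
    t = torch.randint(0, T, (x0.size(0),))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    loss = nn.functional.mse_loss(model(x_t, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```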

These are results obtained from the diffusion model at different step counts.

steps_00000469 steps_00006097 steps_00015946

Part B - Image-to-Image Translation

We focus on image-to-image translation, converting source images into target images so that specific visual properties are altered while others are preserved. This is accomplished by employing two variants of Generative Adversarial Networks (GANs) on the CelebA dataset.

CycleGAN

We implement a CycleGAN (Cycle-Consistent Generative Adversarial Network) for image-to-image translation. CycleGANs are used to learn mappings between two different domains without requiring paired data.

The activation functions used are:

  • Leaky ReLU: Used in both the generator and discriminator.
  • ReLU: Used in the generator for non-linearity.
  • Sigmoid: Used in the discriminator to output probabilities.

As in Part A, we use the Adam optimizer for the same reasons.

Adversarial Loss (GAN Loss)

Binary Cross-Entropy (BCE) loss is used to train the generators and discriminators by minimizing the difference between real and fake predictions.

Cycle Consistency Loss

Mean Absolute Error (L1 loss) is used to enforce cycle consistency between the original and reconstructed images, ensuring that the image after translation and back-translation is close to the original image.
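
A minimal sketch of how these two losses combine for one translation direction is shown below. The names G_AB, G_BA, D_B and the cycle weight lambda_cyc are hypothetical and only illustrate the idea.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # adversarial (GAN) loss
l1 = nn.L1Loss()    # cycle-consistency loss

def generator_losses(G_AB, G_BA, D_B, real_A, lambda_cyc=10.0):
    # Adversarial loss: G_AB tries to make D_B classify its translations as real.
    fake_B = G_AB(real_A)
    pred = D_B(fake_B)
    adv_loss = bce(pred, torch.ones_like(pred))

    # Cycle-consistency loss: translating A -> B -> A should recover the original image.
    recovered_A = G_BA(fake_B)
    cycle_loss = l1(recovered_A, real_A)

    return adv_loss + lambda_cyc * cycle_loss
```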

Men without glasses to men with glasses and vice versa

1 2 3 4 5

Men with glasses to women with glasses and vice versa

1 2 3 4 5

Deep Convolutional GAN (DCGAN)

We implement three neural network models: a Generative Adversarial Network (GAN) Generator, a GAN Discriminator, and a CNN Encoder.

The activation functions used are:

  • Leaky ReLU: Used in both the discriminator and encoder models.
  • ReLU: Used in the generator for non-linearity.
  • Sigmoid: Used in the final layer of the discriminator to output probabilities.

The Loss function used for training is Binary Cross-Entropy Loss.

Vector Arithmetic Result

Men without glasses + People with glasses - People without glasses

men_people

Men with glasses - Men without glasses + Women without glasses

men_women

Smiling Men + People with Hat - People with Hat + People with Mustache - People without Mustache

men_hat
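
A minimal sketch of how such latent vector arithmetic can be computed with the CNN encoder and the GAN generator is shown below. The function and argument names are hypothetical, and the shapes assume a generator that takes a flat latent vector.

```python
import torch

@torch.no_grad()
def attribute_arithmetic(encoder, generator, men_no_glasses, people_with_glasses, people_no_glasses):
    # Encode each image group into the latent space and take the mean latent vector per group.
    z_men = encoder(men_no_glasses).mean(dim=0)
    z_glasses = encoder(people_with_glasses).mean(dim=0)
    z_no_glasses = encoder(people_no_glasses).mean(dim=0)

    # "Men without glasses + People with glasses - People without glasses"
    z = z_men + z_glasses - z_no_glasses

    # Decode the combined latent vector back into an image with the generator.
    return generator(z.unsqueeze(0))
```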


