cyrilli / Generative-Model-for-Video-Compression

This repository contains code for experimenting with video compression using a generative model.

Generative-Model-for-Video-Compression

The goal is to compress video using a generative model. I am planning to experiment on the KITTI dataset.

Much of the code is borrowed from the DCGAN implementation by zsdonghao and the fast style transfer implementation by ShafeenTejani.

The initial idea is to reproduce the results of the Generative Compression paper.

The framework used in their paper is shown in Fig. 1. First, a DCGAN is trained; its generator is then used as the decoder of a vanilla autoencoder, with the generator's parameters fixed. The encoder maps raw images into a latent space (the default latent dimension is 100), and the latent variables are fed into the decoder (the DCGAN generator we just trained) to produce reconstructed images. The loss they use is a weighted sum of a pixel loss and a perceptual loss (computed at the 4th convolutional layer of an ImageNet-pretrained AlexNet). For video compression, they skip frames to reduce the amount of data, linearly interpolate the latent variables, and decode the resulting latents to fill in the missing frames.
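The training objective for the encoder can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual code: `feature_fn` stands in for the pretrained AlexNet/VGG feature extractor, and the weights `alpha` and `beta` are hypothetical names for the two loss weights.

```python
import numpy as np

def pixel_loss(x, x_hat):
    # Mean squared error directly in pixel space
    return np.mean((x - x_hat) ** 2)

def perceptual_loss(x, x_hat, feature_fn):
    # MSE between feature maps; in the paper the features come from
    # the 4th conv layer of an ImageNet-pretrained AlexNet.
    # feature_fn here is a stand-in for that extractor.
    return np.mean((feature_fn(x) - feature_fn(x_hat)) ** 2)

def total_loss(x, x_hat, feature_fn, alpha=1.0, beta=1.0):
    # Weighted sum used to train the encoder while the
    # generator/decoder parameters stay fixed
    return alpha * pixel_loss(x, x_hat) + beta * perceptual_loss(x, x_hat, feature_fn)
```

Only the encoder is updated against this loss; since the decoder is the frozen DCGAN generator, gradients flow through it but its weights never change.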

I have experimented on the CelebA dataset (image size 64×64). Below are some of the results.

1. Images generated by DCGAN (using the default parameters)


2. Images and reconstructions with a fixed generator and pixel loss only (the first 8 rows are original images; the remaining 8 rows are reconstructions)


3. Images and reconstructions with a non-fixed generator and a weighted sum of pixel loss and perceptual loss (4th ReLU layer of an ImageNet-pretrained VGG)


Todo List:

  • Training DCGAN on CelebA
  • Training a Vanilla Autoencoder to reconstruct images
  • Experimenting with different parameter configurations
  • Training the model on KITTI Dataset
  • Using linear interpolation on latent variables to fill in missing frames, as in the aforementioned paper
  • Using LSTM on latent variables to predict the missing frames



Languages

Python 100.0%