cyrilli / Generative-Model-for-Video-Compression

This repository contains code for experimenting with video compression using a generative model.

Generative-Model-for-Video-Compression

The goal is to compress video using a generative model. I am planning to experiment on the KITTI dataset.

Much of the code is borrowed from the DCGAN implementation by zsdonghao and the fast style transfer implementation by ShafeenTejani.

The initial idea is to reproduce the results of the Generative Compression paper.

The framework used in their paper is shown in Fig. 1. First, a DCGAN is trained; its generator is then used as the decoder of a vanilla autoencoder, with the generator's parameters fixed. The encoder maps raw images into a latent space (the default latent dimension is 100), and the latent variables are fed into the decoder (the DCGAN generator we just trained) to produce reconstructed images. The loss they use is a weighted sum of a pixel loss and a perceptual loss (computed at the 4th convolutional layer of an ImageNet-pretrained AlexNet). For video compression, they skip frames to reduce the amount of data, linearly interpolate the latent variables, and decode the resulting latents to fill in the missing frames.
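The training objective for the encoder can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual code: `feature_fn` stands in for the pretrained AlexNet/VGG feature extractor, and the weights `alpha` and `beta` are hypothetical names for the two loss weights.

```python
import numpy as np

def pixel_loss(x, x_hat):
    # Mean squared error directly in pixel space
    return np.mean((x - x_hat) ** 2)

def perceptual_loss(x, x_hat, feature_fn):
    # MSE between feature maps; in the paper the features come from
    # the 4th conv layer of an ImageNet-pretrained AlexNet.
    # feature_fn here is a stand-in for that extractor.
    return np.mean((feature_fn(x) - feature_fn(x_hat)) ** 2)

def total_loss(x, x_hat, feature_fn, alpha=1.0, beta=1.0):
    # Weighted sum used to train the encoder while the
    # generator/decoder parameters stay fixed
    return alpha * pixel_loss(x, x_hat) + beta * perceptual_loss(x, x_hat, feature_fn)
```

Only the encoder is updated against this loss; since the decoder is the frozen DCGAN generator, gradients flow through it but its weights never change.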

I have experimented on the CelebA dataset (image size 64×64). Below are some of the results.

1. Images generated by DCGAN (using the default parameters)


2. Images and reconstructions with a fixed generator and pixel loss only (the first 8 rows are original images; the remaining 8 rows are reconstructions)


3. Images and reconstructions with a non-fixed generator and a weighted sum of pixel loss and perceptual loss (4th ReLU layer of an ImageNet-pretrained VGG)


Todo List:

  • Training DCGAN on CelebA
  • Training a Vanilla Autoencoder to reconstruct images
  • Experimenting with different parameter configurations
  • Training the model on KITTI Dataset
  • Using linear interpolation on latent variables to fill in missing frames, as in the aforementioned paper
  • Using LSTM on latent variables to predict the missing frames



Languages

Python 100.0%