Chen8023 / ITCS-6166-Group-13

Project Replication of "A HYBRID DEEP ANIMATION CODEC FOR LOW-BITRATE VIDEO CONFERENCING" - https://arxiv.org/pdf/2207.13530.pdf

Ultra-low bitrate video compression using deep animation models

This repository contains the source code for the papers "Ultra-Low Bitrate Video Conferencing Using Deep Image Animation", "A Hybrid Deep Animation Codec for Low-Bitrate Video Conferencing", and "Predictive Coding for Animation-Based Video Compression".

Installation

We support Python 3. To install the dependencies, run:

```
pip install -r requirements.txt
```

Assets

YAML Config

Describes the configuration settings for training and testing the models. See ```config/dac.yaml```, ```config/hdac.yaml```, and ```config/rdac.yaml```. At inference time, use --mode test with the same config file after changing the eval_params appropriately.
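
For instance, here is a minimal sketch of inspecting eval_params before a test run (this helper is not part of the repository; it assumes PyYAML is available):

```python
# Minimal sketch: load a model config and inspect its eval_params.
# Assumes PyYAML is installed; the exact keys inside eval_params
# depend on the config files shipped with this repository.
import yaml

with open("config/dac.yaml") as f:
    config = yaml.safe_load(f)

print(config.get("eval_params"))  # edit these values, then rerun with --mode test
```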

Datasets

**VoxCeleb**. Please follow the instructions at https://github.com/AliaksandrSiarohin/video-preprocessing.

**Creating your own videos**. 
The input videos should be cropped to the speaker's face at a resolution of 256x256 pixels (updates to support higher resolutions are underway).
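
For illustration, one way to produce such a crop is with ffmpeg driven from Python. This helper is not part of the repository, and the crop window below is a placeholder you must adapt to where the face actually is:

```python
# Illustrative sketch only: crop a face region and rescale to 256x256 with ffmpeg.
# The crop window (w, h, x, y) is a placeholder; locate the face region yourself,
# manually or with a face detector, before running this.
import subprocess

def crop_to_face(src: str, dst: str, w: int, h: int, x: int, y: int) -> None:
    """Crop a w*h window at offset (x, y) from src and scale it to 256x256."""
    subprocess.run(
        ["ffmpeg", "-i", src,
         "-vf", f"crop={w}:{h}:{x}:{y},scale=256:256",
         "-c:a", "copy", dst],
        check=True,
    )

crop_to_face("raw_talk.mp4", "datasets/inference/talk_256.mp4", 720, 720, 600, 80)
```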

**Pre-processed videos (256x256 px)**
We provide preprocessed videos at the following link: [google-drive](https://drive.google.com/drive/folders/1g0U1ZCTszm3yrmIewg7FahXsxyMBfxKj?usp=sharing)

Download the videos and put them in the ```datasets/train``` and ```datasets/inference``` folders.
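
The expected layout is:

```
datasets/
├── train/        # training videos
└── inference/    # videos used at test time
```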

Pre-trained checkpoints

Checkpoints can be found under the following link: google-drive. Download them and place them in the checkpoints/ directory.

Metrics

We include a metrics module that combines the suggestions from JPEG-AI with popular quantitative metrics used in computer vision and beyond. Supported metrics: 'psnr', 'psnr-hvs', 'fsim', 'iw_ssim', 'ms_ssim', 'vif', 'nlpd', 'vmaf', 'lpips'
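
As a reference point, PSNR for 8-bit frames is 10·log10(255²/MSE). A minimal NumPy sketch (not the repository's metrics module itself):

```python
# Minimal PSNR sketch for 8-bit frames; not the repo's metrics module.
import numpy as np

def psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB between a reference frame and a distorted frame."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val**2 / mse)
```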

Training

Set the config/[MODEL_NAME].yaml parameters appropriately, or keep the defaults (to reproduce our results), and run ```bash script_training.sh [MODEL_NAME]```. The default setup uses a single GPU (NVIDIA A100). However, DAC, HDAC and RDAC can also be trained on multiple GPUs by using DistributedDataParallel and setting the --device_ids parameter as desired.
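
For example, to train the HDAC model with the default configuration (the model name is assumed to match the config file stem):

```
bash script_training.sh hdac
```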

Testing

NOTE: baselines.yaml is used for the HEVC, VVC and VVenC baselines. Download the HEVC and VVC (VTM-12) codecs from google-drive and put them in the conventional_codecs/ folder.

Set the eval_params in the config/[MODEL_NAME].yaml file and run ```bash script_test.sh [MODEL_NAME]```.
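
For example, after editing the eval_params in ```config/hdac.yaml```:

```
bash script_test.sh hdac
```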

Attributions

This code base contains source code from the following works:

  1. First Order Motion Model for Image Animation for the base architecture of deep image animation with unsupervised keypoints.
  2. CompressAI for learned image compression.
  3. JPEG-AI for evaluation metrics.


License

Creative Commons Zero v1.0 Universal

