wwhio / TecoGAN-PyTorch

A PyTorch Reimplementation of TecoGAN: Temporally Coherent GAN for Video Super-Resolution

TecoGAN-PyTorch

Introduction

This is a PyTorch reimplementation of TecoGAN: Temporally Coherent GAN for Video Super-Resolution (VSR). Please refer to the official TensorFlow implementation TecoGAN-TensorFlow for more information.

Updates

  • 11/2021: Supported 2x SR.
  • 10/2021: Supported model training/testing on the REDS dataset.
  • 07/2021: Upgraded codebase to support multi-GPU training & testing.

Features

  • Better Performance: This repo provides a smaller model that performs better than the one in the official repo. See our Benchmark.
  • Multiple Degradations: This repo supports two types of degradation, BI (Matlab's imresize with the option bicubic) & BD (Gaussian Blurring + Down-sampling).
  • Unified Framework: This repo provides a unified framework for distortion-based and perception-based VSR methods.
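
To make the BD degradation concrete, here is a minimal NumPy sketch of "Gaussian blurring + down-sampling". It is an illustration, not the code used in this repo: the function names, sigma=1.5, the 3-sigma kernel radius, and the edge padding are all assumptions.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def bd_degrade(frame: np.ndarray, scale: int = 4, sigma: float = 1.5) -> np.ndarray:
    """Blur an HxWxC frame with a Gaussian, then keep every `scale`-th pixel.

    sigma=1.5 and the 3-sigma kernel radius are illustrative assumptions.
    """
    radius = int(np.ceil(3 * sigma))
    kernel = gaussian_kernel_1d(sigma, radius)
    # Replicate border pixels so the blurred frame keeps its original size.
    padded = np.pad(frame.astype(np.float64),
                    ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Separable blur: filter along the height axis, then along the width axis.
    blurred = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, padded)
    blurred = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, blurred)
    # Down-sample by striding.
    return blurred[::scale, ::scale, :]
```

For example, a 16x32x3 ground-truth frame yields a 4x8x3 low-resolution frame at 4x.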

Contents

  1. Dependencies
  2. Testing
  3. Training
  4. Benchmark
  5. License & Citation
  6. Acknowledgements

Dependencies

  • Ubuntu >= 16.04
  • NVIDIA GPU + CUDA
  • Python >= 3.7
  • PyTorch >= 1.4.0
  • Python packages: numpy, matplotlib, opencv-python, pyyaml, lmdb
  • (Optional) Matlab >= R2016b
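
A quick way to confirm the Python side of this list is a short import check. The sketch below is not part of this repo, and the helper name missing_modules is hypothetical. It only tests whether each module can be found, not whether its version is new enough.

```python
import importlib.util
import sys

# Import names for the packages listed above. The pip name and the import
# name differ for opencv-python (cv2) and pyyaml (yaml).
REQUIRED_MODULES = ["torch", "numpy", "matplotlib", "cv2", "yaml", "lmdb"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be found for import."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python >= 3.7 is required")
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```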

Testing

Note: We apply different models according to the degradation type. The following steps are for 4x SR under BD degradation. You can switch to 2x SR or BI degradation by replacing every 4x with 2x and every BD with BI below.

  1. Download the official Vid4 and ToS3 datasets. In BD mode, only ground-truth data is needed.
bash ./scripts/download/download_datasets.sh BD 

Alternatively, you can download these datasets manually from Google Drive and unzip them under ./data.

The dataset structure is shown below.

data
  ├─ Vid4
    ├─ GT                # Ground-Truth (GT) sequences
      └─ calendar
        └─ ***.png
    ├─ Gaussian4xLR      # Low Resolution (LR) sequences in BD degradation
      └─ calendar
        └─ ***.png
    └─ Bicubic4xLR       # Low Resolution (LR) sequences in BI degradation
      └─ calendar
        └─ ***.png
  └─ ToS3
    ├─ GT
    ├─ Gaussian4xLR
    └─ Bicubic4xLR
  2. Download our pre-trained TecoGAN model.
bash ./scripts/download/download_models.sh BD TecoGAN

Alternatively, you can download the model manually from [BD-4x-Vimeo] [BI-4x-Vimeo] [BD-4x-REDS] [BD-2x-REDS] and put it under ./pretrained_models.

  3. Run TecoGAN for 4x SR. The results will be saved in ./results. You can specify which model and how many GPUs to use in test.sh.
bash ./test.sh BD TecoGAN/TecoGAN_VimeoTecoGAN_4xSR_2GPU
  4. Evaluate the upsampled results using the official metrics. The evaluation code is borrowed from TecoGAN-TensorFlow, with minor modifications to support BI degradation.
python ./codes/official_metrics/evaluate.py -m TecoGAN_4x_BD_Vimeo_iter500K
  5. Profile the model (FLOPs, parameters and speed). You can modify the last argument to specify the size of the LR video.
bash ./profile.sh BD TecoGAN/TecoGAN_VimeoTecoGAN_4xSR_2GPU 3x134x320
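
Before running the steps above, the dataset layout from step 1 can be checked with a few lines of Python. This is only a sketch: missing_dataset_dirs is a hypothetical helper, not a script shipped with this repo, and the 2x folder name (e.g. Bicubic2xLR) is assumed from the note about replacing 4x with 2x.

```python
from pathlib import Path


def missing_dataset_dirs(data_root, datasets=("Vid4", "ToS3"),
                         degradation="BD", scale=4):
    """Return the expected sub-directories that are absent under `data_root`."""
    # Per step 1, BD mode only needs ground-truth data; BI mode also needs
    # the pre-generated bicubic LR sequences.
    if degradation == "BD":
        required = ["GT"]
    else:
        required = ["GT", f"Bicubic{scale}xLR"]
    root = Path(data_root)
    return [str(Path(ds) / sub)
            for ds in datasets
            for sub in required
            if not (root / ds / sub).is_dir()]


if __name__ == "__main__":
    # An empty list means every expected folder is present.
    print(missing_dataset_dirs("./data", degradation="BD"))
```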

Training

Note: Because the VimeoTecoGAN dataset is no longer accessible, we recommend using another public dataset, e.g., REDS, for model training. To use REDS as the training dataset, download it from here and replace VimeoTecoGAN with REDS in the steps below.

  1. Download the official training dataset according to the instructions in TecoGAN-TensorFlow, rename it to VimeoTecoGAN/Raw, and place it under ./data.

  2. Generate LMDB for GT data to accelerate IO. The LR counterpart will then be generated on the fly during training.

python ./scripts/create_lmdb.py --dataset VimeoTecoGAN --raw_dir ./data/VimeoTecoGAN/Raw --lmdb_dir ./data/VimeoTecoGAN/GT.lmdb

The following shows the dataset structure after finishing the above two steps.

data
  ├─ VimeoTecoGAN
    ├─ Raw                 # Raw dataset
      ├─ scene_2000
        └─ ***.png
      ├─ scene_2001
        └─ ***.png
      └─ ...
    └─ GT.lmdb             # LMDB dataset
      ├─ data.mdb
      ├─ lock.mdb
      └─ meta_info.pkl     # each key has format: [vid]_[total_frame]x[h]x[w]_[i-th_frame]
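
The key format noted for meta_info.pkl can be split back into its fields as sketched below. FrameKey and parse_key are hypothetical names, not part of this repo, and the example key in the usage note (including its zero-padded frame index) is made up for illustration.

```python
from typing import NamedTuple


class FrameKey(NamedTuple):
    vid: str
    total_frames: int
    height: int
    width: int
    frame_idx: int


def parse_key(key: str) -> FrameKey:
    """Split '[vid]_[total_frame]x[h]x[w]_[i-th_frame]' into its fields."""
    # Split from the right because the video id itself may contain
    # underscores (e.g. 'scene_2000').
    vid, size, idx = key.rsplit("_", 2)
    total, h, w = (int(v) for v in size.split("x"))
    return FrameKey(vid, total, h, w, int(idx))
```

With a hypothetical key such as "scene_2000_120x1080x1920_0005", parse_key returns the video id "scene_2000", 120 total frames, a 1080x1920 frame size, and frame index 5.
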
  3. (Optional; required only for BI degradation) Manually generate the LR sequences with Matlab's imresize function, and then create an LMDB for them.
# Generate the raw LR video sequences. Results will be saved at ./data/VimeoTecoGAN/Bicubic4xLR
matlab -nodesktop -nosplash -r "cd ./scripts; generate_lr_bi"

# Create LMDB for the LR video sequences
python ./scripts/create_lmdb.py --dataset VimeoTecoGAN --raw_dir ./data/VimeoTecoGAN/Bicubic4xLR --lmdb_dir ./data/VimeoTecoGAN/Bicubic4xLR.lmdb
  4. Train an FRVSR model first, which provides a better initialization for the subsequent TecoGAN training. FRVSR has the same generator as TecoGAN, but is trained without the GAN and perceptual losses.
bash ./train.sh BD FRVSR/FRVSR_VimeoTecoGAN_4xSR_2GPU

You can download and use our pre-trained FRVSR models instead of training from scratch. [BD-4x-Vimeo] [BI-4x-Vimeo] [BD-4x-REDS] [BD-2x-REDS]

When training is complete, set the generator's load_path in experiments_BD/TecoGAN/TecoGAN_VimeoTecoGAN_4xSR_2GPU/train.yml to the latest checkpoint of the FRVSR model.

  5. Train a TecoGAN model. You can specify which GPUs to use in train.sh. By default, training runs in the background and the output is logged to ./experiments_BD/TecoGAN/TecoGAN_VimeoTecoGAN/train/train.log.
bash ./train.sh BD TecoGAN/TecoGAN_VimeoTecoGAN_4xSR_2GPU
  6. Run the following script to monitor the training process and visualize the validation performance.
python ./scripts/monitor_training.py -dg BD -m TecoGAN/TecoGAN_VimeoTecoGAN_4xSR_2GPU -ds Vid4

Note that the validation results are NOT exactly the same as the testing results mentioned above, because the metrics are implemented differently. The differences come from the cropping policy, the LPIPS version, and some other issues.
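
To illustrate why a cropping policy alone can shift a metric, the sketch below computes PSNR with an optional border crop. It is not the metric code of either implementation. The psnr helper and its crop_border parameter are hypothetical, and the border widths actually used are not stated here.

```python
import numpy as np


def psnr(gt: np.ndarray, pred: np.ndarray, crop_border: int = 0) -> float:
    """PSNR in dB for 8-bit images, optionally ignoring a border of pixels."""
    if crop_border > 0:
        gt = gt[crop_border:-crop_border, crop_border:-crop_border]
        pred = pred[crop_border:-crop_border, crop_border:-crop_border]
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(255.0 ** 2 / mse))


if __name__ == "__main__":
    gt = np.zeros((32, 32), dtype=np.uint8)
    pred = gt.copy()
    pred[0, :] = 10  # errors only in the top row
    # The same prediction scores differently once the border is excluded.
    print(psnr(gt, pred), psnr(gt, pred, crop_border=4))
```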

Benchmark

[1] FLOPs & speed are computed on RGB sequence with resolution 134*320 on a single NVIDIA 1080Ti GPU.
[2] Both FRVSR & TecoGAN use 10 residual blocks, while TecoGAN+ has 16 residual blocks.

License & Citation

If you use this code for your research, please cite the following paper and our project.

@article{tecogan2020,
  title={Learning temporal coherence via self-supervision for GAN-based video generation},
  author={Chu, Mengyu and Xie, You and Mayer, Jonas and Leal-Taix{\'e}, Laura and Thuerey, Nils},
  journal={ACM Transactions on Graphics (TOG)},
  volume={39},
  number={4},
  pages={75--1},
  year={2020},
  publisher={ACM New York, NY, USA}
}
@misc{tecogan_pytorch,
  author={Deng, Jianing and Zhuo, Cheng},
  title={PyTorch Implementation of Temporally Coherent GAN (TecoGAN) for Video Super-Resolution},
  howpublished="\url{https://github.com/skycrapers/TecoGAN-PyTorch}",
  year={2020},
}

Acknowledgements

This code is built on TecoGAN-TensorFlow, BasicSR and LPIPS. We thank the authors for sharing their code.

If you have any questions, feel free to email me at jn.deng@foxmail.com

License: Apache License 2.0