VDTR: Video Deblurring with Transformer

Mingdeng Cao, Yanbo Fan, Yong Zhang, Jue Wang and Yujiu Yang.


[arXiv](https://arxiv.org/abs/2204.08023)

We propose the Video Deblurring Transformer (VDTR), a simple yet effective model that takes advantage of the long-range and relation modeling capabilities of the Transformer for video deblurring. VDTR uses pure Transformer blocks for both spatial and temporal modeling and obtains highly competitive performance on popular video deblurring benchmarks.

VDTR surpasses CNN-based state-of-the-art methods by more than 1.5 dB PSNR at moderate computational cost:


Spatio-temporal learning is central to video deblurring, a task currently dominated by convolution-based methods. This paper presents VDTR, an effective Transformer-based model that makes the first attempt to adapt the Transformer for video deblurring. VDTR exploits the superior long-range and relation modeling capabilities of the Transformer for both spatial and temporal modeling. However, designing an appropriate Transformer-based model for video deblurring is challenging, due to the high computational cost of high-resolution spatial modeling and the misalignment across frames in temporal modeling. To address these problems, VDTR performs attention within non-overlapping windows and exploits a hierarchical structure for long-range dependency modeling. For frame-level spatial modeling, we propose an encoder-decoder Transformer that utilizes multi-scale features for deblurring. For multi-frame temporal modeling, we adapt the Transformer to fuse multiple spatial features efficiently. Compared with CNN-based methods, the proposed method achieves highly competitive results on both synthetic and real-world video deblurring benchmarks, including DVD, GOPRO, REDS and BSD. We hope such a pure Transformer-based architecture can serve as a powerful alternative baseline for video deblurring and other video restoration tasks.
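The key efficiency idea, attention restricted to non-overlapping windows, can be sketched as follows. This is a minimal PyTorch illustration, not the repository's implementation; the feature dimension, window size, and head count are placeholder values.

```python
# A minimal sketch of multi-head self-attention inside non-overlapping
# windows, the mechanism VDTR uses to keep attention tractable at high
# resolutions. Hyperparameters here are illustrative, not the paper's.
import torch
import torch.nn as nn


class WindowAttention(nn.Module):
    def __init__(self, dim=96, window_size=8, num_heads=4):
        super().__init__()
        self.ws = window_size
        self.heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, H, W, C), with H and W divisible by the window size.
        B, H, W, C = x.shape
        ws, h = self.ws, self.heads
        # Partition the feature map into (B * num_windows, ws*ws, C) tokens.
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        win = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        n, L, _ = win.shape
        qkv = self.qkv(win).reshape(n, L, 3, h, C // h).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]  # each: (n, heads, L, C // heads)
        # Attention is computed independently per window, so the cost grows
        # linearly with image size instead of quadratically.
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(n, L, C)
        out = self.proj(out)
        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


if __name__ == "__main__":
    feat = torch.randn(1, 32, 32, 96)
    print(WindowAttention()(feat).shape)  # torch.Size([1, 32, 32, 96])
```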

Model Architecture

Environment

  • Python 3.8
  • PyTorch >= 1.5
VDTR is built on the SimDeblur framework; clone and install it first:

```bash
# clone the SimDeblur framework
git clone https://github.com/ljzycmd/SimDeblur.git

# install SimDeblur
cd SimDeblur
bash Install.sh
```
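A quick sanity check of the environment before installing (a convenience snippet, not part of the repository):

```python
# Verify the versions the repo expects (Python 3.8, PyTorch >= 1.5) and
# whether CUDA is visible. Not part of the repository.
import sys
import torch

assert sys.version_info >= (3, 8), "Python 3.8+ expected"
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 5), "PyTorch >= 1.5 expected"
print(f"Python {sys.version.split()[0]}, PyTorch {torch.__version__}, "
      f"CUDA available: {torch.cuda.is_available()}")
```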

Quick Start

  1. Clone the VDTR code

```bash
git clone https://github.com/ljzycmd/VDTR.git
```
  2. Download and unzip the datasets

Then create soft links from the datasets to the ./datasets folder, for example as sketched below.
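Assuming the datasets were unzipped under /data (a hypothetical location), the links can be created like this:

```python
# Link the unzipped datasets into ./datasets. The source paths below are
# hypothetical; point them at wherever the datasets were actually unzipped.
import os

datasets = {
    "DVD": "/data/DVD",
    "GOPRO": "/data/GOPRO",
    "REDS": "/data/REDS",
    "BSD": "/data/BSD",
}

os.makedirs("./datasets", exist_ok=True)
for name, src in datasets.items():
    dst = os.path.join("./datasets", name)
    if not os.path.exists(dst):
        os.symlink(src, dst)
```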

  3. Run the training script

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=10086 train.py ./configs/vdtr/vdtr_dvd.yaml --gpus=4
```

The training logs are saved in ./workdir/*.

  4. Run the testing script (single GPU; an RTX 2080 Ti is the minimum requirement)

```bash
python test.py ./configs/vdtr/vdtr_dvd.yaml $Checkpoint_path
```

The testing logs and result frames are saved in ./workdir/*.

Pretrained checkpoints are listed below:

| Model | Dataset | Download     |
| ----- | ------- | ------------ |
| VDTR  | DVD     | Google Drive |
| VDTR  | GOPRO   | Google Drive |
| VDTR  | BSD-1ms | Google Drive |
| VDTR  | BSD-2ms | Google Drive |
| VDTR  | BSD-3ms | Google Drive |

Experimental Results

VDTR achieves competitive PSNR and SSIM on both synthetic and real-world deblurring datasets.
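For reference, PSNR in these benchmarks follows the standard definition; a generic sketch (not the repository's evaluation code):

```python
# Standard PSNR between a restored frame and its ground truth, both float
# arrays scaled to [0, 1]. A generic reference, not the repo's metric code.
import numpy as np


def psnr(restored: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    mse = np.mean((restored.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```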

Quantitative results on popular video deblurring datasets: DVD, GOPRO, REDS

Qualitative comparison to state-of-the-art video deblurring methods on GOPRO

Quantitative results on real-world video deblurring datasets: BSD

Qualitative comparison to state-of-the-art video deblurring methods on BSD

Citation

If you find the proposed model useful for your research, please consider citing:

@article{cao2022vdtr,
  title   = {VDTR: Video Deblurring with Transformer},
  author  = {Mingdeng Cao and Yanbo Fan and Yong Zhang and Jue Wang and Yujiu Yang},
  journal = {arXiv:2204.08023},
  year    = {2022}
}
