splinter21 / IFRNet

IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation (CVPR 2022)

The official PyTorch implementation of IFRNet (CVPR 2022).

Authors: Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Xiaoming Huang, Ying Tai, Chengjie Wang, Jie Yang

Highlights

Almost all existing flow-based frame interpolation methods first estimate or model intermediate optical flow, and then use flow-warped context features to synthesize the target frame. However, they ignore the mutual promotion of intermediate optical flow and intermediate context features. In addition, their cascaded architectures can substantially increase inference delay and model parameters, which keeps them out of many mobile and real-time applications. For the first time, we merge the previously separate flow estimation and context feature refinement into a single encoder-decoder based IFRNet for compactness and fast inference, where these two crucial elements can benefit from each other. Moreover, a task-oriented flow distillation loss and a feature space geometry consistency loss are newly proposed to promote intermediate motion estimation and intermediate feature reconstruction of IFRNet, respectively. Benchmark results demonstrate that IFRNet not only achieves state-of-the-art video frame interpolation (VFI) accuracy, but also enjoys fast inference speed and a lightweight model size.
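To make "flow-warped" concrete, here is a minimal sketch of backward warping with bilinear sampling. It is written in plain Python on a tiny grayscale grid so that it runs without dependencies; it is not the code used in this repository (PyTorch code usually does this with torch.nn.functional.grid_sample). The function name backward_warp, the (dx, dy) flow layout, and the border clamping are assumptions made for illustration.

```python
import math


def backward_warp(image, flow):
    """Backward-warp a 2D grayscale image (list of rows) with a per-pixel flow.

    flow[y][x] = (dx, dy) points from the target pixel to the location in
    `image` that should be sampled. Out-of-range samples are clamped to the
    border, which is an assumption of this sketch.
    """
    h, w = len(image), len(image[0])

    def pixel(x, y):
        # Clamp coordinates so samples outside the image read the border value.
        x = min(max(x, 0), w - 1)
        y = min(max(y, 0), h - 1)
        return image[y][x]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy                # source location (sub-pixel)
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0              # bilinear weights
            top = (1 - ax) * pixel(x0, y0) + ax * pixel(x0 + 1, y0)
            bottom = (1 - ax) * pixel(x0, y0 + 1) + ax * pixel(x0 + 1, y0 + 1)
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


image = [[0.0, 1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0, 7.0]]
# Every target pixel samples half a pixel to its right in the source image.
flow = [[(0.5, 0.0)] * 4 for _ in range(2)]
warped = backward_warp(image, flow)
```

In a flow-based interpolation pipeline the same operation is applied to feature maps rather than raw pixels, with one flow field per input frame.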

Preparation

  1. PyTorch >= 1.3.0
  2. Download training and test datasets: Vimeo90K, UCF101, SNU-FILM, Middlebury, GoPro and Adobe240.
  3. Set the right dataset path on your machine.

Download Pre-trained Models and Play with Demos

Figures from left to right are overlaid input frames, 2x and 8x video interpolation results respectively.

  1. Download our pre-trained models from this link, then put checkpoints into the root directory.

  2. Run the following scripts to generate 2x and 8x frame interpolation demos:

$ python demo_2x.py
$ python demo_8x.py

Training on Vimeo90K Triplet Dataset for 2x Frame Interpolation

  1. First, run this script to generate optical flow pseudo labels
$ python generate_flow.py
  2. Then, start training by executing one of the following commands with the selected model
$ python -m torch.distributed.launch --nproc_per_node=4 train_vimeo90k.py --world_size 4 --model_name 'IFRNet' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_vimeo90k.py --world_size 4 --model_name 'IFRNet_L' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_vimeo90k.py --world_size 4 --model_name 'IFRNet_S' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
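The flags --lr_start and --lr_end suggest a learning rate that decays over training, and --nproc_per_node together with --batch_size determines how many samples are processed per optimizer step. The sketch below shows one plausible reading of these settings: a half-cosine decay and a per-process batch size. Both are assumptions for illustration; the decay shape and the meaning of --batch_size should be checked against train_vimeo90k.py. The helper names cosine_lr and global_batch_size are hypothetical.

```python
import math


def cosine_lr(step, total_steps, lr_start=1e-4, lr_end=1e-5):
    """Decay the learning rate from lr_start to lr_end along a half cosine.

    Assumption: the schedule is cosine-shaped; the training script may differ.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    ratio = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_end + (lr_start - lr_end) * ratio


def global_batch_size(nproc_per_node, batch_size, nodes=1):
    """Samples per optimizer step, assuming batch_size is per process."""
    return nodes * nproc_per_node * batch_size


total_steps = 1000  # hypothetical number of training iterations
lr_first = cosine_lr(0, total_steps)
lr_middle = cosine_lr(total_steps // 2, total_steps)
lr_last = cosine_lr(total_steps, total_steps)
effective_batch = global_batch_size(nproc_per_node=4, batch_size=6)
```

Under these assumptions, the commands above train with 24 samples per step, and the learning rate moves smoothly from 1e-4 at the first iteration to 1e-5 at the last.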

Benchmarks for 2x Frame Interpolation

To test running time and model parameters, you can run

$ python benchmarks/speed_parameters.py
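For readers who want to measure their own variants, the sketch below illustrates the two quantities in a framework-independent way: average wall-clock time per call after a warm-up, and a parameter count derived from tensor shapes. It is not the content of benchmarks/speed_parameters.py; the helper names and the example layer shapes are hypothetical. With a PyTorch model on a GPU, the timed callable would also need to call torch.cuda.synchronize() so that asynchronous kernels are included in the measurement.

```python
import time


def average_runtime(fn, warmup=3, repeats=10):
    """Return the mean wall-clock seconds per call of fn after a warm-up."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def count_parameters(shapes):
    """Sum the element counts of a collection of parameter shapes."""
    total = 0
    for shape in shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total


# Hypothetical 3x3 convolution: 3 input channels, 32 output channels, plus bias.
conv_shapes = [(32, 3, 3, 3), (32,)]
num_params = count_parameters(conv_shapes)

# Stand-in workload; replace with a model forward pass when benchmarking.
seconds_per_call = average_runtime(lambda: sum(range(1000)))
```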

To test frame interpolation accuracy on Vimeo90K, UCF101 and SNU-FILM datasets, you can run

$ python benchmarks/Vimeo90K.py
$ python benchmarks/UCF101.py
$ python benchmarks/SNU_FILM.py
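Interpolation accuracy is commonly reported as peak signal-to-noise ratio (PSNR) between the predicted and ground-truth middle frame. The sketch below computes PSNR for small grayscale images stored as nested lists. It is meant only to show the formula; the benchmark scripts may use different metrics, value ranges, or averaging conventions, and the function name psnr is a choice made here.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized 2D images."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("images must have the same number of pixels")
    mse = sum((p - t) ** 2 for p, t in zip(flat_pred, flat_target)) / len(flat_pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


target = [[0.5, 0.5], [0.5, 0.5]]
pred = [[0.6, 0.6], [0.6, 0.6]]   # uniform error of 0.1 on a [0, 1] scale
score = psnr(pred, target)        # roughly 20 dB
```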

Quantitative Comparison for 2x Frame Interpolation

The proposed IFRNet achieves state-of-the-art frame interpolation accuracy with lower inference time and computational complexity. We expect the proposed single encoder-decoder, joint refinement based IFRNet to be a useful component for many frame rate up-conversion and intermediate view synthesis systems.

Qualitative Comparison for 2x Frame Interpolation

Video comparison for 2x interpolation of methods that use 2 input frames on the SNU-FILM dataset.

Middlebury Benchmark

Results on the Middlebury online benchmark.

Training on GoPro Dataset for 8x Frame Interpolation

  1. Start training by executing one of the following commands with the selected model
$ python -m torch.distributed.launch --nproc_per_node=4 train_gopro.py --world_size 4 --model_name 'IFRNet' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_gopro.py --world_size 4 --model_name 'IFRNet_L' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_gopro.py --world_size 4 --model_name 'IFRNet_S' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5

Since inter-frame motion in the 8x interpolation setting is relatively small, the task-oriented flow distillation loss is omitted here. Because the GoPro training set is relatively small, we suggest training on your own datasets for better slow-motion generation results.

Quantitative Comparison for 8x Frame Interpolation

Qualitative Results on GoPro and Adobe240 Datasets for 8x Frame Interpolation

Each video has 9 frames, where the first and the last frames are input, and the middle 7 frames are predicted by IFRNet.
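The frame count follows from the interpolation factor: with a factor of 8, there are 7 intermediate frames between each pair of inputs. The short sketch below spells out this arithmetic, assuming the intermediate frames are evenly spaced in time between the two inputs. The helper names are hypothetical and are not taken from demo_8x.py.

```python
def intermediate_timesteps(factor):
    """Normalized times in (0, 1) of the frames to predict between two inputs."""
    return [i / factor for i in range(1, factor)]


def output_frame_count(num_input_frames, factor):
    """Total frames after interpolation, counting the original inputs."""
    return (num_input_frames - 1) * factor + 1


steps_8x = intermediate_timesteps(8)   # 7 timesteps: 0.125, 0.25, ..., 0.875
frames_8x = output_frame_count(2, 8)   # 2 inputs + 7 predicted = 9 frames
steps_2x = intermediate_timesteps(2)   # a single middle frame at t = 0.5
```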

Citation

When using any parts of the Software or the Paper in your work, please cite the following paper:

@InProceedings{Kong_2022_CVPR, 
  author = {Kong, Lingtong and Jiang, Boyuan and Luo, Donghao and Chu, Wenqing and Huang, Xiaoming and Tai, Ying and Wang, Chengjie and Yang, Jie}, 
  title = {IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation}, 
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 
  year = {2022},
}
