XVFI (ICCV2021, Oral)

This is the official repository of XVFI (eXtreme Video Frame Interpolation)

[ArXiv_ver.] [ICCV2021_ver.] [Supp.] [Demo(YouTube)] [Oral12mins(YouTube)] [Flowframes(GUI)] [Poster]

Last Update: 20211130 - We provide extended input sequences for X-TEST. Please refer to X4K1000FPS

We provide the training and test code along with the trained weights and the dataset (train+test) used for XVFI. If you find this repository useful, please consider citing our paper.

Examples of the VFI (x8 Multi-Frame Interpolation) results on X-TEST

The 4K@30fps input frames are interpolated to be 4K@240fps frames. All results are encoded at 30fps to be played as x8 slow motion and spatially down-scaled due to the limit of file sizes. All methods are trained on X-TRAIN.

X4K1000FPS
Requirements
Test
Test_Custom
Training
Collection_of_Visual_Results
Reference
Contact

X4K1000FPS

Dataset of high-resolution (4096×2160), high-fps (1000fps) video frames with extreme motion.

Some examples from the X4K1000FPS dataset, whose frames are 1000 fps and 4K resolution. The dataset contains various scenes with extreme motion. (Displayed as spatiotemporally subsampled .gif files.)

We provide our X4K1000FPS dataset which consists of X-TEST and X-TRAIN. Please refer to our main/suppl. paper for the details of the dataset. You can download the dataset from this dropbox link.

X-TEST consists of 15 video clips, each containing 33 frames at 4K resolution and 1000 fps. It uses the following directory format:

├──── YOUR_DIR/
    ├──── test/
       ├──── Type1/
          ├──── TEST01/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── TEST02/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── ...
       ├──── ...
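After decoding, it can help to confirm that every clip has the expected number of frames. The following is a minimal sketch, not part of the repository: the helper names `count_frames_per_clip` and `find_incomplete_clips` are hypothetical, and the code assumes only the two-level `<split>/<group>/<clip>/*.png` layout shown above.

```python
from pathlib import Path


def count_frames_per_clip(split_dir):
    """Return {"<group>/<clip>": number of .png frames} for one split directory.

    Assumes the layout shown in this README: <split>/<group>/<clip>/*.png.
    """
    split_dir = Path(split_dir)
    counts = {}
    for group in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for clip in sorted(p for p in group.iterdir() if p.is_dir()):
            counts[f"{group.name}/{clip.name}"] = len(list(clip.glob("*.png")))
    return counts


def find_incomplete_clips(split_dir, expected_frames):
    """List the clips whose frame count differs from expected_frames."""
    return sorted(
        name
        for name, n in count_frames_per_clip(split_dir).items()
        if n != expected_frames
    )
```

For example, `find_incomplete_clips("YOUR_DIR/test", 33)` should return an empty list for a complete X-TEST copy. The same helper can be pointed at another split with a different expected clip length.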

Extended version of X-TEST (issue #9). As described in our paper, the number of input frames for VFI is fixed to 2 in X-TEST. For VFI methods that require more than 2 input frames, we provide an extended version of X-TEST that contains 8 input frames (at a temporal distance of 32 frames) for each test sequence. The middle two adjacent frames among the 8 are the same input frames as in the original X-TEST. So that the .png files sort properly by file name, we added 1000 to the frame indices (e.g. '0000.png' and '0032.png' in the original X-TEST correspond to '1000.png' and '1032.png', respectively, in the extended version). Please note that the extended version consists of input frames only, without the ground-truth intermediate frames ('1001.png'~'1031.png'). In addition, for the sequence 'TEST11_078_f4977', '1064.png', '1096.png' and '1128.png' are replicated frames, since '1064.png' is the last frame of the raw video file. The extended version of X-TEST can be downloaded from the link.
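The index arithmetic described above can be written out as a short sketch. It is illustrative only: the function names are hypothetical, and the zero-padded names for indices below 1000 (such as '0904.png') are an assumption inferred from the 4-digit naming, not something confirmed by the dataset listing.

```python
OFFSET = 1000        # added to every frame index in the extended X-TEST
INPUT_DISTANCE = 32  # temporal distance between consecutive input frames
NUM_INPUTS = 8       # input frames per sequence in the extended version


def extended_name(original_name):
    """Map an original X-TEST file name to its extended-version name."""
    stem, ext = original_name.rsplit(".", 1)
    return f"{int(stem) + OFFSET:04d}.{ext}"


def extended_input_names():
    """Names of the 8 input frames, centred on the original input pair.

    Assumption: indices below 1000 keep 4-digit zero padding.
    """
    first_middle = 0 + OFFSET  # original '0000.png'
    start = first_middle - (NUM_INPUTS // 2 - 1) * INPUT_DISTANCE
    return [f"{start + k * INPUT_DISTANCE:04d}.png" for k in range(NUM_INPUTS)]
```

Under these assumptions the fourth and fifth names are '1000.png' and '1032.png' (the original input pair), and the last three are '1064.png', '1096.png' and '1128.png', matching the frames mentioned for 'TEST11_078_f4977'.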

X-TRAIN consists of 4,408 clips from 110 scenes of various types. Each clip contains 65 frames at 1000 fps, and each frame is a 768x768 crop from a 4K frame. It uses the following directory format:

├──── YOUR_DIR/
    ├──── train/
       ├──── 002/
          ├──── occ008.320/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── occ008.322/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── ...
       ├──── ...

After downloading the files from the link, decompress encoded_test.tar.gz and encoded_train.tar.gz. The resulting .mp4 files can be decoded into .png files by running mp4_decoding.py. Please follow the instructions written in mp4_decoding.py.
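The decompression step can also be done from Python with the standard `tarfile` module. This is a minimal sketch under the assumption that the archives are ordinary gzip-compressed tarballs; `extract_archive` is a hypothetical helper, and the decoding itself is still left to mp4_decoding.py.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Decompress a .tar.gz archive into dest_dir and list the .mp4 files found.

    Only use this on archives from a trusted source.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(dest_dir.rglob("*.mp4"))
```

A call such as `extract_archive("encoded_test.tar.gz", "YOUR_DIR")` returns the paths of the extracted .mp4 files, which gives a quick check before running the decoder.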

Requirements

Our code is implemented using PyTorch 1.7 and was tested under the following settings:

Python 3.7
PyTorch 1.7.1
CUDA 10.2
cuDNN 7.6.5
NVIDIA TITAN RTX GPU
Ubuntu 16.04 LTS

Caution: "nn.functional.interpolate" and "nn.functional.grid_sample" have an "align_corners" option in PyTorch 1.7, so we recommend following our settings. In particular, using other PyTorch versions may yield different performance.
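To see why "align_corners" matters, the sketch below reproduces the two coordinate conventions for 1D linear resizing in pure Python. It is an illustration of the sampling rule only, not PyTorch's implementation and not code from this repository; the function names are hypothetical.

```python
def source_coords(in_size, out_size, align_corners):
    """Map each output index to a fractional input coordinate."""
    if align_corners:
        # Corner samples of input and output coincide.
        if out_size == 1:
            return [0.0]
        step = (in_size - 1) / (out_size - 1)
        return [i * step for i in range(out_size)]
    # Pixels are treated as unit-width cells; cell centres are aligned.
    scale = in_size / out_size
    return [(i + 0.5) * scale - 0.5 for i in range(out_size)]


def resize_linear_1d(values, out_size, align_corners):
    """Linearly resample a 1D list, clamping coordinates at the borders."""
    n = len(values)
    out = []
    for x in source_coords(n, out_size, align_corners):
        x = min(max(x, 0.0), n - 1)
        lo = int(x)
        hi = min(lo + 1, n - 1)
        w = x - lo
        out.append((1 - w) * values[lo] + w * values[hi])
    return out
```

Resizing `[0.0, 10.0]` to 4 samples gives `[0.0, 2.5, 7.5, 10.0]` with `align_corners=False` but roughly `[0.0, 3.33, 6.67, 10.0]` with `align_corners=True`. The same input therefore produces different resized features or warped images depending on the convention, which is one way a mismatch in this option can shift results.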

Test

Quick Start for X-TEST (x8 Multi-Frame Interpolation as in Table 2)

Download the source codes in a directory of your choice <source_path>.
First download our X-TEST test dataset by following the above section 'X4K1000FPS'.
Download the pre-trained weights, which were trained on X-TRAIN, from this link and place them in <source_path>/checkpoint_dir/XVFInet_X4K1000FPS_exp1.

XVFI
└── checkpoint_dir
   └── XVFInet_X4K1000FPS_exp1
       ├── XVFInet_X4K1000FPS_exp1_latest.pt

Run main.py with the following options in parse_args:

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8

==> It would yield (PSNR/SSIM/tOF) = (30.12/0.870/2.15).

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 3 --multiple 8

==> It would yield (PSNR/SSIM/tOF) = (28.86/0.858/2.67).
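Before launching a test run, it can be useful to check that the weights sit where the steps above expect them. The sketch below only mirrors the folder and file names shown in this README; the helper names are hypothetical, and treating the 'XVFInet' prefix as fixed is an assumption.

```python
from pathlib import Path


def experiment_name(dataset, exp_num):
    """Folder name pattern seen in this README, e.g. 'XVFInet_X4K1000FPS_exp1'.

    Assumption: the 'XVFInet' prefix is fixed.
    """
    return f"XVFInet_{dataset}_exp{exp_num}"


def checkpoint_path(source_path, dataset, exp_num):
    """Expected location of the '<name>_latest.pt' weights file."""
    name = experiment_name(dataset, exp_num)
    return Path(source_path) / "checkpoint_dir" / name / f"{name}_latest.pt"


def missing_checkpoint_message(source_path, dataset, exp_num):
    """Return None if the weights file exists, else a short error message."""
    path = checkpoint_path(source_path, dataset, exp_num)
    return None if path.is_file() else f"Checkpoint not found: {path}"
```

For instance, `missing_checkpoint_message(".", "X4K1000FPS", 1)` returns `None` once the downloaded file is in place.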

Description

After running with the above test options, the result images are saved in <source_path>/test_img_dir/XVFInet_X4K1000FPS_exp1, and the PSNR/SSIM/tOF results for each test clip are written to "total_metrics.csv" in the same folder.
Our proposed XVFI-Net can start from any downscaled input and work upward. The number of scales used for inference is set with '--S_tst' and can be adjusted to the input resolution or the motion magnitude.
You can obtain any Multi-Frame Interpolation (x M) result by setting '--multiple'.
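The effect of '--multiple' can be made concrete with a little arithmetic. The sketch below is an illustration, not repository code: the function names are hypothetical, and the mapping to X-TEST frame indices is an assumption based on the 33-frame clip layout ('0000.png' to '0032.png' as the input pair).

```python
def output_fps(input_fps, multiple):
    """Frame rate after x M interpolation."""
    return input_fps * multiple


def intermediate_times(multiple):
    """Normalised time instants t in (0, 1) of the M - 1 frames to synthesise."""
    return [k / multiple for k in range(1, multiple)]


def intermediate_frame_indices(first_index, last_index, multiple):
    """Frame indices between two inputs that correspond to those time instants."""
    span = last_index - first_index
    if span % multiple != 0:
        raise ValueError("the input distance must be divisible by the multiple")
    step = span // multiple
    return [first_index + k * step for k in range(1, multiple)]
```

With `multiple=8`, `output_fps(30, 8)` is 240, `intermediate_times(8)` is `[0.125, 0.25, ..., 0.875]`, and `intermediate_frame_indices(0, 32, 8)` is `[4, 8, 12, 16, 20, 24, 28]`, i.e. 7 new frames between each input pair.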

Quick Start for Vimeo90K (as in Fig. 8)

Download the source codes in a directory of your choice <source_path>.
First download the Vimeo90K dataset from this link (including 'tri_trainlist.txt') and place it in <source_path>/vimeo_triplet.

XVFI
└── vimeo_triplet
       ├──  sequences
       ├──  readme.txt
       ├──  tri_testlist.txt
       └──  tri_trainlist.txt

Download the pre-trained weights (XVFI-Net_v), which were trained on Vimeo90K, from this link and place them in <source_path>/checkpoint_dir/XVFInet_Vimeo_exp1.

XVFI
└── checkpoint_dir
   └── XVFInet_Vimeo_exp1
       ├── XVFInet_Vimeo_exp1_latest.pt

Run main.py with the following options in parse_args:

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 2

==> It would yield PSNR = 35.07 on Vimeo90K.
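To inspect the Vimeo90K triplets outside of main.py, the list files can be read directly. This sketch assumes the usual Vimeo90K triplet conventions (one 'xxxxx/yyyy' entry per line, with 'im1.png', 'im2.png' and 'im3.png' in each folder); the function names are hypothetical, and this is not the repository's own data loader.

```python
from pathlib import Path


def read_triplet_list(list_file):
    """Return the non-empty lines of a list file such as tri_testlist.txt."""
    with open(list_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def triplet_paths(vimeo_root, entry):
    """Return (first input, ground-truth middle, second input) frame paths.

    Assumption: each triplet folder holds im1.png, im2.png and im3.png.
    """
    folder = Path(vimeo_root) / "sequences" / entry
    return folder / "im1.png", folder / "im2.png", folder / "im3.png"
```

Here 'im1.png' and 'im3.png' serve as the two inputs and 'im2.png' is the frame to be interpolated, which corresponds to '--multiple 2' in the command above.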

Description

After running with the above test option, you can get the result images in <source_path>/test_img_dir/XVFInet_Vimeo_exp1.
There are a few code lines just before 'def main()' provided for convenience when running with the Vimeo option.
The SSIM result of 0.9760 in Fig. 8 was measured with the MATLAB ssim function after running the above guide, for a fair comparison with other SOTA methods, which were measured the same way. We also provide the "compare_psnr_ssim.m" MATLAB file used to obtain it.
~~It should be noted that there is a typo "S_trn and S_tst are set to 2" in the current version of XVFI paper, which should be modified to 1 (not 2), sorry for inconvenience.~~ -> Updated in the latest arXiv version.

Test_Custom

Quick Start for your own video data ('--custom_path') for any Multi-Frame Interpolation (x M)

Download the source codes in a directory of your choice <source_path>.
First prepare your own video datasets in <source_path>/custom_path using the following hierarchy:

XVFI
└── custom_path
   ├── scene1
       ├── 'xxx.png'
       ├── ...
       └── 'xxx.png'
   ...
   
   ├── sceneN
       ├── 'xxxxx.png'
       ├── ...
       └── 'xxxxx.png'

Download the pre-trained weights trained on X-TRAIN or Vimeo90K as described above.
Run main.py with the following options in parse_args (e.g. x8 Multi-Frame Interpolation):

# For the model trained on X-TRAIN
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 --custom_path './custom_path'

# For the model trained on Vimeo90K
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 8 --custom_path './custom_path'

Description

Our proposed XVFI-Net can start from any downscaled input and work upward. The number of scales used for inference is set with '--S_tst' and can be adjusted to the input resolution or the motion magnitude.
You can obtain any Multi-Frame Interpolation (x M) result by setting '--multiple'.
Only the '.png' format is supported.
Since we cannot cover every possible naming rule for custom frames, please make sure your own frames sort properly by file name.
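One common way to make frames sort properly is to order them with a natural (numeric-aware) key and then give them zero-padded names. The sketch below shows that idea; it is not part of the repository, the function names are hypothetical, and the 5-digit padding is an arbitrary choice.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that compares digit runs as integers, so 'frame10' follows 'frame2'."""
    parts = re.split(r"(\d+)", name)
    # Odd positions hold the digit runs captured by the pattern.
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(parts)]


def sorted_frames(scene_dir):
    """Return the .png files of one scene folder in natural order."""
    return sorted(Path(scene_dir).glob("*.png"), key=lambda p: natural_key(p.name))


def zero_padded_names(frames, digits=5):
    """Propose new names ('00000.png', '00001.png', ...) in the given order."""
    return [f"{i:0{digits}d}.png" for i, _ in enumerate(frames)]
```

Pairing `sorted_frames(scene)` with `zero_padded_names(...)` gives a rename plan whose plain alphabetical order matches the temporal order. The renaming itself is left out on purpose so nothing is overwritten by accident.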

Training

Quick Start for X-TRAIN

Download the source codes in a directory of your choice <source_path>.
First download our X-TRAIN train/val/test datasets by following the above section 'X4K1000FPS' and place them as follows:

XVFI
└── X4K1000FPS
      ├──  train
          ├── 002
          ├── ...
          └── 172
      ├──  val
          ├── Type1
          ├── Type2
          ├── Type3
      ├──  test
          ├── Type1
          ├── Type2
          ├── Type3

Run main.py with the following options in parse_args:

python main.py --phase 'train' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_trn 3 --S_tst 5

Quick Start for Vimeo90K

Download the source codes in a directory of your choice <source_path>.
First download the Vimeo90K dataset from this link (including 'tri_trainlist.txt') and place it in <source_path>/vimeo_triplet.

XVFI
└── vimeo_triplet
       ├──  sequences
       ├──  readme.txt
       ├──  tri_testlist.txt
       └──  tri_trainlist.txt

Run main.py with the following options in parse_args:

python main.py --phase 'train' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_trn 1 --S_tst 1

Description

You can freely adjust the other arguments defined in the parser of main.py.

Collection_of_Visual_Results

We also provide all visual results (x8 Multi-Frame Interpolation) on X-TEST for easier comparison, as listed below. Each zip file is about 1~1.5GB.
AdaCoF_o, AdaCoF_f, FeFlow_o, FeFlow_f, DAIN_o, DAIN_f, XVFI-Net (S_tst=3), XVFI-Net (S_tst=5)
The quantitative comparisons (Table 2 and Figure 5) are attached below for reference.

Reference

Hyeonjun Sim*, Jihyong Oh*, and Munchurl Kim "XVFI: eXtreme Video Frame Interpolation", In ICCV, 2021. (* equal contribution)

BibTeX

@inproceedings{sim2021xvfi,
  title={XVFI: eXtreme Video Frame Interpolation},
  author={Sim, Hyeonjun and Oh, Jihyong and Kim, Munchurl},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Contact

If you have any questions, please send an email to either
[Hyeonjun Sim] - flhy5836@kaist.ac.kr or
[Jihyong Oh] - jhoh94@kaist.ac.kr.

License

The source codes and datasets can be freely used for research and education only. Any commercial use should get formal permission first.
