
Official code base of "Perception-Oriented Video Frame Interpolation via Asymmetric Blending" (CVPR 2024), also denoted as "PerVFI".

Paper: https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Perception-Oriented_Video_Frame_Interpolation_via_Asymmetric_Blending_CVPR_2024_paper.pdf


Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng

In CVPR 2024

This repository represents the official implementation of the paper titled "Perception-Oriented Video Frame Interpolation via Asymmetric Blending", also denoted as "PerVFI".


We present PerVFI, a novel paradigm for perception-oriented video frame interpolation.

  • Asymmetric synergistic blending scheme: reduces blur and ghosting artifacts caused by unavoidable motion estimation errors (a toy sketch of the idea follows this list).
  • Generative model as decoder: reconstructs results sampled from a distribution, which resolves temporal supervision misalignment during training.
  • Future work: the network structure can be further optimized to improve efficiency and performance.
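
As a rough intuition for the asymmetric blending idea, here is a minimal, hypothetical PyTorch sketch. It only illustrates weighting two warped inputs unequally so that the less reliable one contributes less; it is not the blending module used inside the PerVFI network, and the function name, mask, and bias below are made up for illustration.

    import torch

    def asymmetric_blend(feat0_warped, feat1_warped, occlusion, bias=0.7):
        # occlusion in [0, 1]: close to 1 where feat1_warped is unreliable
        # (e.g. due to motion-estimation error). The blend is biased toward
        # feat0_warped so that errors from the second input contribute less.
        w1 = (1.0 - occlusion) * (1.0 - bias)
        return (1.0 - w1) * feat0_warped + w1 * feat1_warped

    # Toy usage with random feature maps
    f0, f1 = torch.rand(1, 8, 64, 64), torch.rand(1, 8, 64, 64)
    occ = torch.rand(1, 1, 64, 64)
    blended = asymmetric_blend(f0, f1, occ)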


📒 News

2024-6-13: Paper accepted! Released the inference code (this repository).

2024-6-1: Added the arXiv version.

∞ TODO

  • ❗ 🔜 Inference code for a customized flow estimator.
  • ❗ Release VFI-Benchmark, a codebase reproducing all scores listed in the paper (this will take some time, as we are committed to providing a modularized and detailed implementation).
  • 🔜 Google Colab demo.
  • 🔜 Online interactive demo.
  • Hugging Face Space (optional).
  • Add GIFs to the page for better visualization.

🚀 Usage

We offer several ways to interact with PerVFI:

  1. Run the demo locally (requires a GPU and Anaconda; see the installation guide). Local development instructions with this codebase are given below.
  2. Extended demo on Google Colab (coming soon).
  3. Online interactive demo (coming soon).

πŸ› οΈ Setup

The inference code was tested on:

  • Ubuntu 22.04 LTS, Python 3.10.12, CUDA 11.7, GeForce RTX 4090
  • MacOS 14.2, Python 3.10.12, M1 16G

🪧 A Note for Windows users

We recommend running the code in WSL2:

  1. Install WSL following the installation guide.
  2. Install CUDA support for WSL following the installation guide.
  3. Find your drives in /mnt/<drive letter>/; check the WSL FAQ for more details. Navigate to the working directory of your choice.

📦 Repository

Clone the repository (requires git):

git clone https://github.com/mulns/PerVFI.git
cd PerVFI

💻 Dependencies

We provide several ways to install the dependencies.

  1. Using Conda.

    Windows users: install the Linux version of Conda inside WSL.

    After the installation, create the environment and install dependencies into it:

    conda env create -f environment.yaml
    conda activate pervfi
  2. Using pip: Alternatively, create a native Python virtual environment and install dependencies into it:

    python -m venv venv/pervfi
    source venv/pervfi/bin/activate
    pip install -r requirements.txt

Keep the environment activated before running the inference script. Activate the environment again after restarting the terminal session.
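
With the environment active, you can optionally run a quick sanity check that PyTorch is installed and whether a CUDA device is visible (a minimal check; CUDA will simply report as unavailable on macOS, which is expected):

    import torch

    # Prints the installed PyTorch version and whether a CUDA GPU is visible.
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())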

πŸƒ Testing on your video

📷 Prepare video sequences

Place your video frames (images) in a directory, for example under input/in-the-wild_example, then run the inference command below.
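
If your source material is a video file rather than a folder of frames, a small helper like the one below can dump it into such a directory (a hypothetical script using OpenCV; the output file-name pattern is an assumption, PerVFI itself only needs the image files to be ordered):

    import os
    import cv2  # opencv-python

    def extract_frames(video_path, out_dir="input/in-the-wild_example"):
        """Dump every frame of `video_path` into `out_dir` as numbered PNGs."""
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(os.path.join(out_dir, f"{idx:05d}.png"), frame)
            idx += 1
        cap.release()

    extract_frames("my_clip.mp4")  # hypothetical input file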

⬇ Download Checkpoints

Download the pre-trained models and place them in the checkpoints folder. This includes checkpoints for various optical flow estimators; you can download just one for simple use, or all of them for comparison.
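
To confirm the files landed where expected, you can simply list the folder contents (a trivial check; it assumes checkpoints sits in the repository root):

    from pathlib import Path

    # List everything under checkpoints/ with its size to verify the download.
    for f in sorted(Path("checkpoints").rglob("*")):
        if f.is_file():
            print(f.relative_to("checkpoints"), f"{f.stat().st_size / 1e6:.1f} MB")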

🚀 Run inference

The default checkpoint is trained only on the Vimeo90K dataset.

 python infer_video.py -m [OFE]+pervfi -data input -fps [OUT_FPS]

NOTE: OFE is a placeholder for the optical flow estimator name; this repo currently supports RAFT, GMA, and GMFlow (using your own preferred flow estimator is a planned feature). OUT_FPS is a placeholder for the frame rate of the output video (default: 10); results may also be saved as individual images.

The Vb checkpoint replaces the normalizing-flow generator with a multi-scale decoder for faster inference, at some cost in perceptual quality:

 python infer_video.py -m [OFE]+pervfi-vb -data input -fps [OUT_FPS]

You can find all results in the output folder. Enjoy!
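
If the results are written as individual images rather than a single video file, a short script like this can pack them into an mp4 (a hypothetical helper; it assumes the frames are PNGs directly under output/ and that imageio plus its ffmpeg plugin are installed):

    from pathlib import Path
    import imageio.v2 as imageio  # needs imageio-ffmpeg for mp4 output

    # Collect the interpolated frames in order and write them to a video.
    frames = sorted(Path("output").glob("*.png"))
    with imageio.get_writer("output/result.mp4", fps=10) as writer:
        for f in frames:
            writer.append_data(imageio.imread(f))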

🦿 Evaluation on test datasets

Will be included in VFI-Benchmark (currently in progress).

πŸ‹οΈ Training

Coming soon~

✏️ Contributing

Please refer to the contribution instructions.

🎓 Citation

Please cite our paper:

@InProceedings{Wu_2024_CVPR,
    author    = {Wu, Guangyang and Tao, Xin and Li, Changlin and Wang, Wenyi and Liu, Xiaohong and Zheng, Qingqing},
    title     = {Perception-Oriented Video Frame Interpolation via Asymmetric Blending},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {2753-2762}
}

🎫 License

This work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).

By downloading and using the code and model you agree to the terms in the LICENSE.

