Interactive Image2Video-Synthesis

Official PyTorch implementation of our CVPR 2021 paper Understanding Object Dynamics for Interactive Image-to-Video Synthesis, in which we enable human users to interact with still images.

Arxiv | Project page | BibTeX

Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer, CVPR 2021

TL;DR We introduce the novel problem of Interactive Image-to-Video Synthesis, where we learn to understand the relations between the distinct body parts of articulated objects from unlabeled video data. Our proposed model allows for synthesis of videos showing natural object dynamics as responses to targeted, local interactions and thus enables human users to interact with still images by poking pixels.

Requirements
Data preparation
Pretrained models
Train your own II2V model
BibTeX

Requirements

A suitable conda environment named ii2v can be created with

conda env create -f ii2v.yml 
conda activate ii2v

Data preparation

Get Flownet2 for optical flow estimation

Preparing the data to evaluate our pretrained models or to train new ones requires estimating optical flow maps. First, add Flownet2 as a git submodule and place it in the directory models/flownet2 via

git submodule add https://github.com/NVIDIA/flownet2-pytorch models/flownet2

Since Flownet2 requires cuda-10.0 and is therefore not compatible with our main conda environment, we provide a separate conda environment for optical flow estimation, which can be created via

conda env create -f flownet2

You can activate the environment and specify the right cuda version by using

source activate_flownet2

from the root of this repository. IMPORTANT: You have to ensure that lines 3 and 4 in the script add your respective cuda-10.0 installation directories to the PATH and LD_LIBRARY_PATH environment variables. Finally, you have to build the custom layers of flownet2 with

cd models/flownet2
bash install.sh -ccbin <PATH TO_GCC7>

where <PATH TO_GCC7> is the path to your gcc-7 binary, which is usually /usr/bin/gcc-7 on a Linux server. Make sure that your flownet2 environment is activated and that the environment variables point to the cuda-10.0 installation when running the script.
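Before building the custom layers, it can help to confirm that the active shell really exposes a cuda-10.0 installation. The following is a minimal sketch and not part of this repository: the helper name find_cuda_entries is hypothetical, and it assumes that the installation directory contains the substring cuda-10.0 (for example /usr/local/cuda-10.0), which may not hold on your system.

```python
import os


def find_cuda_entries(env, version="10.0"):
    """Return the PATH / LD_LIBRARY_PATH entries that mention cuda-<version>.

    Hypothetical helper: assumes the installation directory name contains
    the substring "cuda-<version>", e.g. /usr/local/cuda-10.0.
    """
    needle = f"cuda-{version}"
    found = {}
    for var in ("PATH", "LD_LIBRARY_PATH"):
        # Split the variable into its individual directories, skipping empty parts.
        entries = [e for e in env.get(var, "").split(os.pathsep) if e]
        found[var] = [e for e in entries if needle in e]
    return found


if __name__ == "__main__":
    for var, entries in find_cuda_entries(os.environ).items():
        status = ", ".join(entries) if entries else "no cuda-10.0 entry found"
        print(f"{var}: {status}")
```

Running this after source activate_flownet2 should list at least one entry for both variables; an empty result suggests that lines 3 and 4 of the script still need to be adapted.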

Poking Plants

Download the Poking Plants dataset from here and extract it to a <TARGETDIR>, which then contains the raw video files. To extract the multi-part zip file, use

zip -s 0 poking_plants.zip --out poking_plants_unsplit.zip
unzip poking_plants_unsplit.zip

To extract the individual frames and estimate optical flow, set the value of the field raw_dir in config/data_preparation/plants.yaml to <TARGETDIR>, define the target location for the extracted frames via the field processed_dir (all frames of each video will be placed in a unique directory there), and run

source activate_flownet2
python -m utils.prepare_dataset --config config/data_preparation/plants.yaml

You can significantly speed up the optical flow estimation with the num_workers argument, which defines the number of parallel flownet2 runs. These runs are distributed among the GPUs whose ids are specified in target_gpus.
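To build intuition for how num_workers and target_gpus interact, here is a minimal sketch that spreads workers over GPUs in round-robin order. This is an illustrative assumption only: the function assign_workers is hypothetical, and the preparation script may distribute its workers differently.

```python
def assign_workers(num_workers, target_gpus):
    """Map each worker index to a GPU id in round-robin order.

    Illustrative assumption only; the repository's own scheduling
    may differ from this scheme.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not target_gpus:
        raise ValueError("target_gpus must contain at least one GPU id")
    return {w: target_gpus[w % len(target_gpus)] for w in range(num_workers)}


if __name__ == "__main__":
    # Four parallel runs on the GPUs with ids 0 and 1.
    print(assign_workers(4, [0, 1]))
```

Under this scheme, choosing num_workers larger than the number of GPUs places several flownet2 runs on the same device, so the available GPU memory limits how far the value can be raised.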

iPER

Download the zipped videos in iPER_1024_video_release.zip from this website (note that you have to create a Microsoft account to get access) and extract the archive to a <TARGETDIR>, similar to the above example. There, you'll also find the files train.txt and val.txt. Download these files and save them in <TARGETDIR>. Again, set the undefined value of the field raw_dir in config/data_preparation/iper.yaml to <TARGETDIR>, define the target location for the extracted frames and the optical flow via processed_dir, and run

python -m utils.prepare_dataset --config config/data_preparation/iper.yaml

with the flownet2 environment activated.

Human3.6m

First, you need to create an account at the homepage of the Human3.6m dataset to gain access to the dataset. After your account has been created and approved (this takes a couple of hours), log in and inspect your cookies to find your PHPSESSID. Enter that PHPSESSID in data/config.ini and also specify the TARGETDIR there, where the extracted videos will later be stored. After setting the field processed_dir in config/data_preparation/human36m.yaml, you can download and extract the videos via

python -m data.human36m_preprocess

with the flownet2 environment activated. Frame extraction and optical flow estimation are then done as usual with

python -m data.prepare_dataset --config config/data_preparation/human36m.yaml
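Since the download depends on the PHPSESSID and TARGETDIR entries in data/config.ini, a quick check that both are filled in can save a failed run. The following is a minimal sketch under stated assumptions: the helper missing_fields is hypothetical, the key names are taken from the description above, and the section name used in the example is made up, because the actual layout of data/config.ini is not shown here.

```python
import configparser


def missing_fields(ini_text, required=("PHPSESSID", "TARGETDIR")):
    """Return the required keys that are absent or empty in every section.

    Hypothetical helper: it searches all sections, since the real
    section name in data/config.ini is not known here.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    missing = []
    for key in required:
        # Option lookups are case-insensitive by default in configparser.
        values = [
            parser.get(section, key, fallback="").strip()
            for section in parser.sections()
        ]
        if not any(values):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # "[General]" is an assumed section name used for illustration only.
    example = "[General]\nPHPSESSID =\nTARGETDIR = /data/human36m\n"
    print(missing_fields(example))
```

To check the real file, read its contents (for example with pathlib.Path("data/config.ini").read_text()) and pass them to missing_fields.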

TaiChi-HD

To download and extract the videos, follow the steps listed at the download page for this dataset and set the out_folder argument of the script load_videos.py to our <TARGETDIR> from the above examples. Again, set the fields raw_dir and processed_dir in config/data_preparation/taichi.yaml as in the above examples and run

python -m data.prepare_dataset --config config/data_preparation/taichi.yaml

with the flownet2 environment activated to extract the individual frames and estimate the optical flow maps.

Pretrained models

Get the checkpoints

Here's a list of all available pretrained models. Note that the list will be updated soon, as we will then also provide the pretrained models for the additional examples in the supplementary material.

Dataset	Video resolution	Link	FVD
Poking Plants	128 x 128	plants_128x128	174.18
Poking Plants	64 x 64	plants_64x64	89.76
iPER	128 x 128	iper_128x128	220.34
iPER	64 x 64	iper_64x64	144.92
Human3.6m	128 x 128	h36m_128x128	129.62
Human3.6m	64 x 64	h36m_64x64	119.89
TaiChi-HD	128 x 128	taichi_128x128	167.94
TaiChi-HD	64 x 64	taichi_64x64	182.28

Download the data to a <MODELDIR> by selecting all items visible under the respective link and clicking the green 'ZIP Selected Items' button. IMPORTANT: To ensure smooth and automatic evaluation, give the resulting zip file the name of the respective link in the above table.
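Because the evaluation relies on the zip files carrying the link names from the table, a small check of <MODELDIR> can catch naming mistakes early. This is a minimal sketch, not part of the repository: the helper missing_archives is hypothetical, and it assumes that each download is stored as <MODELDIR>/<link name>.zip.

```python
from pathlib import Path

# Link names copied from the table of pretrained models.
EXPECTED_NAMES = (
    "plants_128x128", "plants_64x64",
    "iper_128x128", "iper_64x64",
    "h36m_128x128", "h36m_64x64",
    "taichi_128x128", "taichi_64x64",
)


def missing_archives(model_dir, names=EXPECTED_NAMES):
    """List the link names without a matching <name>.zip inside model_dir.

    Assumption: each download is stored as <MODELDIR>/<link name>.zip.
    """
    root = Path(model_dir)
    return [name for name in names if not (root / f"{name}.zip").is_file()]
```

If you only downloaded some of the models, pass just those names, for example missing_archives("<MODELDIR>", ["iper_64x64"]); an empty list means every requested archive was found.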

Evaluate pretrained models

All provided pretrained models can be evaluated with the command

conda activate ii2v
python -m utils.eval_pretrained --base_dir <MODELDIR> --mode <[metrics,fvd]> --gpu <GPUID>

where --mode fvd extracts samples for calculating the FVD score (for details on its calculation, see below) and saves them in <MODELDIR>/<NAME OF LINK IN TABLE>/generated/samples_fvd, and --mode metrics evaluates the model with respect to the remaining metrics that we reported in the paper.

FVD evaluation

As the FVD implementation requires tensorflow<=1.15, we again created a separate conda environment to evaluate the models with respect to this score, which can be initialized and activated by using

conda env create -f environement_fvd.yml
conda activate fvd

You can calculate the FVD-score of a model with

python -m utils.metric_fvd --gpu <GPUID> --source <MODELDIR>/<NAME OF LINK IN TABLE>/generated/samples_fvd

Note that the samples have to be written to <MODELDIR>/<NAME OF LINK IN TABLE>/generated/samples_fvd when running the script.
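The path convention above can be written down as a small helper that also checks whether samples are present before the FVD script is started. This is a minimal sketch: both function names are hypothetical, and only the directory layout is taken from the description above.

```python
from pathlib import Path


def fvd_samples_dir(model_dir, link_name):
    """Build <MODELDIR>/<NAME OF LINK IN TABLE>/generated/samples_fvd."""
    return Path(model_dir) / link_name / "generated" / "samples_fvd"


def has_samples(model_dir, link_name):
    """True if the samples directory exists and contains at least one entry."""
    target = fvd_samples_dir(model_dir, link_name)
    return target.is_dir() and any(target.iterdir())
```

If has_samples returns False for a model, the samples have most likely not been extracted yet with --mode fvd.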

Train your own II2V model

To train your own model on one of the provided datasets, you'll have to adapt the fields

base_dir : The base directory where all logs, config-files, checkpoints and results will be stored (we recommend not to change this once you've defined it)
dataset : The considered dataset, must be one of [PlantDataset, IperDataset, Human36mDataset, TaichiDataset]
datapath: <TARGETDIR> from above for the respective dataset

in the config file config/fixed_length_model.yaml.
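The three fields can be sanity-checked before a long training run is started. The following is a minimal sketch under stated assumptions: the helper check_train_config is hypothetical, and it expects the three fields as a flat dictionary, whereas in config/fixed_length_model.yaml they may be nested inside a section that you would have to select first.

```python
VALID_DATASETS = ("PlantDataset", "IperDataset", "Human36mDataset", "TaichiDataset")


def check_train_config(cfg):
    """Return a list of problems found in the three user-defined fields.

    Assumption: cfg is a flat dict holding base_dir, dataset and datapath.
    """
    problems = []
    for field in ("base_dir", "dataset", "datapath"):
        if not cfg.get(field):
            problems.append(f"missing or empty field: {field}")
    dataset = cfg.get("dataset")
    if dataset and dataset not in VALID_DATASETS:
        problems.append(f"unknown dataset: {dataset}")
    return problems


if __name__ == "__main__":
    # The paths below are placeholders for illustration.
    cfg = {"base_dir": "/logs/ii2v", "dataset": "IperDataset", "datapath": "/data/iper"}
    print(check_train_config(cfg))
```

An empty list means that all three fields are set and that the dataset name matches one of the four supported classes.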

After that, you can start training by running

python main.py --config config/fixed_length_model.yaml --project_name <UNIQUE_PROJECT_NAME> --gpu <GPUID> --mode <[train, test]>

To evaluate the model after training, run

python -m utils.eval_models --base_dir <base_dir field from the respective config> --mode <[metrics,fvd]> --gpu <GPUID>

BibTeX

@InProceedings{Blattmann_2021_CVPR,
    author    = {Blattmann, Andreas and Milbich, Timo and Dorkenwald, Michael and Ommer, Bjorn},
    title     = {Understanding Object Dynamics for Interactive Image-to-Video Synthesis},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {5171-5181}
}
