
VideoEditGAN

Project Website | Paper

This is the repo for the ECCV'22 paper, "Temporally Consistent Semantic Video Editing".

Updates

  • 09/25/2022: added example code.
  • 07/16/2022: repo initialized.

Prerequisites

  • Linux
  • Anaconda/Miniconda
  • Python 3.6 (tested on Python 3.6.7)
  • PyTorch
  • CUDA enabled GPU

Install packages:

conda env create -f environment.yml
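
After the environment is created, activate it before running the commands below. The environment name is defined in environment.yml; videoeditgan here is only a placeholder:

conda activate videoeditgan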

Get started

Let's use examples/aamir_khan_clip.mp4 as an example.

  • Split the video into frames:
python scripts/vid2frame.py --pathIn examples/aamir_khan_clip.mp4 --pathOut out/aamir_khan/frames 
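
For reference, frame extraction just decodes the video and writes each frame to disk. Below is a minimal sketch of that step with OpenCV; it is an illustration only, not the repo's scripts/vid2frame.py, and the function name is ours.

import os

import cv2  # pip install opencv-python

def video_to_frames(path_in, path_out):
    """Dump every decoded frame of a video as zero-padded PNGs."""
    os.makedirs(path_out, exist_ok=True)
    cap = cv2.VideoCapture(path_in)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream
            break
        cv2.imwrite(os.path.join(path_out, f"{idx:05d}.png"), frame)
        idx += 1
    cap.release()

video_to_frames("examples/aamir_khan_clip.mp4", "out/aamir_khan/frames")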
  • Face alignment. We use 3DDFA_V2 for this step. First, clone it into the repo root:
git clone https://github.com/cleardusk/3DDFA_V2.git
cd 3DDFA_V2

Then, install the dependencies following its instructions, and build the Cython extension:

sh ./build.sh

We provide a code snippet, single_video_smooth.py, to generate facial landmarks for the alignment. Run:

cp ../scripts/single_video_smooth.py ./
python single_video_smooth.py -f ../out/aamir_khan/frames

landmarks.npy will be saved next to the frames folder, at <path-to-frames>/../landmarks/landmarks.npy (here, out/aamir_khan/landmarks/landmarks.npy).
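
As the script name suggests, the per-frame landmarks are smoothed over time before saving. A minimal sketch of that idea, assuming a centered moving average and a (num_frames, 68, 2) landmark array (the actual script may smooth differently):

import numpy as np

def smooth_landmarks(landmarks, window=3):
    """Temporally smooth landmarks with a centered moving average.

    landmarks: float array of shape (num_frames, num_points, 2).
    """
    smoothed = np.copy(landmarks)
    n = len(landmarks)
    for t in range(n):
        lo, hi = max(0, t - window), min(n, t + window + 1)
        smoothed[t] = landmarks[lo:hi].mean(axis=0)
    return smoothed

# landmarks = ...  # (num_frames, 68, 2) detections from 3DDFA_V2
# np.save("../out/aamir_khan/landmarks/landmarks.npy", smooth_landmarks(landmarks))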

Then we can align the faces using the detected landmarks:

cd ../
python scripts/align_faces_parallel.py --num_threads 1 --root_path out/aamir_khan/frames --output_path out/aamir_khan/aligned
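
Under the hood, alignment of this kind estimates a similarity transform mapping the detected landmarks onto a canonical template and warps each frame with it. A rough sketch with scikit-image; the repo's align_faces_parallel.py follows the FFHQ-style alignment recipe, which adds cropping and padding details omitted here:

import numpy as np
from skimage import transform

def align_face(image, src_points, dst_points, size=1024):
    """Warp a frame so its detected landmarks (src_points) land on a
    canonical template (dst_points), producing a fixed-size crop.

    src_points, dst_points: (K, 2) arrays of corresponding (x, y) points.
    """
    tform = transform.SimilarityTransform()
    tform.estimate(src_points, dst_points)  # least-squares similarity fit
    # warp() expects the inverse map (output coords -> input coords)
    return transform.warp(image, tform.inverse, output_shape=(size, size))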

We then run a naive unalignment to check that the alignment makes sense. This also saves the parameters needed for post-processing:

python scripts/unalign.py --ori_images_path out/aamir_khan/frames --aligned_images_path out/aamir_khan/aligned --output_path out/aamir_khan/unaligned
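
Conceptually, unalignment inverts that step and pastes the aligned crop back into the original frame. A naive version, assuming float images in [0, 1] and the tform estimated during alignment:

import numpy as np
from skimage import transform

def unalign(original, aligned, tform):
    """Paste an aligned crop back into the original frame by applying
    the forward alignment transform (naive: hard mask, no blending)."""
    h, w = original.shape[:2]
    size = aligned.shape[0]
    restored = transform.warp(aligned, tform, output_shape=(h, w))
    # warp a mask of ones to find which original pixels the crop covers
    mask = transform.warp(np.ones((size, size)), tform, output_shape=(h, w)) > 0.5
    out = original.copy()
    out[mask] = restored[mask]
    return out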
  • GAN inversion

For in-domain editing, we use PTI to do the inversion. We have included PTI in this repo. To use it, download the pre-trained models (see the PTI repo for download links) and put them in PTI/pretrained_models/, then start the inversion (this will take a while):

cd PTI
python scripts/run_pti_multi.py --data_root ../out/aamir_khan/aligned --run_name aamir_khan --checkpoint_path ../out/aamir_khan/inverted
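
For context, PTI inverts each frame in two stages: it first optimizes a pivot latent code with the generator frozen, then fine-tunes the generator to reconstruct the frame around that pivot. A schematic sketch of the idea, assuming G is a StyleGAN2 generator with a synthesis network and image tensors in the range StyleGAN expects; real PTI adds locality regularization and its own hyperparameters:

import torch
import torch.nn.functional as F
import lpips  # perceptual loss (pip install lpips)

percept = lpips.LPIPS(net="vgg")

def invert_pti(G, image, w_init, steps_w=450, steps_g=350):
    """Schematic two-stage PTI-style inversion for one aligned frame."""
    # Stage 1: optimize the pivot latent, generator frozen.
    w = w_init.clone().requires_grad_(True)
    opt_w = torch.optim.Adam([w], lr=5e-3)
    for _ in range(steps_w):
        rec = G.synthesis(w)  # decode a W+ code to an image
        loss = percept(rec, image).mean() + F.mse_loss(rec, image)
        opt_w.zero_grad()
        loss.backward()
        opt_w.step()

    # Stage 2: fine-tune the generator around the pivot.
    w = w.detach()
    opt_g = torch.optim.Adam(G.parameters(), lr=3e-4)
    for _ in range(steps_g):
        rec = G.synthesis(w)
        loss = percept(rec, image).mean() + F.mse_loss(rec, image)
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()
    return w, G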
  • Direct editing

Here we use the StyleCLIP mapper as an example. Download the pretrained mapper here and put it into PTI/pretrained_models/. Then run:

python scripts/pti_styleclip.py --inverted_root ../out/aamir_khan/inverted --run_name aamir_khan_eyeglasses --aligned_frame_path ../out/aamir_khan/aligned --output_root ../out/aamir_khan/in_domain --use_multi_id_G
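
For reference, a StyleCLIP latent mapper predicts a residual edit direction in W+ space, and the edit is the original latent plus a scaled residual (the StyleCLIP paper uses a strength of 0.1). Schematically:

import torch

def apply_styleclip_mapper(mapper, w, alpha=0.1):
    """Edit a W+ latent code with a trained StyleCLIP mapper.

    alpha scales the edit strength; the edited code is then decoded
    with the (PTI-tuned) generator's synthesis network.
    """
    with torch.no_grad():
        delta = mapper(w)  # residual direction, same shape as w
    return w + alpha * delta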
  • Our flow-based method

Now that we have prepared everything, the next step is to run our proposed method.

Our method relies on RAFT, a flow estimator. Download the pretrained network here, and put raft-things.pth into VideoEditGAN/pretrained_models/.

Put the pretrained mapper into VideoEditGAN/pretrained_models/, for example:

cd VideoEditGAN/pretrained_models
ln -s ../../PTI/pretrained_models/eyeglasses.pt ./

Run our proposed method:

cd VideoEditGAN/
python -W ignore scripts/temp_consist.py --edit_root out/aamir_khan/in_domain --metadata_root out/aamir_khan/unaligned --original_root out/aamir_khan/frames --aligned_ori_frame_root out/aamir_khan/aligned --checkpoint_path out/aamir_khan/inverted --batch_size 1 --reg_frame 0.2 --weight_cycle 10.0 --weight_tv_flow 0.0 --lr 1e-3 --weight_photo 1.0 --reg_G 100.0 --lr_G 1e-04 --weight_out_mask 0.5 --weight_in_mask 0.0 --tune_w --epochs_w 10 --tune_G --epochs_G 3 --scale_factor 4 --in_domain --exp_name 'temp_consist' --run_name 'aamir_khan_eyeglasses'
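
At its core, the temporal-consistency objective warps neighboring edited frames onto each other using RAFT flow and penalizes disagreement; flags such as --weight_photo and --weight_cycle weight terms of this kind. A minimal sketch of a flow-warping photometric term (function names and shapes are ours, not the repo's):

import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp img (N, C, H, W) with a pixel-space flow (N, 2, H, W):
    output(x) = img(x + flow(x))."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=img.device),
        torch.arange(w, device=img.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys)).float()  # (2, H, W) in (x, y) order
    coords = base.unsqueeze(0) + flow     # where each output pixel samples from
    # normalize sampling coordinates to [-1, 1] for grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def photometric_loss(frame_t, frame_t1, flow_t_to_t1, valid_mask):
    """L1 difference between frame_t and frame_{t+1} warped back to t,
    restricted to pixels where the flow is reliable (e.g. cycle-consistent)."""
    warped = backward_warp(frame_t1, flow_t_to_t1)
    return ((warped - frame_t).abs() * valid_mask).mean()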
  • Unalignment

As a final step, we run STIT as a post-processing step to stitch the edited, aligned faces back into the input video.

python video_stitching_tuning_ours.py --input_folder ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/aligned_frames --output_folder ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/aligned_frames/stitched --edit_name 'eyeglasses' --latent_code_path ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/variables.pth --gen_path ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/G.pth --metadata_path ../out/aamir_khan/unaligned --output_frames --num_steps 50

Citation

If you find the code useful, please consider citing our paper:

@article{xu2022videoeditgan,
  author  = {Xu, Yiran and AlBahar, Badour and Huang, Jia-Bin},
  title   = {Temporally consistent semantic video editing},
  journal = {arXiv preprint arXiv:2206.10590},
  year    = {2022},
}

Acknowledgements

The codebase is heavily built upon prior work. We would like to thank the authors of PTI, STIT, StyleCLIP, 3DDFA_V2, and RAFT.
