pengbo807 / ConditionVideo

Training-Free Condition-Guided Text-to-Video Generation


ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation (AAAI 2024)

Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao

This is the official PyTorch implementation of the paper "ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation".

Our model generates realistic dynamic videos from random noise or from given scene videos, guided by the provided conditions. Currently, we support OpenPose keypoint, canny edge, depth, and segmentation conditions.

Canny / segmentation / depth conditions:

- A dog, comicbook style
- A red jellyfish, pastel colours.
- A horse under a blue sky.

Pose / customized pose conditions:

- The Astronaut, brown background
- Ironman in the sea

Setup

To set up the environment, create a conda environment:

conda create -n tune-control python=3.10

Check your CUDA version and install the matching PyTorch build (note that pytorch==2.0.0 is required), then install the remaining dependencies:

pip install -r requirements.txt
conda install xformers -c xformers

You may also need to download the model checkpoints manually from Hugging Face.

Usage

To generate videos, run:

accelerate launch --num_processes 1 conditionvideo.py --config="configs//config.yaml"

To use different generation settings, edit the configuration in config.yaml.
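The exact keys in config.yaml are defined by the repository; purely as an illustration, a condition-guided generation config typically specifies the base model, prompt, condition type, and sampling parameters along these lines (every field name below is hypothetical, not taken from the repo):

```yaml
# Hypothetical sketch -- consult the repository's configs/ for the real keys
pretrained_model_path: "./checkpoints/stable-diffusion-v1-5"
prompt: "A horse under a blue sky."
condition: "canny"        # one of: openpose, canny, depth, segment
video_length: 24          # number of frames to generate
width: 512
height: 512
num_inference_steps: 50
guidance_scale: 12.5
seed: 42
```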

Citation

@misc{peng2023conditionvideo,
      title={ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation}, 
      author={Bo Peng and Xinyuan Chen and Yaohui Wang and Chaochao Lu and Yu Qiao},
      year={2023},
      eprint={2310.07697},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


License: GNU General Public License v3.0

