In this project, we provide an alternative, simpler implementation scheme (SVD_PoseGuider) for character animation, such as human dance generation. Unlike AnimateAnyone, we adopt SVD (Stable Video Diffusion) as our base model and inject the pose information into the UNet through a temporal ControlNet. This simplifies the whole training strategy from two stages to one stage.
It's worth noting that this project is a preliminary version; we will continue to develop it and welcome feedback and ideas from the community.
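To illustrate the injection mechanism, here is a minimal sketch of a ControlNet-style pose guider. The class name, encoder, and channel sizes are assumptions for illustration, not the repo's actual module:

```python
import torch
import torch.nn as nn

class PoseGuiderSketch(nn.Module):
    """Illustrative ControlNet-style pose encoder: maps a pose video to
    per-block residuals that are added to the SVD UNet's down-block features.
    The single-conv stem and channel sizes are assumptions, not the repo's code."""

    def __init__(self, pose_channels=3, hidden=64, unet_block_channels=(320, 640, 1280)):
        super().__init__()
        # shared spatio-temporal stem over (batch, channels, frames, height, width)
        self.stem = nn.Conv3d(pose_channels, hidden, kernel_size=3, padding=1)
        # one zero-initialized projection per UNet block (the ControlNet trick),
        # so training starts from an identity mapping of the pretrained SVD UNet
        self.proj = nn.ModuleList(
            [nn.Conv3d(hidden, c, kernel_size=1) for c in unet_block_channels]
        )
        for p in self.proj:
            nn.init.zeros_(p.weight)
            nn.init.zeros_(p.bias)

    def forward(self, pose_video):
        h = torch.relu(self.stem(pose_video))
        # resizing each residual to its block's resolution is omitted for brevity
        return [p(h) for p in self.proj]
```

Because the residual projections start at zero, the pretrained SVD weights are effectively untouched at initialization, which is what makes training the ControlNet branch in a single stage stable.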
- [2024.2.5] The pretrained weights are released.
- [2024.2.5] We release the first version of the code, covering both the training and inference phases.
We plan to optimize the preliminary model in the following ways:
- Collect more character videos to retrain the model.
- Optimize the aggregation method for generated clips (one simple approach is sketched after this list).
- Add post-processing for human faces.
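On the second item, one simple aggregation strategy is to generate clips with a few overlapping frames and linearly cross-fade the overlap. A minimal sketch, not the repo's current method:

```python
import numpy as np

def crossfade_clips(clips, overlap):
    """Concatenate video clips (each a (frames, H, W, C) float array) that were
    generated with `overlap` shared frames, linearly blending the overlap."""
    out = clips[0]
    for clip in clips[1:]:
        # blend weights ramp from 0 (keep previous clip) to 1 (keep next clip)
        alpha = np.linspace(0.0, 1.0, overlap)[:, None, None, None]
        blended = (1 - alpha) * out[-overlap:] + alpha * clip[:overlap]
        out = np.concatenate([out[:-overlap], blended, clip[overlap:]], axis=0)
    return out
```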
Demo videos: UBC_Fashion #1 and #2; SVD_PoseGuider output with and without face enhancement; TikTok #1 and #2.
- Computing resource requirements: This task requires high-memory GPUs. We trained the model on 8× A100 GPUs (80 GB each) and ran inference on a single V100 (32 GB).
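If memory is tight, the stock diffusers SVD pipeline exposes a couple of memory-saving knobs. This sketch uses the plain StableVideoDiffusionPipeline as a point of reference, without this repo's pose ControlNet, and the image path is hypothetical; the repo's inference.py may expose different options:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
# offload submodules to CPU between forward passes to reduce peak VRAM
pipe.enable_model_cpu_offload()

image = load_image("./validation_demo/reference.png")  # hypothetical path
# smaller decode_chunk_size trades speed for memory when decoding latents
frames = pipe(image, decode_chunk_size=2).frames[0]
```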
- Installation: We recommend Python >= 3.10 and CUDA 11.7. Then install the packages as follows:
```
pip install -r requirements.txt
```
- Pretrained weights: Download the checkpoint from the Link and place it under the `./checkpoint` directory.
Once the pretrained weights are in place, you can run the following CLI command:
```
python inference.py \
  --pretrained_model_name_or_path="stabilityai/stable-video-diffusion-img2vid-xt" \
  --controlnet_path="./checkpoint" \
  --validation_folder="./validation_demo" \
  --output_dir="./results"
```
- Data preparation: Download the UBC-Fashion dataset and the TikTok dataset, then process them following the guidance of Open-AnimateAnyone.
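As a rough sketch of one preprocessing step, each video is typically split into per-frame images before pose estimation. The output layout here is illustrative only; the exact layout and the pose-extraction tooling come from the Open-AnimateAnyone guide:

```python
import os
import cv2  # opencv-python

def extract_frames(video_path, out_dir):
    """Dump every frame of a video as a numbered PNG (illustrative only;
    the layout expected by the training script follows Open-AnimateAnyone)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:05d}.png"), frame)
        idx += 1
    cap.release()
```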
- Run command: Update the corresponding paths in run.sh and then run:
```
sh run.sh
```
- The codebase is built upon SVD_Xtend and svd-temporal-controlnet.