UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons (ACM MM 2023 Oral)

1. Getting started

The code was tested on an NVIDIA GeForce RTX 2080 Ti (CUDA 12.2).

conda create -n UnifiedGesture python==3.7
conda activate UnifiedGesture
pip install -r requirements.txt
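
Optionally, you can verify that the CUDA build of PyTorch can see your GPU before running anything else. A minimal check, assuming PyTorch is installed via requirements.txt:

import torch  # installed via requirements.txt

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))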

2. Quick start

Download files such as pre-trained models from Google Drive or Baidu Netdisk.

Put the pre-trained models and data into the corresponding folders:

  • Diffusion model
  • VQVAE
  • Retargeting network
  • Test data (Trinity, ZEGGS)

Download the WavLM model and put it in ./diffusion_latent/wavlm_cache.
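
The WavLM checkpoint is used to extract audio features for the diffusion model. If you want to confirm the downloaded checkpoint is usable, here is a rough sketch, assuming the WavLM module from microsoft/unilm (often vendored by gesture codebases) is importable and the file is named WavLM-Large.pt; adjust the import path and file name to match this repository:

import torch
from wavlm.WavLM import WavLM, WavLMConfig  # import path is an assumption

ckpt = torch.load("./diffusion_latent/wavlm_cache/WavLM-Large.pt", map_location="cpu")  # file name assumed
cfg = WavLMConfig(ckpt["cfg"])
model = WavLM(cfg)
model.load_state_dict(ckpt["model"])
model.eval()

with torch.no_grad():
    dummy_audio = torch.randn(1, 16000)                 # one second of 16 kHz audio
    features = model.extract_features(dummy_audio)[0]   # (1, frames, feature_dim)
print(features.shape)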

cd ./diffusion_latent/
python sample.py --config=./configs/all_data.yml --gpu 0 --save_dir='./result_quick_start/Trinity' --audio_path="../dataset/ZEGGS/all_speech/005_Neutral_4_x_1_0.npy" --model_path='./experiments/256_seed_6_aux_model001700000_reinforce_diffusion_onlydiff_gradnorm0.1_lr1e-7_max0_seed0/ckpt/diffusion_epoch_1.pt'

Optional:

  • If you want to use your own audio, point --audio_path directly to it, e.g. --audio_path='../dataset/Trinity/audio/Recording_006.wav'.
  • You can refer to generate_result() in sample.py to generate all of the files rather than only one.

You will get the generated motion in the ./diffusion_latent/result_quick_start/Trinity/ folder, with file names of the form xxx_recon.npy, xxx_code.npy, and xxx.npy.
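
To sanity-check the outputs, a small sketch that loads whatever .npy files were produced and prints their shapes (run from the repository root; the glob pattern is an assumption):

import glob
import numpy as np

for path in sorted(glob.glob("./diffusion_latent/result_quick_start/Trinity/*.npy")):
    arr = np.load(path)
    print(path, arr.shape, arr.dtype)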

Then select the target skeleton and decode the primal gesture:

cd ../retargeting/
python demo.py --target ZEGGS --input_file "../diffusion_latent/result_quick_start/Trinity/005_Neutral_4_x_1_0_minibatch_1080_[0, 0, 0, 0, 0, 3, 0]_123456_recon.npy" --ref_path './datasets/bvh2latent/ZEGGS/065_Speech_0_x_1_0.npy' --output_path '../result/inference/Trinity/' --cuda_device cuda:0

or

mkdir "../diffusion_latent/result_quick_start/ZEGGS/"
cp "../diffusion_latent/result_quick_start/Trinity/005_Neutral_4_x_1_0_minibatch_1080_[0, 0, 0, 0, 0, 3, 0]_123456_recon.npy" "../diffusion_latent/result_quick_start/ZEGGS/"
python demo.py --target Trinity --input_file "../diffusion_latent/result_quick_start/ZEGGS/005_Neutral_4_x_1_0_minibatch_1080_[0, 0, 0, 0, 0, 3, 0]_123456_recon.npy" --ref_path './datasets/bvh2latent/ZEGGS/065_Speech_0_x_1_0.npy' --output_path '../result/inference/Trinity/' --cuda_device cuda:0

You will get 005_Neutral_4_x_1_0_minibatch_1080_[0, 0, 0, 0, 0, 3, 0]_123456_recon.bvh in the "./result/inference/Trinity/" folder. You can refer to DiffuseStyleGesture for using Blender to visualize the generated motion. The results are shown below; try the output with different skeletons.

[Demo video: 0001-4320.1.mp4]

Finally, foot sliding can be partially mitigated using inverse kinematics (IK).

cd ./datasets/
python process_bvh.py --step IK --source_path "../../result/inference/Trinity/" --ref_bvh "./Mixamo_new_2/ZEGGS/067_Speech_2_x_1_0.bvh"

You will get 005_Neutral_4_x_1_0_minibatch_1080_[0, 0, 0, 0, 0, 3, 0]_123456_recon_fix.bvh in the same folder as before. The results are shown below; orange indicates the result of IK optimization performed on the lower body. You can adjust the foot-contact speed threshold to strike a balance between foot sliding and smoothness (a conceptual sketch of such a contact check follows the video).

[Demo video: 0001-4320-2.mp4]
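
For intuition, here is a conceptual sketch (not the repository's actual implementation) of how a foot-contact speed threshold typically works: a foot is treated as in contact when its horizontal speed falls below the threshold, and contact frames are the ones IK pins to the ground. The frame rate and threshold values below are assumptions.

import numpy as np

def foot_contacts(foot_positions, fps=60, speed_thresh=0.15):
    """foot_positions: (frames, 3) world-space positions of one foot joint."""
    vel = np.diff(foot_positions, axis=0) * fps       # per-frame velocity
    speed = np.linalg.norm(vel[:, [0, 2]], axis=1)    # horizontal (x, z) speed
    return speed < speed_thresh                       # boolean contact mask per frame

# Raising speed_thresh removes more sliding but can cause jitter; lowering it keeps
# the motion smoother but leaves more sliding.
positions = np.cumsum(np.random.randn(120, 3) * 0.001, axis=0)  # synthetic foot trajectory
print(int(foot_contacts(positions).sum()), "contact frames out of 119")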

3. Train your own model

Here we use only a small amount of data for illustration; please obtain the full datasets from Trinity and ZEGGS.

3.1 Data preparation

Place the data from step 2 in the corresponding folder.

python process_bvh.py --step Trinity --source_path "../../dataset/Trinity/" --save_path "./Trinity_ZEGGS/Trinity/"
python process_bvh.py --step ZEGGS --source_path "../../dataset/ZEGGS/clean/" --save_path "./Trinity_ZEGGS/ZEGGS/"
python process_bvh.py --step foot_contact --source_path "./Trinity_ZEGGS/Trinity/" --save_path "./Trinity_ZEGGS/Trinity_aux/"
python process_bvh.py --step foot_contact --source_path "./Trinity_ZEGGS/ZEGGS/" --save_path "./Trinity_ZEGGS/ZEGGS_aux/"
cd ../..
python process_audio.py
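
The audio features and the processed motion generally need to share the same frame rate before they can be paired. As a generic illustration (not necessarily what process_audio.py does; the rates and dimensions are assumptions), audio features can be resampled to the motion frame rate by linear interpolation:

import numpy as np

def align_to_motion(audio_feats, audio_fps, motion_frames, motion_fps):
    """audio_feats: (T_audio, D) -> (motion_frames, D) via linear interpolation."""
    t_audio = np.arange(audio_feats.shape[0]) / audio_fps
    t_motion = np.arange(motion_frames) / motion_fps
    return np.stack(
        [np.interp(t_motion, t_audio, audio_feats[:, d]) for d in range(audio_feats.shape[1])],
        axis=1,
    )

feats = np.random.randn(500, 1024)                               # e.g. ~10 s of features at 50 Hz
aligned = align_to_motion(feats, audio_fps=50, motion_frames=300, motion_fps=30)
print(aligned.shape)                                             # (300, 1024)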

3.2 Training retargeting network

Change dataset_name = 'Mixamo_new_2' on line 7 of ./retargeting/option_parser.py to dataset_name = 'Trinity_ZEGGS'.

cd ./retargeting/
python datasets/preprocess.py
python train.py --save_dir=./my_model/ --cuda_device 'cuda:0'

The model will be saved in: ./my_model/models/

(Optional: change the epoch in model.load(epoch=16000) on line 73 of ./eval_single_pair.py to the checkpoint you need.)

python demo.py --mode bvh2latent --save_dir ./my_model/

You will get the latent retargeting results in ./datasets/Trinity_ZEGGS/bvh2upper_lower_root/.

3.3 Training VQVAE model

Data preparation to generate lmdb files:

python process_root_vel.py
python ./datasets/latent_to_lmdb.py --base_path ./datasets/Trinity_ZEGGS/bvh2upper_lower_root

You will get the lmdb files in the ./retargeting/datasets/Trinity_ZEGGS/bvh2upper_lower_root/lmdb_latent_vel/ folder.
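
To make sure the LMDB actually contains data (an empty one shows up as the tiny data.mdb mentioned in the "More" section), here is a quick check, assuming the lmdb Python package and running from the repository root:

import lmdb

path = "./retargeting/datasets/Trinity_ZEGGS/bvh2upper_lower_root/lmdb_latent_vel/"
with lmdb.open(path, readonly=True, lock=False) as env:
    with env.begin() as txn:
        print("entries:", txn.stat()["entries"])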

cd ../codebook
python train.py --config=./configs/codebook.yml --train --gpu 0

The trained model is saved in ./result/my_codebook/. Then generate the discrete codes for the upper body:

python VisualizeCodebook.py --config=./configs/codebook.yml --train --gpu 0
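
Conceptually, the VQ-VAE turns each latent frame into the index of its nearest codebook entry; those indices are the discrete gesture "codes" the diffusion model is later trained on. A toy sketch of that lookup (the codebook size and latent dimension are made up, and this is not the repository's exact code):

import torch

def quantize(latents, codebook):
    """latents: (T, D); codebook: (K, D). Returns code indices (T,) and quantized latents (T, D)."""
    dists = torch.cdist(latents, codebook)   # (T, K) pairwise Euclidean distances
    codes = dists.argmin(dim=1)              # nearest codebook entry per frame
    return codes, codebook[codes]

codebook = torch.randn(512, 256)             # K = 512 entries of dimension 256 (assumed sizes)
latents = torch.randn(120, 256)              # 120 latent frames
codes, quantized = quantize(latents, codebook)
print(codes.shape, quantized.shape)          # torch.Size([120]) torch.Size([120, 256])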

3.4 Training diffusion model

cd ..
python process_code.py
python ./make_lmdb.py --base_path ./dataset/

You will get the lmdb files in the ./dataset/all_lmdb_aux/ folder.

Training the diffusion model:

cd ./diffusion_latent
python end2end.py --config=./configs/all_data.yml --gpu 1 --save_dir "./result/my_diffusion"

The trained diffusion model will be saved in: ./result/my_diffusion/
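
For orientation, the training objective is the usual denoising-diffusion setup: noise a clip of gesture latents/codes at a random timestep and train the network to predict that noise, conditioned on audio features. A generic, self-contained sketch (not the classes used by end2end.py; shapes and the noise schedule are placeholders):

import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, cond, alphas_cumprod):
    """x0: clean latents (B, D); cond: audio conditioning (B, C)."""
    t = torch.randint(0, alphas_cumprod.shape[0], (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].unsqueeze(1)                    # (B, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise      # forward (noising) process
    return F.mse_loss(model(x_t, t, cond), noise)             # train to predict the noise

class ToyDenoiser(torch.nn.Module):                           # stand-in for the real network
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.net = torch.nn.Linear(dim + cond_dim + 1, dim)
    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t.float().unsqueeze(1)], dim=1))

alphas_cumprod = torch.linspace(0.999, 0.01, steps=1000)      # placeholder noise schedule
loss = diffusion_loss(ToyDenoiser(256, 64), torch.randn(8, 256), torch.randn(8, 64), alphas_cumprod)
print(loss.item())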

3.5 RL training

TBA

More

  • If the data.mdb file is too small (e.g. only 8 KB), check for issues when generating the LMDB file (the quick LMDB check in section 3.3 can help diagnose this).
  • You can modify the code yourself to use other datasets such as BEAT or TWH; this is not demonstrated here.

Acknowledgments

We are grateful to the authors of the open-source projects whose publicly available code helped significantly in this work.

Citation

If you find this work useful, please cite the paper with the following BibTeX:

@inproceedings{yang2023UnifiedGesture,
  title={UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons},
  author={Yang, Sicheng and Wang, Zilin and Wu, Zhiyong and Li, Minglei and Zhang, Zhensong and Huang, Qiaochu and Hao, Lei and Xu, Songcen and Wu, Xiaofei and Yang, Changpeng and Dai, Zonghong},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023},
  doi={10.1145/3581783.3612503}
}

If you have any questions, please contact us at yangsc21@mails.tsinghua.edu.cn or wangzl21@mails.tsinghua.edu.cn.

License

BSD 2-Clause "Simplified" License

