
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation

This is the official implementation of the approach described in the paper:

Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, and Wenming Yang. Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation. IEEE Transactions on Multimedia, 2022.

News

  • 03/24/2022: Demo and in-the-wild inference code are released!
  • 03/15/2022: Our method has been verified as a backbone network in self-supervised pre-training!

Dependencies

  • CUDA 11.1
  • Python 3.6
  • PyTorch 1.7.1

Dataset setup

Please download the dataset from the Human3.6M website and refer to VideoPose3D to set up the Human3.6M dataset (in the './dataset' directory). Alternatively, you can download the processed data from here.

${POSE_ROOT}/
|-- dataset
|   |-- data_3d_h36m.npz
|   |-- data_2d_h36m_gt.npz
|   |-- data_2d_h36m_cpn_ft_h36m_dbb.npz
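If you want to sanity-check the downloaded files before training, a minimal sketch like the one below works. It assumes the files follow the VideoPose3D .npz layout (pickled dicts under 'positions_3d' / 'positions_2d' keys); those key names are an assumption about the processed data, not part of this repo's code:

import numpy as np

# Minimal sanity check of the Human3.6M files (a sketch; assumes the
# VideoPose3D layout, where poses are stored as pickled dicts keyed
# by subject and action under 'positions_3d' / 'positions_2d').
data_3d = np.load('dataset/data_3d_h36m.npz', allow_pickle=True)['positions_3d'].item()
data_2d = np.load('dataset/data_2d_h36m_cpn_ft_h36m_dbb.npz', allow_pickle=True)['positions_2d'].item()

for subject in ('S1', 'S5', 'S6', 'S7', 'S8', 'S9', 'S11'):
    print(subject, len(data_3d[subject]), 'actions,',
          len(data_2d[subject]), '2D sequences')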

Download pretrained model

The pretrained model can be found here; please download it and put it in the './checkpoint' directory.

Test the model

To test the pretrained model on Human3.6M:

python main.py --test --refine --reload --refine_reload --previous_dir 'checkpoint/pretrained'

Train the model

To train on Human3.6M:

python main.py

After training for several epochs, add the refine module:

python main.py --refine --lr 1e-5 --reload --previous_dir [your model saved path]
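If you want to script the full two-stage schedule, a small driver along these lines is one option (a sketch using subprocess; 'checkpoint/<run_dir>' is a hypothetical placeholder for whatever directory main.py created during stage one):

import subprocess

# Stage 1: train the strided Transformer from scratch.
subprocess.run(['python', 'main.py'], check=True)

# Stage 2: reload the stage-1 weights and train the refine module with a
# lower learning rate, as in the commands above.
previous_dir = 'checkpoint/<run_dir>'  # hypothetical placeholder -- set to your saved model path
subprocess.run(['python', 'main.py', '--refine', '--lr', '1e-5',
                '--reload', '--previous_dir', previous_dir], check=True)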

Demo

First, download the YOLOv3 and HRNet pretrained models here and put them in the './demo/lib/checkpoint' directory. Then put your in-the-wild videos in the './demo/video/' directory.

Run the command below:

python demo/vis.py --video sample_video.mp4
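To process every clip in './demo/video/' in one go, a short loop like this works (a sketch; it assumes vis.py takes the bare file name relative to that directory, as in the example above):

import pathlib
import subprocess

# Run the in-the-wild demo on each video in ./demo/video/ (a sketch;
# vis.py is passed the file name only, matching the command above).
for video in sorted(pathlib.Path('demo/video').glob('*.mp4')):
    subprocess.run(['python', 'demo/vis.py', '--video', video.name], check=True)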

Sample demo output:

Citation

If you find our work useful in your research, please consider citing:

@article{li2022exploiting,
  title={Exploiting temporal contexts with strided transformer for 3d human pose estimation},
  author={Li, Wenhao and Liu, Hong and Ding, Runwei and Liu, Mengyuan and Wang, Pichao and Yang, Wenming},
  journal={IEEE Transactions on Multimedia},
  year={2022},
}

Acknowledgement

Our code is built on top of ST-GCN and is extended from the following repositories. We thank the authors for releasing their code.

License

This project is licensed under the terms of the MIT license.
