Deep Semantic Graph Transformer for Multi-view 3D Human Pose Estimation [AAAI 2024]

Deep Semantic Graph Transformer for Multi-view 3D Human Pose Estimation,
Lijun Zhang, Kangkang Zhou, Feng Lu, Xiang-Dong Zhou, Yu Shi,
The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024

TODO

The paper will be released soon!
Test code and model weights will be released soon!

Release

[14/12/2023] We released the model and training code for SGraFormer.

Installation

Create a conda environment: conda create -n SGraFormer python=3.7
Download cudatoolkit=11.0 from here and install
pip3 install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install -r requirements.txt

Dataset Setup

Please download the dataset from the Human3.6M website and refer to VideoPose3D to set up the Human3.6M dataset (in the './dataset' directory). Alternatively, you can download the processed data from here.

${POSE_ROOT}/
|-- dataset
|   |-- data_3d_h36m.npz
|   |-- data_2d_h36m_gt.npz
|   |-- data_2d_h36m_cpn_ft_h36m_dbb.npz
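
Before training, it can help to confirm that the three files from the layout above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_files` is hypothetical, and it assumes the script is run from `${POSE_ROOT}` so that `./dataset` resolves correctly. It only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names copied from the directory layout above.
EXPECTED_FILES = (
    "data_3d_h36m.npz",
    "data_2d_h36m_gt.npz",
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",
)


def missing_dataset_files(dataset_dir="./dataset"):
    """Return the expected files that are absent from dataset_dir.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    root = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All expected dataset files are present.")
```

An empty list means all three files were found; otherwise the returned names show which downloads are still needed.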

Quick Start

To train a model on Human3.6M:

python main.py --frames 27 --batch_size 1024 --nepoch 50 --lr 0.0002
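
When launching several runs from a script, the command above can be assembled programmatically. This is a sketch under stated assumptions: `build_train_command` is a hypothetical helper that is not part of the repository, it uses only the four flags shown above with the same default values, and whether `main.py` accepts other values for these flags is not confirmed here.

```python
import sys


def build_train_command(frames=27, batch_size=1024, nepoch=50, lr=0.0002):
    """Build the argument list for the training command shown above.

    Hypothetical helper for illustration; only the flags that appear in the
    Quick Start command are used, and the defaults mirror that command.
    """
    return [
        sys.executable, "main.py",
        "--frames", str(frames),
        "--batch_size", str(batch_size),
        "--nepoch", str(nepoch),
        "--lr", str(lr),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this sketch has no side effects.
    print(" ".join(build_train_command()))
```

To start training, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.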

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{zhang2024sgraformer,
    author = {Lijun Zhang and Kangkang Zhou and Feng Lu and Xiang-Dong Zhou and Yu Shi},
    title = {Deep Semantic Graph Transformer for Multi-view 3D Human Pose Estimation},
    booktitle = {The 38th Annual AAAI Conference on Artificial Intelligence (AAAI)},
    year = {2024},
}

Acknowledgement

Our code is extended from the following repositories. We thank the authors for releasing their code.

z0911k / SGraFormer