FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models

Created by Jinglin Xu, Yijie Guo, Yuxin Peng

This repository contains the PyTorch implementation of FinePOSE (CVPR 2024, Highlight).

Overview

Dependencies

Make sure you have the following Python dependencies installed:

pytorch >= 0.4.0
matplotlib=3.1.0
einops
timm
tensorboard
CLIP

pip install git+https://github.com/openai/CLIP.git

You also need MATLAB installed if you want to evaluate our model on the MPI-INF-3DHP dataset.
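As a quick sanity check before running anything, the Python dependencies listed above can be probed for importability. The snippet below is a minimal sketch, not part of the repository: the helper name `find_missing_modules` is hypothetical, and it only checks that each module can be located, not that its version matches the list above.

```python
import importlib.util

# Import names for the packages listed above. "torch" is the import name of
# PyTorch and "clip" is the module installed by the openai/CLIP repository.
REQUIRED_MODULES = ["torch", "matplotlib", "einops", "timm", "tensorboard", "clip"]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with pip or conda before training or evaluation.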

Datasets

Our model is evaluated on Human3.6M and MPI-INF-3DHP datasets.

Human3.6M

We set up the Human3.6M dataset in the same way as VideoPose3D. You can download the processed data from here. data_2d_h36m_gt.npz contains the ground truth 2D keypoints. data_2d_h36m_cpn_ft_h36m_dbb.npz contains the 2D keypoints obtained by CPN. data_3d_h36m.npz contains the ground truth 3D human joints. Put all three files in the ./data directory.
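To confirm that the three Human3.6M files are in place, a small check like the following can help. It is a sketch under the assumption that the files sit directly inside ./data; the helper name `missing_h36m_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the description above.
H36M_FILES = [
    "data_2d_h36m_gt.npz",               # ground truth 2D keypoints
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",  # 2D keypoints obtained by CPN
    "data_3d_h36m.npz",                  # ground truth 3D human joints
]


def missing_h36m_files(data_dir="./data"):
    """Return the expected Human3.6M files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in H36M_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_h36m_files()
    print("Missing files:", ", ".join(missing) if missing else "none")
```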

MPI-INF-3DHP

We set up the MPI-INF-3DHP dataset following P-STMO. However, our training/testing data is different from theirs. They train and evaluate on 3D poses scaled to the height of the universal skeleton used by Human3.6M (officially called "univ_annot3"), while we use the ground truth 3D poses (officially called "annot3"). The former does not guarantee that the reprojection (used by the proposed JPMA) of the rescaled 3D poses is consistent with the 2D inputs, while the latter does. You can download our processed data from here. Put them in the ./data directory.
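The reprojection argument above can be illustrated with a toy pinhole camera. This is only a sketch: how "univ_annot3" is actually constructed is not described here, so the rescaling about the root joint, the focal length, and the joint coordinates below are all assumptions chosen for illustration. The point is simply that rescaling a camera-space pose about any point other than the camera center generally moves its 2D projection.

```python
def project(point, focal=1000.0):
    """Pinhole projection of a camera-space point (x, y, z) onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def rescale_about_root(pose, root, scale):
    """Scale every joint's offset from the root joint by `scale` (assumed rescaling)."""
    return [
        tuple(r + scale * (p - r) for p, r in zip(joint, root))
        for joint in pose
    ]


# Toy camera-space pose in millimetres: a root joint and one other joint.
root = (0.0, 0.0, 4000.0)
head = (100.0, -600.0, 4200.0)
pose = [root, head]

rescaled = rescale_about_root(pose, root, scale=1.1)

original_2d = [project(joint) for joint in pose]
rescaled_2d = [project(joint) for joint in rescaled]

# The root projects to the same pixel, but the other joint does not.
print("original 2D:", original_2d)
print("rescaled 2D:", rescaled_2d)
```

In this toy setting the rescaled pose no longer reprojects onto the original 2D keypoints, which is the kind of inconsistency a reprojection-based step would be sensitive to.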

Evaluation

Human3.6M

To evaluate our FinePOSE with JPMA using the 2D keypoints obtained by CPN as inputs, please run:

python main.py -k cpn_ft_h36m_dbb -c checkpoint/model_h36m -gpu 0,1 --nolog --evaluate best_epoch_20_10.bin -num_proposals 20 -sampling_timesteps 10 -b 4

MPI-INF-3DHP

To evaluate our FinePOSE with JPMA using the ground truth 2D poses as inputs, please run:

python main_3dhp.py -c checkpoint/model_3dhp -gpu 0,1 --nolog --evaluate best_epoch_20_10.bin -num_proposals 20 -sampling_timesteps 10 -b 4

After that, the predicted 3D poses under the P-Best, P-Agg, J-Best, and J-Agg settings are saved as four .mat files in ./checkpoint. To get the MPJPE, AUC, and PCK metrics, evaluate the predictions by running the MATLAB script ./3dhp_test/test_util/mpii_test_predictions_ori_py.m (change 'aggregation_mode' in line 29 to get results under a different setting). The evaluation results are then saved in ./3dhp_test/test_util/mpii_3dhp_evaluation_sequencewise_ori_{setting name}_t{iteration index}.csv. To get the final results, average each of the three metrics in these files over the six test sequences.
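The final averaging step can be scripted instead of done by hand. The sketch below assumes each CSV has one row per test sequence and a header row with columns named "PCK", "AUC", and "MPJPE"; these column names and the helper names are hypothetical, so adjust them to match the layout the MATLAB script actually writes.

```python
import csv
from statistics import mean

# Hypothetical column names; check them against the header of the real CSV.
METRIC_COLUMNS = ("PCK", "AUC", "MPJPE")


def average_rows(rows, metric_columns=METRIC_COLUMNS):
    """Average each metric column over all rows (one row per sequence)."""
    return {
        column: mean(float(row[column]) for row in rows)
        for column in metric_columns
    }


def average_metrics_file(csv_path, metric_columns=METRIC_COLUMNS):
    """Read a sequence-wise evaluation CSV and average its metric columns."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return average_rows(rows, metric_columns)
```

For example, `average_metrics_file(path)` would be called once per setting, with `path` pointing at the corresponding CSV file written by the MATLAB script.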

Training from scratch

Our models were trained on two NVIDIA RTX 4090 GPUs.

Human3.6M

To train our model using the 2D keypoints obtained by CPN as inputs, please run:

python main.py -k cpn_ft_h36m_dbb -c checkpoint/model_h36m -gpu 0,1 --nolog

MPI-INF-3DHP

To train our model using the ground truth 2D poses as inputs, please run:

python main_3dhp.py -c checkpoint/model_3dhp -gpu 0,1 --nolog

Pretrained Models

Baidu Netdisk

Acknowledgement

Our code refers to the following repositories.

We thank the authors for releasing their code.
