FMFormer: Frame-padded Multi-scale Transformer for Monocular 3D Human Pose Estimation

This work builds on VideoPose3D and MixSTE; refer to those projects for additional help.

Test on Human3.6M

Environment

The code was developed and tested under the following environment:

The dataset setup follows VideoPose3D. Please refer to that project to prepare the Human3.6M dataset (placed under the ./data directory).
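As a quick sanity check before evaluating or training, the sketch below verifies that the dataset files are in place. The file names are an assumption based on the VideoPose3D naming convention (data_3d_h36m.npz and data_2d_h36m_<keypoints>.npz) and are not confirmed by this repository; the helper name missing_dataset_files is hypothetical.

```python
from pathlib import Path


def missing_dataset_files(data_dir="data", keypoints="cpn_ft_h36m_dbb"):
    """Return the expected dataset files that are absent from data_dir.

    Assumption: file names follow the VideoPose3D convention; adjust
    them if this repository expects different names.
    """
    expected = [
        Path(data_dir) / "data_3d_h36m.npz",
        Path(data_dir) / f"data_2d_h36m_{keypoints}.npz",
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("Dataset files found.")
```

The keypoints argument mirrors the value passed to the -k flag, so the check can be reused if a different 2D detection set is selected.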

Then run the command below to evaluate with a 243-frame input:

python run.py -k cpn_ft_h36m_dbb -c <checkpoint_path> --evaluate <checkpoint_file> -f 243 -s 243 --edgepad 81

To train FMFormer on GPUs (here, devices 0 and 1):

python run.py -k cpn_ft_h36m_dbb -f 243 -s 243 --edgepad 81 -l log/run -c checkpoint -gpu 0,1
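For scripted experiments, the evaluation and training commands above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this README. The helper name build_command and its parameter names are hypothetical, and it assumes that run.py does not depend on the order of its arguments, as is usual for argparse-style parsers.

```python
import shlex


def build_command(checkpoint_dir, evaluate=None, log_dir=None, gpus=None,
                  f=243, s=243, edgepad=81, keypoints="cpn_ft_h36m_dbb"):
    """Assemble a run.py argument list from the flags shown in the README.

    The parameters f and s map directly to the -f and -s flags; no other
    flags are assumed to exist.
    """
    cmd = ["python", "run.py", "-k", keypoints,
           "-f", str(f), "-s", str(s), "--edgepad", str(edgepad),
           "-c", checkpoint_dir]
    if evaluate is not None:
        # Evaluation mode: name of the checkpoint file to load.
        cmd += ["--evaluate", evaluate]
    if log_dir is not None:
        cmd += ["-l", log_dir]
    if gpus is not None:
        # GPU ids are passed as a comma-separated string, e.g. "0,1".
        cmd += ["-gpu", ",".join(str(g) for g in gpus)]
    return cmd


if __name__ == "__main__":
    train_cmd = build_command("checkpoint", log_dir="log/run", gpus=[0, 1])
    print(shlex.join(train_cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the job; returning a list rather than a single string avoids shell-quoting problems with paths that contain spaces.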

We thank the authors of the baselines, on which our code is built.
