caizhongang / SMPLer-X

Official Code for "SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation"


SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation


Useful links

[Homepage]      [HuggingFace]      [arXiv]      [Video]      [MMHuman3D]


News

  • [2024-03-29] An updated version of SMPLer-X-H32 is released to fix camera estimation on 3DPW-like data.
  • [2024-02-29] HuggingFace demo is online!
  • [2023-10-23] Support visualization through SMPL-X mesh overlay and add inference docker.
  • [2023-10-02] arXiv preprint is online!
  • [2023-09-28] Homepage and Video are online!
  • [2023-07-19] Pretrained models are released.
  • [2023-06-15] Training and testing code is released.





Install

conda create -n smplerx python=3.8 -y
conda activate smplerx
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install mmcv-full==1.7.1 -f
pip install -r requirements.txt

# install mmpose
cd main/transformer_utils
pip install -v -e .
cd ../..
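After the install steps above, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the packages expose standard distribution metadata under the names `torch`, `torchvision`, `torchaudio`, and `mmcv-full`.

```python
from importlib import metadata

# Versions pinned by the install commands above.
EXPECTED = {
    "torch": "1.12.0",
    "torchvision": "0.13.0",
    "torchaudio": "0.12.0",
    "mmcv-full": "1.7.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, ok)} for each pinned package."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # Some builds carry a local suffix such as "+cu113"; compare the public part only.
        ok = have is not None and have.split("+")[0] == want
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {have} [{status}]")
```

Run it inside the activated `smplerx` environment; any line marked MISMATCH points at a package worth reinstalling before going further.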

Docker Support (Early Stage)

docker pull wcwcw/smplerx_inference:v0.2
docker run  --gpus all -v <vid_input_folder>:/smplerx_inference/vid_input \
        -v <vid_output_folder>:/smplerx_inference/vid_output \
        wcwcw/smplerx_inference:v0.2 --vid <video_name>.mp4
# Currently, any customization needs to be applied to /smplerx_inference/smplerx/
  • A Docker image for inference is available on Docker Hub.
  • The image uses SMPLer-X-H32 as the inference baseline and was tested on RTX3090 and WSL2 (Ubuntu 20.04).

Pretrained Models

Model          Backbone  #Datasets  #Inst.  #Params  MPE   Download  FPS
SMPLer-X-S32   ViT-S     32         4.5M    32M      82.6  model     36.17
SMPLer-X-B32   ViT-B     32         4.5M    103M     74.3  model     33.09
SMPLer-X-L32   ViT-L     32         4.5M    327M     66.2  model     24.44
SMPLer-X-H32   ViT-H     32         4.5M    662M     63.0  model     17.47
SMPLer-X-H32*  ViT-H     32         4.5M    662M     59.7  model     17.47
  • MPE (Mean Primary Error): the average of the primary errors on five benchmarks (AGORA, EgoBody, UBody, 3DPW, and EHF)
  • FPS (Frames Per Second): the average inference speed on a single Tesla V100 GPU, batch size = 1
  • SMPLer-X-H32* is the updated version of SMPLer-X-H32, which fixes the camera estimation issue on 3DPW-like data.
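To make the FPS column easier to act on, the sketch below converts the reported throughput into per-frame latency and checks whether a model could keep pace with a given video frame rate. The FPS values are copied from the table; the helper names `latency_ms` and `can_keep_up` are hypothetical, and the numbers only apply to the stated setup (single Tesla V100, batch size 1).

```python
# FPS values copied from the table above (single Tesla V100, batch size 1).
FPS = {
    "SMPLer-X-S32": 36.17,
    "SMPLer-X-B32": 33.09,
    "SMPLer-X-L32": 24.44,
    "SMPLer-X-H32": 17.47,
}


def latency_ms(fps):
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


def can_keep_up(model_fps, video_fps):
    """True if the model processes frames at least as fast as the video plays."""
    return model_fps >= video_fps


if __name__ == "__main__":
    for name, fps in FPS.items():
        verdict = "yes" if can_keep_up(fps, 24) else "no"
        print(f"{name}: {latency_ms(fps):.1f} ms/frame, keeps up with 24 FPS video: {verdict}")
```

This ignores person detection and video decoding, so treat the result as an upper bound on end-to-end speed.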


Preparation

The file structure should look like this:

├── common/
│   └── utils/
│       └── human_model_files/  # body model
│           ├── smpl/
│           │   ├──SMPL_NEUTRAL.pkl
│           │   ├──SMPL_MALE.pkl
│           │   └──SMPL_FEMALE.pkl
│           └── smplx/
│               ├──MANO_SMPLX_vertex_ids.pkl
│               ├──SMPL-X__FLAME_vertex_ids.npy
│               ├──SMPLX_NEUTRAL.pkl
│               ├──SMPLX_to_J14.pkl
│               ├──SMPLX_NEUTRAL.npz
│               ├──SMPLX_MALE.npz
│               └──SMPLX_FEMALE.npz
├── data/
├── main/
├── demo/  
│   ├── videos/       
│   ├── images/      
│   └── results/ 
├── pretrained_models/  # pretrained ViT-Pose, SMPLer_X and mmdet models
│   ├── mmdet/
│   │   ├──faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
│   │   └──
│   ├── smpler_x_s32.pth.tar
│   ├── smpler_x_b32.pth.tar
│   ├── smpler_x_l32.pth.tar
│   ├── smpler_x_h32.pth.tar
│   ├── vitpose_small.pth
│   ├── vitpose_base.pth
│   ├── vitpose_large.pth
│   └── vitpose_huge.pth
└── dataset/  
    ├── AGORA/       
    ├── ARCTIC/      
    ├── BEDLAM/      
    ├── Behave/      
    ├── CHI3D/       
    ├── CrowdPose/   
    ├── EgoBody/     
    ├── EHF/         
    ├── FIT3D/                
    ├── GTA_Human2/           
    ├── Human36M/             
    ├── HumanSC3D/            
    ├── InstaVariety/         
    ├── LSPET/                
    ├── MPII/                 
    ├── MPI_INF_3DHP/         
    ├── MSCOCO/               
    ├── MTP/                    
    ├── MuCo/                   
    ├── OCHuman/                
    ├── PoseTrack/                
    ├── PROX/                   
    ├── PW3D/                   
    ├── RenBody/
    ├── RICH/
    ├── SPEC/
    ├── SSP3D/
    ├── SynBody/
    ├── Talkshow/
    ├── UBody/
    ├── UP3D/
    └── preprocessed_datasets/  # HumanData files
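A quick way to catch a misplaced body model file is to compare the tree above against what is on disk. This is a minimal sketch rather than a script shipped with the repository: `missing_files` is a hypothetical helper, and the list covers only the body model files named in the tree (pretrained checkpoints are left out because the ones needed depend on the model being run).

```python
from pathlib import Path

# Body model files listed in the tree above, relative to the repository root.
REQUIRED = [
    "common/utils/human_model_files/smpl/SMPL_NEUTRAL.pkl",
    "common/utils/human_model_files/smpl/SMPL_MALE.pkl",
    "common/utils/human_model_files/smpl/SMPL_FEMALE.pkl",
    "common/utils/human_model_files/smplx/MANO_SMPLX_vertex_ids.pkl",
    "common/utils/human_model_files/smplx/SMPL-X__FLAME_vertex_ids.npy",
    "common/utils/human_model_files/smplx/SMPLX_NEUTRAL.pkl",
    "common/utils/human_model_files/smplx/SMPLX_to_J14.pkl",
    "common/utils/human_model_files/smplx/SMPLX_NEUTRAL.npz",
    "common/utils/human_model_files/smplx/SMPLX_MALE.npz",
    "common/utils/human_model_files/smplx/SMPLX_FEMALE.npz",
]


def missing_files(repo_root, required=REQUIRED):
    """Return the required paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All body model files found.")
```

Run it from the repository root. Extending `REQUIRED` with the checkpoints you plan to use (for example `pretrained_models/smpler_x_h32.pth.tar`) gives a fuller pre-flight check.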


Inference

  • Place the video for inference under SMPLer-X/demo/videos
  • Prepare the pretrained models to be used for inference under SMPLer-X/pretrained_models
  • Prepare the mmdet pretrained model and config under SMPLer-X/pretrained_models
  • Inference output will be saved in SMPLer-X/demo/results
cd main

# Run inference on test_video.mp4 (24 FPS) with smpler_x_h32
sh test_video mp4 24 smpler_x_h32
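The example command above appears to take the video frame rate (24) as an argument, so it is worth reading that value from the file instead of guessing. The sketch below is an assumption-laden helper, not part of the repository: it relies on `ffprobe` (shipped with ffmpeg) being on the PATH, and the function names `parse_frame_rate` and `probe_fps` are hypothetical.

```python
import subprocess
from fractions import Fraction


def parse_frame_rate(text):
    """Convert an ffprobe rate string such as '24/1' or '30000/1001' to a float."""
    return float(Fraction(text.strip()))


def probe_fps(video_path):
    """Ask ffprobe for the frame rate of the first video stream."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=r_frame_rate",
            "-of", "default=noprint_wrappers=1:nokey=1",
            str(video_path),
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_frame_rate(result.stdout.splitlines()[0])
```

For instance, `round(probe_fps("../demo/videos/test_video.mp4"))` would give the integer to pass on the command line, assuming the video sits in the location described above.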

2D SMPL-X Overlay

We provide a lightweight visualization script for mesh overlay based on pyrender.

  • Use ffmpeg to split the video into images.
  • The visualization script takes the inference results (see above) as input.
ffmpeg -i {VIDEO_FILE} -f image2 -vf fps=30 \
        {SMPLERX INFERENCE DIR}/{VIDEO NAME (no extension)}/orig_img/%06d.jpg \
        -hide_banner  -loglevel error

cd main && python \
            --data_path {SMPLERX INFERENCE DIR} --seq {VIDEO NAME} \
            --image_path {SMPLERX INFERENCE DIR}/{VIDEO NAME} \
            --render_biggest_person False
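ffmpeg's image writer generally fails if the target folder does not exist, so the `orig_img` directory has to be created before the frame-splitting command runs. The sketch below creates it and builds the same ffmpeg argument list as the command above; `frame_split_command` is a hypothetical helper, and the paths passed to it are placeholders.

```python
import subprocess
from pathlib import Path


def frame_split_command(video_file, inference_dir, fps=30):
    """Create <inference_dir>/<video name>/orig_img and return the ffmpeg arguments."""
    video = Path(video_file)
    out_dir = Path(inference_dir) / video.stem / "orig_img"
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        "ffmpeg", "-i", str(video),
        "-f", "image2", "-vf", f"fps={fps}",
        str(out_dir / "%06d.jpg"),
        "-hide_banner", "-loglevel", "error",
    ]


def split_video(video_file, inference_dir, fps=30):
    """Run ffmpeg to write numbered JPEG frames; raises if ffmpeg exits with an error."""
    subprocess.run(frame_split_command(video_file, inference_dir, fps), check=True)
```

The `fps` value should match the rate used for inference so that frame indices line up with the inference results.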


Training

cd main

# Train SMPLer-X-H32 with 16 GPUs
sh smpler_x_h32 16
  • CONFIG_FILE is the file name under SMPLer-X/main/config
  • Logs and checkpoints will be saved to SMPLer-X/output/train_{JOB_NAME}_{DATE_TIME}


Testing

# To evaluate the model ../output/{TRAIN_OUTPUT_DIR}/model_dump/snapshot_{CKPT_ID}.pth.tar
# with config ../output/{TRAIN_OUTPUT_DIR}/code/
cd main
  • NUM_GPU = 1 is recommended for testing
  • Logs and results will be saved to SMPLer-X/output/test_{JOB_NAME}_ep{CKPT_ID}_{TEST_DATASET}


FAQ

  • RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.

    Follow this post and modify torchgeometry

  • KeyError: 'SinePositionalEncoding is already registered in position encoding' or any other similar KeyErrors due to duplicate module registration.

    Manually add force=True to respective module registration under main/transformer_utils/mmpose/models/utils, e.g. @POSITIONAL_ENCODING.register_module(force=True) in this file

  • How do I animate my virtual characters with SMPLer-X output (like that in the demo video)?

    • We are working on that, please stay tuned! Currently, this repo supports SMPL-X estimation and a simple visualization (overlay of SMPL-X vertices).
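For the duplicate-registration KeyError in the FAQ above, the toy registry below illustrates why `force=True` resolves it: a second registration under an existing name raises unless overwriting is explicitly allowed. This is a simplified stand-in written for illustration only, not the mmcv implementation; `ToyRegistry` and everything inside it are hypothetical.

```python
class ToyRegistry:
    """Toy stand-in for a module registry; NOT the mmcv implementation."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, force=False):
        """Decorator that registers a class under its own name."""
        def decorator(cls):
            key = cls.__name__
            if key in self._modules and not force:
                raise KeyError(f"{key} is already registered in {self.name}")
            # With force=True an existing entry is silently replaced.
            self._modules[key] = cls
            return cls
        return decorator

    def get(self, key):
        return self._modules[key]


def demo():
    """Register the same class name twice, first without and then with force=True."""
    registry = ToyRegistry("position encoding")

    @registry.register_module()
    class SinePositionalEncoding:
        pass

    try:
        registry.register_module()(SinePositionalEncoding)
        raised = False
    except KeyError:
        raised = True

    # The forced registration succeeds and overwrites the earlier entry.
    registry.register_module(force=True)(SinePositionalEncoding)
    return raised, registry.get("SinePositionalEncoding") is SinePositionalEncoding
```

In the real code base the duplicate typically arises when the same module is imported from two installed copies of a package, which is why forcing the registration is a workaround and not a fix for the underlying double import.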


