Please follow the instructions here.
The recommended format for 2D and 3D datasets is the MPI-INF-3DHP format, which is structured as:
root
|__annotations
| |___cameras_test.pkl
| |___cameras_train.pkl
| |___joint2d_rel_stats.pkl
| |___joint2d_stats.pkl
| |___joint3d_rel_stats.pkl
| |___joint3d_stats.pkl
| |___mpi_inf_3dhp_test_valid.npz
| |___mpi_inf_3dhp_train.npz
|
|__images # name format for only 1 camera and 1 subject
  |___S1_Seq1_Cam0_000000.jpg
  |___S1_Seq1_Cam0_000001.jpg
  |___...
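The image naming scheme can be sketched as a pair of helpers. This is a minimal sketch: the pattern is inferred only from the two example names in the tree, and `make_image_name` and `parse_image_name` are hypothetical helper names, not part of MMPose.

```python
import re

# Pattern inferred from the example names (e.g. S1_Seq1_Cam0_000000.jpg):
# subject, sequence, camera, then a 6-digit zero-padded frame index.
_NAME_RE = re.compile(r"^S(\d+)_Seq(\d+)_Cam(\d+)_(\d{6})\.jpg$")


def make_image_name(subject: int, seq: int, cam: int, frame: int) -> str:
    """Build an image file name such as 'S1_Seq1_Cam0_000000.jpg'."""
    return f"S{subject}_Seq{seq}_Cam{cam}_{frame:06d}.jpg"


def parse_image_name(name: str) -> dict:
    """Split an image file name back into its integer fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    subject, seq, cam, frame = (int(group) for group in match.groups())
    return {"subject": subject, "seq": seq, "cam": cam, "frame": frame}
```

For a single camera and a single subject, only the frame index changes between files.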
Each annotation file has the following format:
cameras_test.pkl
{
  {
    'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
    'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
    'w': width, # image width (int)
    'h': height, # image height (int)
    'name': 'test_cam_1'
  },
  {
    'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
    'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
    'w': width, # image width (int)
    'h': height, # image height (int)
    'name': 'test_cam_2'
  },
  ...
}
cameras_train.pkl
{
  {
    'R': array[array[], array[], array[]], # rotation matrix, each inner array has len 3 (shape 3x3)
    'T': array[array[], array[], array[]], # translation vector, each inner array has len 1 (shape 3x1)
    'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
    'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
    'w': width, # image width (int)
    'h': height, # image height (int)
    'name': 'test_cam_1'
  },
  {
    'R': array[array[], array[], array[]], # rotation matrix, each inner array has len 3 (shape 3x3)
    'T': array[array[], array[], array[]], # translation vector, each inner array has len 1 (shape 3x1)
    'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
    'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
    'w': width, # image width (int)
    'h': height, # image height (int)
    'name': 'test_cam_2'
  },
  ...
}
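A camera entry with the shapes listed above could be built and pickled roughly as follows. This is a sketch under assumptions: `make_camera` and `save_cameras` are hypothetical helpers, and the outer container is left to the caller because the layout above does not say how entries are keyed. Check how your dataset class looks cameras up before choosing a dict or a list.

```python
import pickle

import numpy as np


def make_camera(name, width, height, focal, center, R=None, T=None):
    """Return one camera entry with the array shapes described above."""
    cam = {
        "c": np.asarray(center, dtype=np.float32).reshape(2, 1),  # camera center
        "f": np.asarray(focal, dtype=np.float32).reshape(2, 1),   # focal length
        "w": int(width),   # image width
        "h": int(height),  # image height
        "name": name,
    }
    # R and T only appear in the cameras_train.pkl layout.
    if R is not None:
        cam["R"] = np.asarray(R, dtype=np.float32).reshape(3, 3)
    if T is not None:
        cam["T"] = np.asarray(T, dtype=np.float32).reshape(3, 1)
    return cam


def save_cameras(cameras, path):
    """Pickle a collection of camera entries to `path`."""
    with open(path, "wb") as f:
        pickle.dump(cameras, f)
```

Calling `reshape` makes the function fail early if a value with the wrong number of elements is passed in.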
joint2d_rel_stats.pkl
(coordinates relative to the root joint)
{
'mean': array shape of (num_joints x 2), # mean of joints coordinates
'std': array shape of (num_joints x 2) # std of joints coordinates
}
joint2d_stats.pkl
{
'mean': array shape of (num_joints x 2), # mean of joints coordinates
'std': array shape of (num_joints x 2) # std of joints coordinates
}
joint3d_rel_stats.pkl
(coordinates relative to the root joint)
{
'mean': array shape of (num_joints x 3), # mean of joints coordinates
'std': array shape of (num_joints x 3) # std of joints coordinates
}
joint3d_stats.pkl
{
'mean': array shape of (num_joints x 3), # mean of joints coordinates
'std': array shape of (num_joints x 3) # std of joints coordinates
}
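The four statistics files share one layout, so a single helper can produce all of them. The sketch below assumes the statistics are taken over every sample of the training set and that the root-relative variant subtracts one chosen joint. `joint_stats` and `root_index` are hypothetical names, and the preprocessing script may compute the values differently.

```python
import numpy as np


def joint_stats(joints, root_index=None):
    """Per-joint mean and std over all samples.

    joints: array of shape (num_samples, num_joints, dim), with dim 2 or 3.
    root_index: if given, coordinates are first made relative to that joint,
        which corresponds to the '*_rel_stats' files.
    """
    joints = np.asarray(joints, dtype=np.float64)
    if root_index is not None:
        # Broadcast the root joint over the joint axis and subtract it.
        joints = joints - joints[:, root_index:root_index + 1, :]
    return {"mean": joints.mean(axis=0), "std": joints.std(axis=0)}
```

Note that in the root-relative case the root joint itself has zero mean and zero std, so any later normalization that divides by `std` has to handle that row.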
mpi_inf_3dhp_test_valid.npz and mpi_inf_3dhp_train.npz
{
'imgname': [], # list of .jpg image names
'center': [], # list of bbox centers with shape (num_imgs x 2)
'scale': [], # list of bounding-box scale factors with len num_imgs
'part': [[[]]], # list of 2D joints with shape (num_imgs x num_joints x 3) (include confidence score ?)
'S': [[[]]] # list of 3D joints with shape (num_imgs x num_joints x 3) (include confidence score ?)
}
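Before writing an annotation file it can help to check that the five arrays agree with each other. The following is a minimal sketch, not the preprocessing script itself: `build_annotations` and `save_annotations` are hypothetical helpers, and the last dimension of `part` and `S` is deliberately left unchecked because the layout above is unsure whether a confidence value is included.

```python
import numpy as np


def build_annotations(imgname, center, scale, part, S):
    """Validate the five arrays and return them as a dict for np.savez."""
    ann = {
        "imgname": np.asarray(imgname),
        "center": np.asarray(center, dtype=np.float32),
        "scale": np.asarray(scale, dtype=np.float32),
        "part": np.asarray(part, dtype=np.float32),
        "S": np.asarray(S, dtype=np.float32),
    }
    n = len(ann["imgname"])
    if ann["center"].shape != (n, 2):
        raise ValueError("center must have shape (num_imgs, 2)")
    if ann["scale"].shape != (n,):
        raise ValueError("scale must have shape (num_imgs,)")
    if ann["part"].ndim != 3 or ann["part"].shape[0] != n:
        raise ValueError("part must have shape (num_imgs, num_joints, ...)")
    if ann["S"].ndim != 3 or ann["S"].shape[:2] != ann["part"].shape[:2]:
        raise ValueError("S must match part in num_imgs and num_joints")
    return ann


def save_annotations(path, ann):
    """Write the validated arrays to an .npz file with the keys listed above."""
    np.savez(path, **ann)
```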
If you want to use the original MPI-INF-3DHP dataset, download the dataset and prepare the data folder as:
data_root
|-- train
    |-- S1
        |-- Seq1
        |-- Seq2
    |-- S2
    |-- ...
|-- test
    |-- TS1
    |-- TS2
    |-- ...
Run the following script to prepare the dataset:
python tools/dataset/preprocess_mpi_inf_3dhp.py --data_root {path to data root} --out_dir {path to out dir}
Please prepare the dataset following the instructions here. The main steps are:
- Add the dataset info in configs/_base_/datasets/{custom_dataset_name}.py
- Add the dataset config in mmpose/datasets/datasets/datasets/custom_datset/{custom_dataset_name}.py
- Register the dataset name in the above dataset config file
- Set the dataset_name variable in the dataset info file to the name of the dataset class in the config file
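A dataset info file for the first and last steps might look like the fragment below. It is only a sketch: `MyCustom3DDataset` and the three keypoints are hypothetical, and the key layout (`keypoint_info`, `skeleton_info`, `joint_weights`, `sigmas`) is assumed to follow the dataset info files shipped with MMPose 0.x, so compare it against an existing file in configs/_base_/datasets/ before relying on it.

```python
# Hypothetical dataset info file: configs/_base_/datasets/{custom_dataset_name}.py
dataset_info = dict(
    # Set to the name of the dataset class (hypothetical name here).
    dataset_name='MyCustom3DDataset',
    paper_info=dict(),
    keypoint_info={
        0: dict(name='root', id=0, color=[51, 153, 255], type='lower', swap=''),
        1: dict(name='right_hip', id=1, color=[255, 128, 0], type='lower',
                swap='left_hip'),
        2: dict(name='left_hip', id=2, color=[0, 255, 0], type='lower',
                swap='right_hip'),
    },
    skeleton_info={
        0: dict(link=('root', 'right_hip'), id=0, color=[255, 128, 0]),
        1: dict(link=('root', 'left_hip'), id=1, color=[0, 255, 0]),
    },
    # One weight per keypoint.
    joint_weights=[1.0] * 3,
    sigmas=[],
)
```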
In the train config file (e.g. configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py), replace dataset_type with the dataset class name and data_root with the path to the data directory.
Also replace the camera parameter file and annotation paths in train_data_cfg, test_data_cfg and data.
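The replacements can be sketched as the config fragment below. The class name and data directory are hypothetical, and the inner keys (`camera_param_file`, `ann_file`, `img_prefix`, `data_cfg`) are assumed from the layout of the referenced MMPose config. Keep every other key from the original config unchanged and verify the names against your copy.

```python
# Hypothetical values: use your own dataset class name and data directory.
dataset_type = 'MyCustom3DDataset'
data_root = 'data/my_custom_3d'

# Only the camera parameter paths are shown; keep the remaining keys
# of the original train_data_cfg / test_data_cfg as they are.
train_data_cfg = dict(
    camera_param_file=f'{data_root}/annotations/cameras_train.pkl',
)
test_data_cfg = dict(
    camera_param_file=f'{data_root}/annotations/cameras_test.pkl',
)

data = dict(
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/mpi_inf_3dhp_train.npz',
        img_prefix=f'{data_root}/images/',
        data_cfg=train_data_cfg,
    ),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/mpi_inf_3dhp_test_valid.npz',
        img_prefix=f'{data_root}/images/',
        data_cfg=test_data_cfg,
    ),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/mpi_inf_3dhp_test_valid.npz',
        img_prefix=f'{data_root}/images/',
        data_cfg=test_data_cfg,
    ),
)
```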
Training script:
python tools/train.py {path/to/train/config/file} --work-dir {path/to/save/output/dir} --gpu-id 0
By default, the model is evaluated every epoch, and the video_pose_lift model only needs the bbox width and height information from the camera annotation files.
Evaluating script:
python tools/test.py {path to config file} {path to model ckpt} --work-dir {out dir}
Where, for example:
- config file: /home/ducanh/hain/code/mmpose_3d_pose_estimation/configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py
- model ckpt: the checkpoint listed in /home/ducanh/hain/code/mmpose_3d_pose_estimation/configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp.yml
Inference script:
python demo/body3d_two_stage_video_demo.py \
    demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
    https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
    https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
    configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py \
    pretrained_weights/videopose_mpi-inf-3dhp_1frame_fullconv_supervised_gt-d6ed21ef_20210603.pth \
    --video_path /home/ducanh/hain/dataset/yoga_15s.mp4 \
    --out-video-root vis_result \
    --rebase-keypoint-height
Where the positional arguments are the config file and model checkpoint for 2D bounding box detection, 2D keypoint detection and 3D keypoint detection, respectively.
MMPose is an open-source toolbox for pose estimation based on PyTorch. It is a part of the OpenMMLab project.
The master branch works with PyTorch 1.5+.
Major Features
- Support diverse tasks
  We support a wide spectrum of mainstream pose analysis tasks in current research community, including 2d multi-person human pose estimation, 2d hand pose estimation, 2d face landmark detection, 133 keypoint whole-body human pose estimation, 3d human mesh recovery, fashion landmark detection and animal pose estimation. See demo.md for more information.
- Higher efficiency and higher accuracy
  MMPose implements multiple state-of-the-art (SOTA) deep learning models, including both top-down & bottom-up approaches. We achieve faster training speed and higher accuracy than other popular codebases, such as HRNet. See benchmark.md for more information.
- Support for various datasets
  The toolbox directly supports multiple popular and representative datasets, COCO, AIC, MPII, MPII-TRB, OCHuman etc. See data_preparation.md for more information.
- Well designed, tested and documented
  We decompose MMPose into different components and one can easily construct a customized pose estimation framework by combining different modules. We provide detailed documentation and API reference, as well as unittests.
- 2022-07-06: MMPose v0.28.0 is released. Major updates include:
- Support TCFormer (CVPR'2022). See the model page
- Add RLE pre-trained model on COCO dataset. See the model page
- Update Swin models with better performance
- 2022-02-28: MMPose model deployment is supported by MMDeploy v0.3.0. MMPose Webcam API is a simple yet powerful tool to develop interactive webcam applications with MMPose features.
- 2021-12-29: OpenMMLab Open Platform is online! Try our pose estimation demo
MMPose depends on PyTorch and MMCV. Below are quick steps for installation. Please refer to install.md for detailed installation guide.
conda create -n openmmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate openmmlab
pip3 install openmim
mim install mmcv-full
git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
pip3 install -e .
Please see get_started.md for the basic usage of MMPose. There are also tutorials:
- learn about configs
- finetune model
- add new dataset
- customize data pipelines
- add new modules
- export a model to ONNX
- customize runtime settings
Results and models are available in the README.md of each method's config directory. A summary can be found in the Model Zoo page.
Supported algorithms:
- DeepPose (CVPR'2014)
- CPM (CVPR'2016)
- Hourglass (ECCV'2016)
- SimpleBaseline3D (ICCV'2017)
- Associative Embedding (NeurIPS'2017)
- HMR (CVPR'2018)
- SimpleBaseline2D (ECCV'2018)
- HRNet (CVPR'2019)
- VideoPose3D (CVPR'2019)
- HRNetv2 (TPAMI'2019)
- MSPN (ArXiv'2019)
- SCNet (CVPR'2020)
- HigherHRNet (CVPR'2020)
- RSN (ECCV'2020)
- InterNet (ECCV'2020)
- VoxelPose (ECCV'2020)
- LiteHRNet (CVPR'2021)
- ViPNAS (CVPR'2021)
Supported techniques:
- FPN (CVPR'2017)
- FP16 (ArXiv'2017)
- Wingloss (CVPR'2018)
- AdaptiveWingloss (ICCV'2019)
- DarkPose (CVPR'2020)
- UDP (CVPR'2020)
- Albumentations (Information'2020)
- SoftWingloss (TIP'2021)
- SmoothNet (arXiv'2021)
- RLE (ICCV'2021)
Supported datasets:
- AFLW [homepage] (ICCVW'2011)
- sub-JHMDB [homepage] (ICCV'2013)
- COFW [homepage] (ICCV'2013)
- MPII [homepage] (CVPR'2014)
- Human3.6M [homepage] (TPAMI'2014)
- COCO [homepage] (ECCV'2014)
- CMU Panoptic [homepage] (ICCV'2015)
- DeepFashion [homepage] (CVPR'2016)
- 300W [homepage] (IMAVIS'2016)
- RHD [homepage] (ICCV'2017)
- CMU Panoptic HandDB [homepage] (CVPR'2017)
- AI Challenger [homepage] (ArXiv'2017)
- MHP [homepage] (ACM MM'2018)
- WFLW [homepage] (CVPR'2018)
- PoseTrack18 [homepage] (CVPR'2018)
- OCHuman [homepage] (CVPR'2019)
- CrowdPose [homepage] (CVPR'2019)
- MPII-TRB [homepage] (ICCV'2019)
- FreiHand [homepage] (ICCV'2019)
- Animal-Pose [homepage] (ICCV'2019)
- OneHand10K [homepage] (TCSVT'2019)
- Vinegar Fly [homepage] (Nature Methods'2019)
- Desert Locust [homepage] (Elife'2019)
- Grévy’s Zebra [homepage] (Elife'2019)
- ATRW [homepage] (ACM MM'2020)
- Halpe [homepage] (CVPR'2020)
- COCO-WholeBody [homepage] (ECCV'2020)
- MacaquePose [homepage] (bioRxiv'2020)
- InterHand2.6M [homepage] (ECCV'2020)
- AP-10K [homepage] (NeurIPS'2021)
- Horse-10 [homepage] (WACV'2021)
Supported backbones:
- AlexNet (NeurIPS'2012)
- VGG (ICLR'2015)
- ResNet (CVPR'2016)
- ResNext (CVPR'2017)
- SEResNet (CVPR'2018)
- ShufflenetV1 (CVPR'2018)
- ShufflenetV2 (ECCV'2018)
- MobilenetV2 (CVPR'2018)
- ResNetV1D (CVPR'2019)
- ResNeSt (ArXiv'2020)
- Swin (CVPR'2021)
- HRFormer (NeurIPS'2021)
- PVT (ICCV'2021)
- PVTV2 (CVMJ'2022)
We will keep up with the latest progress of the community, and support more popular algorithms and frameworks. If you have any feature requests, please feel free to leave a comment in MMPose Roadmap.
MMPose achieves superior training speed and accuracy on standard keypoint detection benchmarks such as COCO. See more details at benchmark.md.
We summarize the model complexity and inference speed of major models in MMPose, including FLOPs, parameter counts and inference speeds on both CPU and GPU devices with different batch sizes. Please refer to inference_speed_summary.md for more details.
Please refer to data_preparation.md for a general knowledge of data preparation.
Please refer to FAQ for frequently asked questions.
We appreciate all contributions to improve MMPose. Please refer to CONTRIBUTING.md for the contributing guideline.
MMPose is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new models.
If you find this project useful in your research, please consider citing:
@misc{mmpose2020,
title={OpenMMLab Pose Estimation Toolbox and Benchmark},
author={MMPose Contributors},
howpublished = {\url{https://github.com/open-mmlab/mmpose}},
year={2020}
}
This project is released under the Apache 2.0 license.
- MMCV: OpenMMLab foundational library for computer vision.
- MIM: MIM installs OpenMMLab packages.
- MMClassification: OpenMMLab image classification toolbox and benchmark.
- MMDetection: OpenMMLab detection toolbox and benchmark.
- MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
- MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
- MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
- MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
- MMPose: OpenMMLab pose estimation toolbox and benchmark.
- MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
- MMSelfSup: OpenMMLab self-supervised learning toolbox and benchmark.
- MMRazor: OpenMMLab model compression toolbox and benchmark.
- MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
- MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
- MMTracking: OpenMMLab video perception toolbox and benchmark.
- MMFlow: OpenMMLab optical flow toolbox and benchmark.
- MMEditing: OpenMMLab image and video editing toolbox.
- MMGeneration: OpenMMLab image and video generative models toolbox.
- MMDeploy: OpenMMLab Model Deployment Framework.