Additional Manual for Custom Training and Inference of 3D Pose Estimation (SmartCube)

1. Setup environment and required libraries

Please follow the instructions here.

2. Dataset Preparation

2.1. Dataset Format

The recommended format for the 2D and 3D dataset is the MPI-INF-3DHP format, which is structured as:

root
  |__annotations
  |       |___cameras_test.pkl
  |       |___cameras_train.pkl
  |       |___joint2d_rel_stats.pkl
  |       |___joint2d_stats.pkl
  |       |___joint3d_rel_stats.pkl
  |       |___joint3d_stats.pkl
  |       |___mpi_inf_3dhp_test_valid.npz
  |       |___mpi_inf_3dhp_train.npz
  |
  |__images # name format for only 1 camera and 1 subject
        |__S1_Seq1_Cam0_000000.jpg
        |__S1_Seq1_Cam0_000001.jpg
        |__...

Where each file has the following format:

cameras_test.pkl

{
    {
        'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
        'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
        'w': width, # image width (int)
        'h': height, # image height (int)
        'name': 'test_cam_1'
    },
    {
        'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
        'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
        'w': width, # image width (int)
        'h': height, # image height (int)
        'name': 'test_cam_2'
    },
    ...
}

cameras_train.pkl

{
    {
        'R': array[array[], array[], array[]], # rotation matrix, each inner array has len 3 (shape 3x3)
        'T': array[array[], array[], array[]], # translation vector, each inner array has len 1 (shape 3x1)
        'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
        'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
        'w': width, # image width (int)
        'h': height, # image height (int)
        'name': 'test_cam_1'
    },
    {
        'R': array[array[], array[], array[]], # rotation matrix, each inner array has len 3 (shape 3x3)
        'T': array[array[], array[], array[]], # translation vector, each inner array has len 1 (shape 3x1)
        'c': array[array[], array[]], # camera center, each inner array has len 1 (shape 2x1)
        'f': array[array[], array[]], # camera focal length, each inner array has len 1 (shape 2x1)
        'w': width, # image width (int)
        'h': height, # image height (int)
        'name': 'test_cam_2'
    },
    ...
}
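
As a rough illustration of the camera files described above, the sketch below builds one camera entry with the listed keys and round-trips it through pickle. The container layout (a dict keyed by an identifier such as 'S1_Seq1_Cam0'), the helper name make_camera and all numeric values are assumptions made for this example; check them against the output of the preprocessing script before relying on them.

```python
import os
import pickle
import tempfile

import numpy as np


def make_camera(name, w, h, f, c, R=None, T=None):
    """Build one camera entry (hypothetical helper).

    R and T are only listed for the training cameras, so they are optional.
    """
    cam = {
        'c': np.asarray(c, dtype=np.float32).reshape(2, 1),  # camera center
        'f': np.asarray(f, dtype=np.float32).reshape(2, 1),  # focal length
        'w': int(w),  # image width
        'h': int(h),  # image height
        'name': name,
    }
    if R is not None:
        cam['R'] = np.asarray(R, dtype=np.float32).reshape(3, 3)  # rotation
    if T is not None:
        cam['T'] = np.asarray(T, dtype=np.float32).reshape(3, 1)  # translation
    return cam


# Made-up values; the key 'S1_Seq1_Cam0' is an assumed identifier.
cameras = {
    'S1_Seq1_Cam0': make_camera(
        'train_cam_0', w=2048, h=2048,
        f=[1500.0, 1500.0], c=[1024.0, 1024.0],
        R=np.eye(3), T=[0.0, 0.0, 3000.0]),
}

out_path = os.path.join(tempfile.mkdtemp(), 'cameras_train.pkl')
with open(out_path, 'wb') as fh:
    pickle.dump(cameras, fh)

# Read the file back to confirm the structure survives the round trip.
with open(out_path, 'rb') as fh:
    loaded = pickle.load(fh)
```

Reshaping every array explicitly keeps the 2x1, 3x3 and 3x1 shapes from the listing even when the inputs are plain Python lists.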

joint2d_rel_stats.pkl (coordinate relative to root)

{
    'mean': array shape of (num_joints x 2), # mean of joints coordinates
    'std': array shape of (num_joints x 2) # std of joints coordinates
}

joint2d_stats.pkl

{
    'mean': array shape of (num_joints x 2), # mean of joints coordinates
    'std': array shape of (num_joints x 2) # std of joints coordinates
}

joint3d_rel_stats.pkl (coordinate relative to root)

{
    'mean': array shape of (num_joints x 3), # mean of joints coordinates
    'std': array shape of (num_joints x 3) # std of joints coordinates
}

joint3d_stats.pkl

{
    'mean': array shape of (num_joints x 3), # mean of joints coordinates
    'std': array shape of (num_joints x 3) # std of joints coordinates
}
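
The four statistics files above share one layout, so a single helper can produce all of them. The following is a minimal sketch on synthetic data; the function name joint_stats, the joint count of 17 and the root joint index of 14 are assumptions for illustration. It keeps the root row in the root-relative statistics, matching the (num_joints x dim) shape given above.

```python
import numpy as np


def joint_stats(joints, root_index=None):
    """Return the per-joint mean and std over all frames.

    joints: array of shape (num_frames, num_joints, dim), dim = 2 or 3.
    root_index: if given, coordinates are first made relative to that joint.
    """
    joints = np.asarray(joints, dtype=np.float64)
    if root_index is not None:
        # Subtract the root joint position from every joint of the same frame.
        joints = joints - joints[:, root_index:root_index + 1, :]
    return {'mean': joints.mean(axis=0), 'std': joints.std(axis=0)}


rng = np.random.default_rng(0)
joints_3d = rng.normal(size=(100, 17, 3))  # synthetic stand-in for real data

stats_abs = joint_stats(joints_3d)                 # layout of joint3d_stats.pkl
stats_rel = joint_stats(joints_3d, root_index=14)  # layout of joint3d_rel_stats.pkl
```

Note that the root joint has zero mean and zero std in the relative statistics, so any normalization that divides by std needs to skip or drop that row.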

mpi_inf_3dhp_test_valid.npz and mpi_inf_3dhp_train.npz

{
    'imgname': [], # list of .jpg image names
    'center': [], # list of bbox centers with shape (num_imgs x 2)
    'scale': [], # list of bbox scale factors with len num_imgs
    'part': [[[]]], # list of 2D joints with shape (num_imgs x num_joints x 3) (possibly including a confidence score)
    'S': [[[]]] # list of 3D joints with shape (num_imgs x num_joints x 3) (possibly including a confidence score)
}
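
To make the annotation layout above concrete, here is a small sketch that writes an .npz file with the same keys and reads it back. The array sizes, the zero-filled values and the file name custom_train.npz are placeholders chosen for this example; real annotations come from your own labeling or from the preprocessing script.

```python
import os
import tempfile

import numpy as np

num_imgs, num_joints = 4, 17  # hypothetical sizes

# Image names follow the S{subject}_Seq{seq}_Cam{cam}_{frame}.jpg pattern.
imgnames = np.array(
    [f'S1_Seq1_Cam0_{i:06d}.jpg' for i in range(num_imgs)])

annotations = {
    'imgname': imgnames,
    'center': np.zeros((num_imgs, 2), dtype=np.float32),            # bbox centers
    'scale': np.ones(num_imgs, dtype=np.float32),                   # bbox scales
    'part': np.zeros((num_imgs, num_joints, 3), dtype=np.float32),  # 2D joints
    'S': np.zeros((num_imgs, num_joints, 3), dtype=np.float32),     # 3D joints
}

out_path = os.path.join(tempfile.mkdtemp(), 'custom_train.npz')
np.savez(out_path, **annotations)

# Reload and collect the shape of every stored array as a sanity check.
with np.load(out_path) as data:
    shapes = {key: data[key].shape for key in data.files}
```

Checking the shapes right after writing catches mismatched frame counts between imgname, center, scale, part and S early.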

If you want to use the original MPI-INF-3DHP dataset, download the dataset and prepare the data folder as:

data_root
    |-- train
        |-- S1
            |-- Seq1
            |-- Seq2
        |-- S2
        |-- ...
    |-- test
        |-- TS1
        |-- TS2
        |-- ...

Run the following script to prepare the dataset:

python tools/dataset/preprocess_mpi_inf_3dhp.py --data_root {path to data root} --out_dir {path to out dir}

2.2. Adding a custom dataset

Please prepare the dataset following the instructions here. The main steps are:

- Add the dataset info in configs/_base_/datasets/{custom_dataset_name}.py
- Add the dataset config (the dataset class) in mmpose/datasets/datasets/datasets/custom_datset/{custom_dataset_name}.py
- Register the dataset name in the above dataset config file
- Set the dataset_name variable in the dataset info file to the name of the dataset class in the config file
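
For orientation, a dataset info file could look roughly like the fragment below. The field names follow the dataset_info layout used by the MMPose 0.x base configs as far as I recall it; the class name CustomBody3DDataset, the three keypoints and the colors are hypothetical, so treat this as a sketch rather than a complete file.

```python
# Sketch of configs/_base_/datasets/{custom_dataset_name}.py
dataset_info = dict(
    dataset_name='CustomBody3DDataset',  # assumed: name of the dataset class
    paper_info=dict(),
    keypoint_info={
        0: dict(name='head_top', id=0, color=[51, 153, 255],
                type='upper', swap=''),
        1: dict(name='left_shoulder', id=1, color=[0, 255, 0],
                type='upper', swap='right_shoulder'),
        2: dict(name='right_shoulder', id=2, color=[255, 128, 0],
                type='upper', swap='left_shoulder'),
    },
    skeleton_info={
        0: dict(link=('head_top', 'left_shoulder'), id=0, color=[0, 255, 0]),
        1: dict(link=('head_top', 'right_shoulder'), id=1,
                color=[255, 128, 0]),
    },
    joint_weights=[1.0, 1.0, 1.0],
    sigmas=[],
)

# Quick consistency check: every swap target and skeleton link endpoint
# must refer to a keypoint that is actually defined.
names = {kp['name'] for kp in dataset_info['keypoint_info'].values()}
swap_ok = all(kp['swap'] in names
              for kp in dataset_info['keypoint_info'].values() if kp['swap'])
links_ok = all(end in names
               for sk in dataset_info['skeleton_info'].values()
               for end in sk['link'])
```

Running a check like this before training avoids hard-to-trace errors from a misspelled keypoint name in a swap pair or skeleton link.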

3. Training and Inference

In the training config file (e.g. configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py), set dataset_type to the dataset class name and data_root to the path of the data directory.

Also replace the camera parameter file and annotation paths in train_data_cfg, test_data_cfg and data.
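
The fragment below sketches which entries typically change. Only dataset_type, data_root, train_data_cfg, test_data_cfg and data are named in this manual; the inner keys (camera_param_file, ann_file, img_prefix, data_cfg), the class name and all paths are assumptions based on the MMPose 0.x config style, and the real config contains further entries (pipelines, batch sizes and so on) that are omitted here.

```python
# Hypothetical values; adjust to your own dataset.
dataset_type = 'CustomBody3DDataset'  # assumed name of the custom dataset class
data_root = 'data/custom_dataset'     # assumed path to the data directory

train_data_cfg = dict(
    num_joints=17,
    camera_param_file=f'{data_root}/annotations/cameras_train.pkl',
)
test_data_cfg = dict(
    num_joints=17,
    camera_param_file=f'{data_root}/annotations/cameras_test.pkl',
)

data = dict(
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/custom_train.npz',
        img_prefix=f'{data_root}/images/',
        data_cfg=train_data_cfg,
    ),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/custom_test.npz',
        img_prefix=f'{data_root}/images/',
        data_cfg=test_data_cfg,
    ),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/annotations/custom_test.npz',
        img_prefix=f'{data_root}/images/',
        data_cfg=test_data_cfg,
    ),
)
```

Deriving every path from data_root means that moving the dataset only requires changing one line.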

Training script:

python tools/train.py {path/to/train/config/file} --work-dir {path/to/save/output/dir} --gpu-id 0

By default, the model is evaluated every epoch, and the video_pose_lift model only needs the bbox width and height information in the camera annotation files.

Evaluation script:

python tools/test.py {path to config file} {path to model ckpt} --work-dir {out dir}

Where, for example:

config file: /home/ducanh/hain/code/mmpose_3d_pose_estimation/configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py

model ckpt: get the model checkpoint link from /home/ducanh/hain/code/mmpose_3d_pose_estimation/configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp.yml

Inference script:

python demo/body3d_two_stage_video_demo.py \
    demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
    https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
    https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
    configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py \
    pretrained_weights/videopose_mpi-inf-3dhp_1frame_fullconv_supervised_gt-d6ed21ef_20210603.pth \
    --video_path /home/ducanh/hain/dataset/yoga_15s.mp4 --out-video-root vis_result --rebase-keypoint-height

The positional arguments are the config file and model checkpoint for 2D bounding box detection, 2D keypoint detection and 3D keypoint detection, respectively.
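
Because the command is long, it can help to assemble it from the three (config, checkpoint) pairs. The sketch below does that in Python; the video path is a placeholder, the variable names are made up for this example, and the option names are copied as written in this manual, so verify them against the demo script's own help output.

```python
import shlex

# (config, checkpoint) pairs in the positional order used by the command:
# 2D bounding box detection, 2D keypoint detection, 3D keypoint detection.
stages = [
    ('demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py',
     'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/'
     'faster_rcnn_r50_fpn_1x_coco/'
     'faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'),
    ('configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/'
     'hrnet_w48_coco_256x192.py',
     'https://download.openmmlab.com/mmpose/top_down/hrnet/'
     'hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth'),
    ('configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/'
     'videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py',
     'pretrained_weights/videopose_mpi-inf-3dhp_1frame_fullconv_'
     'supervised_gt-d6ed21ef_20210603.pth'),
]

video_path = 'path/to/video.mp4'  # placeholder, replace with your own video

command = ['python', 'demo/body3d_two_stage_video_demo.py']
for config, checkpoint in stages:
    command += [config, checkpoint]
command += ['--video_path', video_path,
            '--out-video-root', 'vis_result',
            '--rebase-keypoint-height']

# A shell-safe single-line string, e.g. for printing or logging.
command_line = shlex.join(command)
```

The list in command can be passed directly to subprocess.run, which avoids quoting problems with paths that contain spaces.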

Author's Tutorial

Introduction

MMPose is an open-source toolbox for pose estimation based on PyTorch. It is a part of the OpenMMLab project.

The master branch works with PyTorch 1.5+.

Major Features

Support diverse tasks

We support a wide spectrum of mainstream pose analysis tasks in the current research community, including 2d multi-person human pose estimation, 2d hand pose estimation, 2d face landmark detection, 133 keypoint whole-body human pose estimation, 3d human mesh recovery, fashion landmark detection and animal pose estimation. See demo.md for more information.

Higher efficiency and higher accuracy

MMPose implements multiple state-of-the-art (SOTA) deep learning models, including both top-down and bottom-up approaches. We achieve faster training speed and higher accuracy than other popular codebases, such as HRNet. See benchmark.md for more information.

Support for various datasets

The toolbox directly supports multiple popular and representative datasets, such as COCO, AIC, MPII, MPII-TRB and OCHuman. See data_preparation.md for more information.

Well designed, tested and documented

We decompose MMPose into different components, and one can easily construct a customized pose estimation framework by combining different modules. We provide detailed documentation and API reference, as well as unit tests.

What's New

2022-07-06: MMPose v0.28.0 is released. Major updates include:
- Support TCFormer (CVPR'2022). See the model page
- Add RLE pre-trained model on COCO dataset. See the model page
- Update Swin models with better performance
2022-02-28: MMPose model deployment is supported by MMDeploy v0.3.0. MMPose Webcam API is a simple yet powerful tool to develop interactive webcam applications with MMPose features.
2021-12-29: OpenMMLab Open Platform is online! Try our pose estimation demo

Installation

MMPose depends on PyTorch and MMCV. Below are quick steps for installation. Please refer to install.md for detailed installation guide.

conda create -n openmmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate openmmlab
pip3 install openmim
mim install mmcv-full
git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
pip3 install -e .

Getting Started

Please see get_started.md for the basic usage of MMPose. There are also tutorials:

Model Zoo

Results and models are available in the README.md of each method's config directory. A summary can be found in the Model Zoo page.

Model Request

We will keep up with the latest progress of the community, and support more popular algorithms and frameworks. If you have any feature requests, please feel free to leave a comment in MMPose Roadmap.

Benchmark

Accuracy and Training Speed

MMPose achieves superior training speed and accuracy on standard keypoint detection benchmarks such as COCO. See more details at benchmark.md.

Inference Speed

We summarize the model complexity and inference speed of major models in MMPose, including FLOPs, parameter counts and inference speeds on both CPU and GPU devices with different batch sizes. Please refer to inference_speed_summary.md for more details.

Data Preparation

Please refer to data_preparation.md for a general knowledge of data preparation.

FAQ

Please refer to FAQ for frequently asked questions.

Contributing

We appreciate all contributions to improve MMPose. Please refer to CONTRIBUTING.md for the contributing guideline.

Acknowledgement

MMPose is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new models.

Citation

If you find this project useful in your research, please consider citing:

@misc{mmpose2020,
    title={OpenMMLab Pose Estimation Toolbox and Benchmark},
    author={MMPose Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmpose}},
    year={2020}
}

License

This project is released under the Apache 2.0 license.

Projects in OpenMMLab

MMCV: OpenMMLab foundational library for computer vision.
MIM: MIM installs OpenMMLab packages.
MMClassification: OpenMMLab image classification toolbox and benchmark.
MMDetection: OpenMMLab detection toolbox and benchmark.
MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
MMPose: OpenMMLab pose estimation toolbox and benchmark.
MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
MMSelfSup: OpenMMLab self-supervised learning toolbox and benchmark.
MMRazor: OpenMMLab model compression toolbox and benchmark.
MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
MMTracking: OpenMMLab video perception toolbox and benchmark.
MMFlow: OpenMMLab optical flow toolbox and benchmark.
MMEditing: OpenMMLab image and video editing toolbox.
MMGeneration: OpenMMLab image and video generative models toolbox.
MMDeploy: OpenMMLab Model Deployment Framework.
