Human Pose Estimation

This framework estimates the human pose on an image. The parts of the human body used in this project are shown in the following image:

More information about the human pose model can be found here: [MPI-pose](https://pose.mpi-inf.mpg.de/)

For demo purposes I used images of myself:	The resulting human pose estimation drawn over the original image:

Preparing the model

The models used in this project are based on the openpose project (Caffe) and PoseEstimation-CoreML (TensorFlow). The CoreML model files are not included in the repo. To create those files, do the following:

Install Python and CoreML tools (Python 3.7.5, coremltools 3.1)
Run CoreMLModels/download.sh
Make changes in the file multiPoseModel/mpi/pose_deploy_linevec_faster_4_stages_fixed_size.prototxt:

input_dim: 1 # This value will be defined at runtime ->  input_dim: 512
input_dim: 1 # This value will be defined at runtime ->  input_dim: 512

Run CoreMLModels/convert.sh. On successful execution the following CoreML files are created: PoseMNV2_Single_14.mlmodel and PoseCNN_Multi_15.mlmodel. The PoseMNV2_Single_14 model is used for fast inference of a single person in the image. The PoseCNN_Multi_15 model is used for more sophisticated inference of all human bodies present in the image, with significantly slower performance.

The .prototxt file mentioned above contains hardcoded values that fix the size of the input image: input_dim: XXX corresponds to the width of the NN input; input_dim: XXX corresponds to the height of the NN input. When changing these values, do not forget to set the model configuration ModelConfigurationCNNMulti15.inputSize to the same input size, and use that configuration instead of the existing one in the framework, which sets the input size to 512x512.
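The manual edit described above could also be scripted. The following is a minimal sketch, assuming the two placeholder lines in the .prototxt read exactly as quoted in this README. The function name `fix_input_size` and the sample text are hypothetical and are not part of the repository.

```python
import re

# Pattern for the placeholder lines as quoted in this README. The exact
# comment text is an assumption taken from the README, not from the file.
RUNTIME_DIM = re.compile(
    r"^(\s*)input_dim:\s*1\s*#\s*This value will be defined at runtime\s*$"
)

def fix_input_size(prototxt_text, first_dim, second_dim):
    """Replace the two runtime-defined input_dim lines, in file order."""
    values = iter([first_dim, second_dim])
    out = []
    replaced = 0
    for line in prototxt_text.splitlines():
        match = RUNTIME_DIM.match(line)
        if match and replaced < 2:
            # Keep the original indentation, drop the placeholder comment.
            out.append(f"{match.group(1)}input_dim: {next(values)}")
            replaced += 1
        else:
            out.append(line)
    if replaced != 2:
        raise ValueError(f"expected 2 runtime input_dim lines, found {replaced}")
    return "\n".join(out) + "\n"

# Hypothetical header, for illustration only.
sample = (
    "input_dim: 1\n"
    "input_dim: 3\n"
    "input_dim: 1 # This value will be defined at runtime\n"
    "input_dim: 1 # This value will be defined at runtime\n"
)
print(fix_input_size(sample, 512, 512))
```

The script fails loudly when it does not find exactly two placeholder lines, so a changed file layout is noticed instead of silently producing an unmodified model description.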

Any values will work, but the best results are achieved when the aspect ratio matches that of the original image. Also take into account that larger values affect performance significantly, as shown in the Performance section.
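One way to pick an input size that follows the image's aspect ratio is sketched below. The helper `input_size_for` is hypothetical. Capping the longer side at 512 and snapping both sides to a multiple of 8 are assumptions, chosen to mirror the 512x512 default and the 1/8 output scale described in this README; the framework itself may not require either.

```python
def input_size_for(image_width, image_height, long_side=512, multiple=8):
    """Return (width, height) that approximates the image's aspect ratio.

    long_side and multiple are illustrative defaults (assumptions), not
    values required by the framework.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = long_side / max(image_width, image_height)

    def snap(value):
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(image_width), snap(image_height)

print(input_size_for(1920, 1080))
print(input_size_for(1000, 1000))
```

Smaller values of `long_side` trade accuracy for speed in the same way as the fixed sizes compared in the Performance section.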

Run the demo app in Xcode

To run the demo, install the CocoaPods dependencies first. Run the following commands in the Terminal app:

> cd <project-root-location>/pose
> pod install

Once the dependencies are installed, open the pose.xcworkspace file in Xcode. Select the poseDemo target and press the Build and Run button.

Neural network output details

The output of the MPI15 model is a group of matrices with dimensions (input_image_width / 8, input_image_height / 8). Each element of a matrix is a float. The mapping between the matrix index in the output and the body part is:

POSE_MPI_BODY_PARTS {
{0,  "Head"},
{1,  "Neck"},
{2,  "RShoulder"},
{3,  "RElbow"},
{4,  "RWrist"},
{5,  "LShoulder"},
{6,  "LElbow"},
{7,  "LWrist"},
{8,  "RHip"},
{9,  "RKnee"},
{10, "RAnkle"},
{11, "LHip"},
{12, "LKnee"},
{13, "LAnkle"},
{14, "Chest"},
{15, "Background"}
};
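As an illustration of how this mapping and the 1/8 scale fit together, the sketch below finds the strongest cell of one heatmap and converts it back to input-image pixels. It assumes the heatmap is available as a list of rows indexed `heatmap[y][x]` and uses the cell centre as the pixel position; both are assumptions, and `strongest_joint` is a hypothetical helper, not a framework API.

```python
# Index-to-name table copied from the mapping above.
POSE_MPI_BODY_PARTS = [
    "Head", "Neck", "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist", "RHip", "RKnee",
    "RAnkle", "LHip", "LKnee", "LAnkle", "Chest", "Background",
]

OUTPUT_STRIDE = 8  # each output matrix is 1/8 of the input size per side

def strongest_joint(heatmap, part_index):
    """Return (name, x, y, confidence) for the highest value in a heatmap.

    heatmap is a list of rows (heatmap[y][x]); x and y are returned in
    input-image pixels, approximated by the centre of the winning cell.
    """
    confidence, cell_x, cell_y = max(
        (value, x, y)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
    )
    return (
        POSE_MPI_BODY_PARTS[part_index],
        (cell_x + 0.5) * OUTPUT_STRIDE,
        (cell_y + 0.5) * OUTPUT_STRIDE,
        confidence,
    )

neck_heatmap = [
    [0.0, 0.1, 0.0],
    [0.0, 0.9, 0.2],
]
print(strongest_joint(neck_heatmap, 1))
```

Taking a single maximum is only adequate for one person per image; with several people, each heatmap can contain several candidates.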

Heatmaps and PAFs

The PoseCNN_Multi_15 model has two types of output matrices: those that represent heatmaps and those that represent PAFs. Each heatmap matrix corresponds to one joint, 15 in total. The PAF matrices represent body connections. For each body connection there is an X and a Y matrix, 28 in total (14 + 14). The total number of matrices, including the one that represents the background, is 44. The output of the single-person model PoseMNV2_Single_14 contains heatmaps only; it contains neither PAF matrices nor a background layer.
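The counts above add up as 15 + 1 + 28 = 44. The sketch below splits a list of 44 matrices accordingly. The channel order (heatmaps first, then background, then PAFs) is an assumption based on the body-part index table in this README and should be checked against the actual model output. `split_output` is a hypothetical helper.

```python
HEATMAP_COUNT = 15     # one heatmap per joint (indices 0-14)
BACKGROUND_COUNT = 1   # index 15 in the body-part table
CONNECTION_COUNT = 14  # each connection has an X and a Y matrix
PAF_COUNT = 2 * CONNECTION_COUNT
TOTAL_COUNT = HEATMAP_COUNT + BACKGROUND_COUNT + PAF_COUNT  # 44

def split_output(matrices):
    """Split the model output into (heatmaps, background, pafs).

    ASSUMPTION: channels are ordered heatmaps, background, PAFs. How the
    X and Y PAF matrices are arranged is not assumed here, so the PAFs
    are returned as one flat list of 28 matrices.
    """
    if len(matrices) != TOTAL_COUNT:
        raise ValueError(f"expected {TOTAL_COUNT} matrices, got {len(matrices)}")
    heatmaps = matrices[:HEATMAP_COUNT]
    background = matrices[HEATMAP_COUNT]
    pafs = matrices[HEATMAP_COUNT + BACKGROUND_COUNT:]
    return heatmaps, background, pafs

# Stand-in data: integers take the place of real matrices.
heatmaps, background, pafs = split_output(list(range(44)))
print(len(heatmaps), background, len(pafs))
```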

Demo project

The repository also contains a demo project 'poseDemo' that demonstrates usage of the framework.

Sample	Images
Human pose result:	Heatmaps combined into one image. Each joint has its own color:

PAFs combined into one image:	All heatmap candidates. Each candidate has its own confidence which defines its opacity on the image:

Closer look at heatmap candidates corresponding to a head:	Closer look at heatmap candidates corresponding to a neck:

PAF matrix corresponding to a head-neck connection candidate. The head and neck heatmap joints are also shown on the image:	PAF matrix corresponding to an LShoulder-LElbow connection candidate. The LShoulder and LElbow heatmap joints are also shown on the image:
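To make the role of the X and Y PAF matrices concrete, here is a sketch of the scoring idea commonly used in OpenPose-style pipelines: sample points on the segment between two joint candidates and average the dot product of the PAF vector with the segment direction. It is a general illustration under stated assumptions, not necessarily what this framework does. `paf_score`, the `paf[y][x]` layout, and the sample count are all hypothetical.

```python
import math

def paf_score(paf_x, paf_y, start, end, samples=10):
    """Average alignment of the PAF field with the start->end direction.

    paf_x and paf_y are lists of rows (paf[y][x]); start and end are
    (x, y) cell coordinates of two joint candidates. The result is near
    1 when the field points from start to end, and near 0 or negative
    when it does not.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the connection
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1) if samples > 1 else 0.0
        x = int(round(start[0] + t * dx))
        y = int(round(start[1] + t * dy))
        # Dot product of the field vector with the connection direction.
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / samples

# A 3x5 field that points in the +X direction everywhere.
field_x = [[1.0] * 5 for _ in range(3)]
field_y = [[0.0] * 5 for _ in range(3)]
print(paf_score(field_x, field_y, (0, 1), (4, 1)))
print(paf_score(field_x, field_y, (2, 0), (2, 2)))
```

A connection candidate whose score falls below some threshold would be discarded; the threshold itself depends on the model and is not given in this README.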

Performance

Time to process one frame (1-2 persons in the view)

NN input size	iPhone XR (ms)	iPhone 8 (ms)	iPhone 5S (ms)
CoreML
512 x 512	190	3670	20801
256 x 256	70	1039	7162
Post-processing
512 x 512	19	67	100
256 x 256	5	35
Total
512 x 512	219	3737	20901
256 x 256	75	1074	7200

All numbers shown above may vary from run to run.

The resulting pose depends on the NN input size (the smaller and faster the input, the less accurate the result):
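For a rough sense of throughput, the totals in the table can be converted to frames per second. This is simple arithmetic on the numbers above, not a new measurement, and `frames_per_second` is a hypothetical helper.

```python
# Total time per frame in milliseconds, copied from the "Total" rows above.
TOTAL_MS = {
    "512 x 512": {"iPhone XR": 219, "iPhone 8": 3737, "iPhone 5S": 20901},
    "256 x 256": {"iPhone XR": 75, "iPhone 8": 1074, "iPhone 5S": 7200},
}

def frames_per_second(ms_per_frame):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame

for size, devices in TOTAL_MS.items():
    for device, ms in devices.items():
        print(f"{size} on {device}: {frames_per_second(ms):.2f} fps")
```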

512 x 512	256 x 256

Applications

Healthcare

Detecting anomalies in the human spine on still images:
Health and fitness guide.

Home security and automation (not related to mobile phones)

Detecting whether people are at home and checking that all equipment is switched off (iron/oven).
Locating people inside the living area and automating actions (turning on lights/music/TV).

Improvements

NMS optimization: a parallel GPU implementation using the Metal API.
Use a different approximation for joint connections that is closer to real-life skeleton bones. Bones are not straight.
Implement more robust filtering of the output pose to get rid of artifacts.
Implement pose estimation on a video stream.
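For reference, the NMS (non-maximum suppression) step mentioned in the first item can be sketched on the CPU as follows. This is a plain illustrative version, not the framework's implementation. The threshold value, the 8-neighbourhood, and the `heatmap[y][x]` layout are assumptions, and `heatmap_peaks` is a hypothetical helper.

```python
def heatmap_peaks(heatmap, threshold=0.1):
    """Return [(x, y, confidence)] for local maxima at or above threshold.

    A cell is kept only if it is strictly greater than all of its (up to
    eight) neighbours, so flat plateaus produce no peak in this sketch.
    """
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (nx, ny) != (x, y)
            ]
            if all(value > n for n in neighbours):
                peaks.append((x, y, value))
    return peaks

example = [
    [0.0, 0.2, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.5],
    [0.0, 0.1, 0.0, 0.0],
]
print(heatmap_peaks(example))
```

Each cell is tested independently of the others, which is what makes this step a natural fit for a parallel GPU implementation.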

In-Depth information

Some fun


The image was taken from Magic Poser

License: Apache License 2.0