This framework estimates human poses in an image. The body parts used in this project are shown in the following image:
More information about the human pose model can be found here: [MPI-pose](https://pose.mpi-inf.mpg.de/)

For demo purposes I used photos of myself | The resulting human pose estimation drawn over the original image |
---|---|
![]() | ![]() |
The models used in this project are based on the openpose project (Caffe) and PoseEstimation-CoreML (TensorFlow). The CoreML model files are not included in the repo. To create these files, do the following:
- Install Python and CoreML tools (Python 3.7.5, coremltools 3.1)
- Run CoreMLModels/download.sh
- Make changes in the file multiPoseModel/mpi/pose_deploy_linevec_faster_4_stages_fixed_size.prototxt:
input_dim: 1 # This value will be defined at runtime -> input_dim: 512
input_dim: 1 # This value will be defined at runtime -> input_dim: 512
- Run CoreMLModels/convert.sh. Upon successful execution, the following CoreML files will be created: PoseMNV2_Single_14.mlmodel and PoseCNN_Multi_15.mlmodel. The PoseMNV2_Single_14 model is used for fast inference of a single person in the image. The PoseCNN_Multi_15 model performs more sophisticated inference of all human bodies present in the image, at a significantly slower speed.
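The manual prototxt change in the steps above could also be scripted. The following is a minimal sketch, not part of the repository: the function name `set_fixed_input_size` is hypothetical, and it assumes the two lines to change carry exactly the comment quoted in the list above.

```python
import re

# Matches the runtime-defined lines as quoted in the steps above.
RUNTIME_DIM = re.compile(
    r"input_dim:[ \t]*1[ \t]*#[ \t]*This value will be defined at runtime"
)


def set_fixed_input_size(prototxt_text, first_dim, second_dim):
    """Replace the two runtime-defined input_dim lines with fixed values.

    The values are applied in file order: first_dim goes to the first
    matching line, second_dim to the second one.
    """
    found = len(RUNTIME_DIM.findall(prototxt_text))
    if found != 2:
        raise ValueError(
            "expected 2 runtime input_dim lines, found {}".format(found)
        )
    values = iter((first_dim, second_dim))
    return RUNTIME_DIM.sub(
        lambda match: "input_dim: {}".format(next(values)), prototxt_text
    )
```

A caller would read the .prototxt file into a string, pass it through this function with the desired sizes (512 and 512 for the default configuration), and write the result back before running convert.sh.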
The above-mentioned .prototxt contains hardcoded values that fix the size of the input image: input_dim: XXX corresponds to the width of the NN input, and input_dim: XXX corresponds to the height of the NN input. When changing these values, do not forget to set the model configuration ModelConfigurationCNNMulti15.inputSize to the same input size and use this configuration instead of the existing one in the framework, which sets 512x512 as the input size.
Any values will work, but the best results are achieved when the aspect ratio matches that of the original image. Also keep in mind that larger values slow down inference significantly, as the performance measurements show.
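One way to pick an input size that follows the aspect ratio of the original image is sketched below. This is an illustration only: the helper `fit_input_size` is hypothetical, and snapping both sides to a multiple of 8 is an assumption motivated by the model output being 1/8 of the input size.

```python
def fit_input_size(image_width, image_height, long_side=512, multiple=8):
    """Return (width, height) for the NN input.

    The longer image side is scaled to `long_side`; the other side is scaled
    by the same factor. Both are snapped to the nearest multiple of
    `multiple` (an assumption, see the text above).
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = long_side / max(image_width, image_height)

    def snap(value):
        # Never return less than one full step.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(image_width), snap(image_height)


if __name__ == "__main__":
    # A 4:3 photo keeps its 4:3 ratio at the network input.
    print(fit_input_size(4032, 3024))
```

The resulting pair would then go both into the .prototxt and into the model configuration's input size, so that the two stay in sync.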
To run the demo, install the CocoaPods dependencies first. Run the following commands in the Terminal app:
> cd <project-root-location>/pose
> pod install
Once the dependencies are installed, open the pose.xcworkspace file in Xcode. Select the poseDemo target and press the Build and Run button.
The output of the MPI15 model is a group of matrices with dimensions (input_image_width / 8, input_image_height / 8). Each element of a matrix is a float. The mapping between the matrix index in the output and the body part is:
POSE_MPI_BODY_PARTS {
{0, "Head"},
{1, "Neck"},
{2, "RShoulder"},
{3, "RElbow"},
{4, "RWrist"},
{5, "LShoulder"},
{6, "LElbow"},
{7, "LWrist"},
{8, "RHip"},
{9, "RKnee"},
{10, "RAnkle"},
{11, "LHip"},
{12, "LKnee"},
{13, "LAnkle"},
{14, "Chest"},
{15, "Background"}
};
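To illustrate how the index table above can be used, here is a minimal sketch that picks the strongest cell of each heatmap and maps it back to input-image pixels. It is not the framework's implementation: the function `locate_joints`, the row-major list-of-rows layout, the 0.1 threshold, and the choice of the cell centre are all assumptions.

```python
# Body parts in the index order of the table above (background omitted).
MPI_BODY_PARTS = [
    "Head", "Neck", "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist", "RHip", "RKnee",
    "RAnkle", "LHip", "LKnee", "LAnkle", "Chest",
]

STRIDE = 8  # each output cell covers 8 x 8 input pixels


def locate_joints(heatmaps, threshold=0.1):
    """Return {part_name: (x, y, score)} in input-image pixels.

    `heatmaps` is a list of 2D lists (rows of floats), one per body part.
    Parts whose best score is below `threshold` are left out.
    """
    joints = {}
    for name, heatmap in zip(MPI_BODY_PARTS, heatmaps):
        best_score, best_x, best_y = max(
            (value, x, y)
            for y, row in enumerate(heatmap)
            for x, value in enumerate(row)
        )
        if best_score >= threshold:
            # Use the centre of the winning cell as the joint position.
            joints[name] = (
                best_x * STRIDE + STRIDE // 2,
                best_y * STRIDE + STRIDE // 2,
                best_score,
            )
    return joints
```

Taking a single maximum per heatmap only makes sense for one person per image; with several people each heatmap has several peaks.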
There are two types of output matrices in the PoseCNN_Multi_15 model: those that represent heatmaps and those that represent PAFs (part affinity fields). Each heatmap matrix corresponds to one joint, 15 in total. The PAF matrices represent body connections: for each body connection there is an X and a Y matrix, 28 in total (14 + 14). The total number of matrices, including the one that represents the background, is 44. The output of the single-person model PoseMNV2_Single_14 contains heatmaps only; it has neither PAF matrices nor a background layer.
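The matrix counts above can be made concrete with a short sketch that splits the 44 output matrices into their groups. The channel order used here (15 heatmaps, then the background, then PAFs stored as consecutive X/Y pairs) is an assumption that should be checked against the converted model; `split_output` is a hypothetical helper.

```python
NUM_JOINTS = 15        # heatmaps, one per joint
NUM_CONNECTIONS = 14   # body connections, each with an X and a Y matrix


def split_output(channels):
    """Split the model output into (heatmaps, background, paf_pairs).

    Assumed order: joints 0..14, background at 15, then PAFs as
    consecutive (X, Y) pairs. `channels` is any sequence of 44 matrices.
    """
    expected = NUM_JOINTS + 1 + 2 * NUM_CONNECTIONS  # 15 + 1 + 28 = 44
    if len(channels) != expected:
        raise ValueError(
            "expected {} matrices, got {}".format(expected, len(channels))
        )
    heatmaps = channels[:NUM_JOINTS]
    background = channels[NUM_JOINTS]
    pafs = channels[NUM_JOINTS + 1:]
    paf_pairs = [(pafs[i], pafs[i + 1]) for i in range(0, len(pafs), 2)]
    return heatmaps, background, paf_pairs
```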
The repository also contains a demo project 'poseDemo' that demonstrates usage of the framework.
NN input size | iPhone XR (ms) | iPhone 8 (ms) | iPhone 5S (ms) |
---|---|---|---|
CoreML | | | |
512 x 512 | 190 | 3670 | 20801 |
256 x 256 | 70 | 1039 | 7162 |
Post-processing | | | |
512 x 512 | 19 | 67 | 100 |
256 x 256 | 5 | 35 | |
Total | | | |
512 x 512 | 219 | 3737 | 20901 |
256 x 256 | 75 | 1074 | 7200 |
All numbers shown above may vary from run to run.
The resulting pose depends on the NN input size: the smaller the input, the faster the inference and the less accurate the result.
512 x 512 | 256 x 256 |
---|---|
![]() | ![]() |
- Detecting whether people are at home and checking that all appliances are switched off (iron/oven).
- Locating people inside the living area and triggering automation (turning on lights/music/TV).
- NMS optimization: a parallel GPU implementation using the Metal API.
- Use a different approximation for joint connections that is closer to real-life skeleton bones. Bones are not straight.
- Implement more robust filtering of the output pose to get rid of artifacts.
- Implement pose estimation on a video stream.
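For readers unfamiliar with the NMS (non-maximum suppression) step mentioned in the list above, the following is a plain CPU reference sketch of the idea on a single heatmap. It is neither the framework's code nor a Metal implementation; `heatmap_peaks`, the 8-neighbourhood, and the 0.1 threshold are assumptions chosen for illustration.

```python
def heatmap_peaks(heatmap, threshold=0.1):
    """Return [(x, y, score)] for local maxima of a 2D heatmap.

    A cell is kept when its score reaches `threshold` and is strictly
    greater than all of its (up to 8) neighbours. Flat plateaus are
    therefore dropped, which is a simplification.
    """
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0
    peaks = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (nx, ny) != (x, y)
            ]
            if all(score > other for other in neighbours):
                peaks.append((x, y, score))
    return peaks
```

Every cell is evaluated independently of the others, which is why this step lends itself to a parallel GPU implementation.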
- http://posefs1.perception.cs.cmu.edu/Users/ZheCao/Multi-person%20pose%20estimation-CMU.pdf
- https://www.ri.cmu.edu/wp-content/uploads/2017/04/thesis.pdf
- https://pose.mpi-inf.mpg.de/
The image was taken from Magic Poser ![]() | ![]() | ![]() |