Human Pose Estimation

Overview

This application performs multi-person 2D human pose estimation and person detection on video streams using a human pose estimation model and a person detection model that are deployed with the OpenVINO toolkit. The pose estimation model identifies up to 18 points: ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, and ankles. The results of the pose estimation and person detection predictions are rendered in the application in real-time on a camera feed or pre-recorded video. In addition to the pose estimation, the inference time, number of people in the video, and the order of inference for each person are overlaid on the video.

People entering and exiting the video feed are saved in a database as a series of events that can be accessed via a JSON API. The application can be run on a CPU, GPU, or the Intel Neural Compute Stick 2 via the command line or a Flask-based web application. The web-based interface allows end users to toggle between a real-time camera feed and a pre-recorded demo video.

Human Pose Estimation Model

The model is a multi-person 2D pose estimation network that uses a tuned MobileNet V1 as the feature extractor. The model is based on the OpenPose approach and was originally built using the Caffe framework. OpenPose is the first real-time model to jointly detect multiple body parts. The model has two outputs, PAF and a keypoint heatmap. The coordinates in the heatmap are scaled and used for the pose overlay frame by frame.

Person Detection Model

The model is developed for pedestrian detection in retail scenarios. It is based on a MobileNetV2-like backbone that includes depth-wise convolutions (single convolution filter per input channel) to reduce computation for the 3x3 convolution block. It was also originally built using the Caffe framework. The model outputs the confidence for the prediction and bounding box coordinates. The application counts any result above a .4 confidence value as a person.

Setup

Python Dependencies

pip install -e .

OpenVINO Installation

Instructions for installing OpenVINO on Linux, Windows, macOS, and Raspian OS

https://docs.openvino.ai/latest/openvino_docs_install_guides_installing_openvino_linux.html

Create SQLite Database

python create_db.py

Run Demo in Window

Usage: demo.py [OPTIONS]

Options:
  --device-name TEXT  device to run the network on, CPU, GPU, or MYRIAD.
                      Default is CPU
  --precision TEXT    model precision FP16-INT8, FP16, or FP32. Default is
                      FP16-INT8
  --video-url TEXT    use video instead of video camera, try
                      https://github.com/intel-iot-devkit/sample-
                      videos/blob/master/store-aisle-detection.mp4?raw=true.
                      Default uses video camera
  --help              Show this message and exit.

Run with camera feed

python demo.py

Run with sample video

python demo.py --video-url="https://github.com/intel-iot-devkit/sample-videos/blob/master/store-aisle-detection.mp4?raw=true"

Run Flask Web Application

python main.py

http://127.0.0.1:5000/?device-name=CPU (or GPU, MYRIAD, or AUTO (default) for Intel Neural Compute Stick 2)

Detection event API

http://127.0.0.1:5000/detections?page=1&per-page=50

References

https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/402-pose-estimation-webcam

https://docs.openvino.ai/latest/index.html

https://blog.tensorflow.org/2018/05/real-time-human-pose-estimation-in.html

dcmartin / pose-estimation

Human Pose Estimation

Overview