A simplified version of RRIS's code: an AI rock paper scissors machine using MediaPipe and KNN.
python dual.py
Google MediaPipe for Pose Estimation
MediaPipe is a cross-platform framework for building multimodal applied machine learning pipelines, including inference models and media processing functions.
The main purpose of this repo is to:
- Customize output of MediaPipe solutions
- Customize visualization of 2D & 3D outputs
- Demo some simple applications on Python (refer to Demo Overview)
- Demo some simple applications in JavaScript (refer to the java folder)
Attractiveness of Google MediaPipe compared with other SOTA methods (e.g. FrankMocap, CMU OpenPose, DeepPoseKit, DeepLabCut, MinimalHand):
- Fast: Runs at near-realtime rates on CPU and even on mobile devices
- Open-source: Code is freely available on GitHub (although details of the network models are not released)
- User-friendly: For the Python API, a simple
pip install mediapipe
will work (the C++ API is much more troublesome to build and use)
- Cross-platform: Works across Android, iOS, desktop, JavaScript and web (Note: this repo focuses only on the Python API for desktop usage)
- ML Solutions: Apart from face, hand, body and object pose estimation, MediaPipe offers an array of other machine learning solutions; refer to their GitHub for more details
The latest MediaPipe Python API, version 0.8.4.2 (released 11 May 2021), features:
Face Mesh (468 3D face landmarks)
- Blog | Code | Paper | Video | Model Card
Hands (21 3D landmarks and able to support multiple hands)
- Blog | Code | Paper | Video | Model Card
Body Pose (33 3D landmarks for whole body, 3 levels of model complexity (NEW))
- Blog | Code | Paper | Video | Model Card
Holistic (Face + Hands + Body) (A total of 543/535 landmarks: 468 face + 2 x 21 hands + 33/25 pose)
Objectron (3D object detection and tracking) (4 possible objects: Shoe / Chair / Camera / Cup)
- Blog | Code | Paper | Paper | Model Card
Note: The above videos were presented at the CVPR 2020 Fourth Workshop on Computer Vision for AR/VR; interested readers can refer to the link for other related works.
The simplest way to run our implementation is to use Anaconda. You can create an Anaconda environment called mp with:
conda env create -f environment.yaml
conda activate mp
Demo Overview:
- Single Image
- Video Input
- Gesture Recognition
- Rock Paper Scissor Game
- Measure Hand ROM
- Measure Wrist and Forearm ROM
- Face Mask
- Triangulate Points for 3D Pose
- 3D Skeleton
- 3D Object Detection
4 different modes are available, and sample images are located in the data/sample/ folder
python 00_image.py --mode face
python 00_image.py --mode hand
python 00_image.py --mode body
python 00_image.py --mode holistic
Note: The sample images of the subject with body markers are adapted from An Asian-centric human movement database capturing activities of daily living, and the image of Mona Lisa is adapted from Wiki
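MediaPipe returns landmarks normalized to the image width and height, so drawing them on a sample image requires converting back to pixel coordinates. A minimal sketch of that conversion (the helper name and array layout are illustrative, not part of this repo):

```python
import numpy as np

def landmarks_to_pixels(landmarks, img_width, img_height):
    """Convert (N, 2) normalized landmarks in [0, 1] to integer pixel coordinates."""
    lm = np.asarray(landmarks, dtype=float)
    px = np.stack([lm[:, 0] * img_width, lm[:, 1] * img_height], axis=1)
    return np.round(px).astype(int)

# Example: a landmark at the center of a 640x480 image maps to pixel (320, 240)
center = landmarks_to_pixels([[0.5, 0.5]], 640, 480)
```

Keeping landmarks normalized until the final drawing step makes the same detection result reusable across image resolutions.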
4 different modes are available; video capture can be done online through a webcam or offline from your own .mp4 file
python 01_video.py --mode face
python 01_video.py --mode hand
python 01_video.py --mode body
python 01_video.py --mode holistic
Note: It runs at around 10 to 30 FPS on CPU, depending on the mode selected. The video demonstrating supported mini-squats is adapted from the National Stroke Association
2 modes are available: use evaluation mode to perform recognition of 11 gestures, or use train mode to log your own training data
python 02_gesture.py --mode eval
python 02_gesture.py --mode train
Note: A simple but effective K-nearest neighbor (KNN) algorithm is used as the classifier. For the hand gesture recognition demo, since 3D hand joints are available, we can compute the flexion joint angles as the feature vector and use it to classify different hand poses. For the body, where 3D joints are not yet as reliable, the normalized pairwise distances between a predefined list of joints, as described in MediaPipe Pose Classification, could be used instead as the feature vector for KNN.
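The idea above can be sketched in a few lines: flexion angles are the angles between consecutive bone vectors along a joint chain, and KNN is a majority vote among the nearest training samples. This is a hedged illustration, not the repo's actual implementation; function names, the training data, and the gesture labels are made up.

```python
import numpy as np

def flexion_angles(joints):
    """Angles (radians) between consecutive bone vectors of an (N, 3) joint chain."""
    joints = np.asarray(joints, dtype=float)
    bones = np.diff(joints, axis=0)          # (N-1, 3) bone vectors
    a, b = bones[:-1], bones[1:]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def knn_classify(feature, train_feats, train_labels, k=3):
    """Majority vote among the k nearest training feature vectors."""
    dist = np.linalg.norm(np.asarray(train_feats, dtype=float) - feature, axis=1)
    nearest = np.argsort(dist)[:k]
    labels = [train_labels[i] for i in nearest]
    return max(set(labels), key=labels.count)

# A straight 3-joint chain has zero flexion; a right-angle bend gives pi/2
straight = flexion_angles([[0, 0, 0], [1, 0, 0], [2, 0, 0]])
bent = flexion_angles([[0, 0, 0], [1, 0, 0], [1, 1, 0]])
```

With real data, the feature vector would hold one flexion angle per finger joint, and the training set would come from the train mode logs.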
A simple game of rock paper scissors requires a pair of hands facing the camera
python 03_game_rps.py
For another game, flappy bird, refer to this GitHub
2 modes are available: use evaluation mode to perform hand ROM recognition, or use train mode to log your own training data
python 04_hand_rom.py --mode eval
python 04_hand_rom.py --mode train
3 modes are available, and the user has to input which side of the hand is to be measured:
- 0: Wrist flexion/extension
- 1: Wrist radial/ulnar deviation
- 2: Forearm pronation/supination
python 05_wrist_rom.py --mode 0 --side right
python 05_wrist_rom.py --mode 1 --side right
python 05_wrist_rom.py --mode 2 --side right
python 05_wrist_rom.py --mode 0 --side left
python 05_wrist_rom.py --mode 1 --side left
python 05_wrist_rom.py --mode 2 --side left
Note: For measuring forearm pronation/supination, the camera has to be placed at the same level as the hand, such that the palmar side of the hand directly faces the camera. For measuring wrist ROM, the camera has to be placed such that the upper body of the subject is visible; refer to the wrist_XXX.png example images in the data/sample/ folder. The wrist images are adapted from Goni Wrist Flexion, Extension, Radial & Ulnar Deviation
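A wrist angle of this kind can be estimated as the signed 2D angle between the forearm vector (elbow to wrist) and the hand vector (wrist to knuckle) in the image plane. The sketch below is an assumption about the general approach, with made-up joint positions, not the repo's code:

```python
import numpy as np

def wrist_angle_deg(elbow, wrist, knuckle):
    """Signed angle (degrees) between forearm (elbow->wrist) and hand (wrist->knuckle) in 2D."""
    forearm = np.asarray(wrist, dtype=float) - np.asarray(elbow, dtype=float)
    hand = np.asarray(knuckle, dtype=float) - np.asarray(wrist, dtype=float)
    cross = forearm[0] * hand[1] - forearm[1] * hand[0]   # 2D cross product (scalar)
    return np.degrees(np.arctan2(cross, np.dot(forearm, hand)))

# Neutral wrist: hand aligned with forearm → 0 degrees
neutral = wrist_angle_deg([0, 0], [1, 0], [2, 0])
```

The sign of the angle distinguishes flexion from extension (or radial from ulnar deviation), which is why the 2D cross product is used rather than the unsigned cosine angle.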
Overlay a 3D face mask on the detected face in the image plane
python 06_face_mask.py
Note: The face image is adapted from MediaPipe 3D Face Transform
Estimating 3D body pose from a single 2D image is an ill-posed and extremely challenging problem. One way to reconstruct 3D body pose is to make use of a multiview setup and perform triangulation. For offline testing, use the CMU Panoptic Dataset: follow the instructions in the PanopticStudio Toolbox to download the sample dataset 171204_pose1_sample into the data/ folder
python 07_triangulate.py --mode body --use_panoptic_dataset
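Triangulation from two calibrated views can be sketched with the standard direct linear transform (DLT): each 2D observation contributes two linear constraints on the homogeneous 3D point, and the least-squares solution is the smallest right singular vector. The camera matrices below are toy values for illustration, not Panoptic calibrations:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """DLT triangulation of one 3D point from two views.
    P1, P2: (3, 4) camera projection matrices; uv1, uv2: 2D observations."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                      # nullspace of A = homogeneous 3D point
    return X[:3] / X[3]             # dehomogenize

# Toy setup: identity camera and a second camera translated along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0, 1.0])
uv1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
uv2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
X_est = triangulate(P1, P2, uv1, uv2)   # recovers the ground-truth point [0.2, 0.1, 2.0]
```

With more than two cameras, the same construction simply stacks two rows per view into A before the SVD, which is how multiview setups improve robustness.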
3D pose estimation is available in full-body mode, and this demo displays the estimated 3D skeleton of the hand and/or body. 3 different modes are available; video capture can be done online through a webcam or offline from your own .mp4 file
python 08_skeleton_3D.py --mode hand
python 08_skeleton_3D.py --mode body
python 08_skeleton_3D.py --mode holistic
4 different modes are available, and a sample image is located in the data/sample/ folder. 4 classes are currently supported: Shoe / Chair / Cup / Camera.
python 09_objectron.py --mode shoe
python 09_objectron.py --mode chair
python 09_objectron.py --mode cup
python 09_objectron.py --mode camera
Estimating 3D pose from a single 2D image is an ill-posed and extremely challenging problem, thus the ROM measurements may not be accurate! Please refer to the respective model cards for more details on other limitations such as lighting, motion blur, occlusion, image resolution, etc.