Monocular, One-stage, Regression of Multiple 3D People

ROMP is a one-stage network for multi-person 3D mesh recovery from a single image.

Monocular, One-stage, Regression of Multiple 3D People,
Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei,
arXiv paper (arXiv 2008.12272)

Contact: yusun@stu.hit.edu.cn. Feel free to contact me for related questions or discussions!

Simple: ROMP simultaneously predicts the body center locations and the corresponding 3D body mesh parameters for all people at each pixel.
Fast: The ROMP ResNet-50 model runs at over 30 FPS on a 1070Ti GPU.
Strong: ROMP achieves superior performance on multiple challenging multi-person/occlusion benchmarks, including 3DPW, CMU Panoptic, and 3DOH50K.
Easy to use: We provide a user-friendly testing API and webcam demos.

News

2021/7/15: Adding support for an elegant context manager to run code in a notebook. See Colab demo for the details.
2021/4/19: Adding support for textured SMPL mesh using vedo. See visualization.md for the details.
2021/3/30: 1.0 version. Rebuilding the code. Release the ResNet-50 version and evaluation on 3DPW.
2020/11/26: Optimization for person-person occlusion. Small changes for video support.
2020/9/11: Real-time webcam demo using local/remote server. Please refer to config_guide.md for details.
2020/9/4: Google Colab demo. Saving a npy file per image. Please refer to config_guide.md for details.

Try on Google Colab

Before installation, you can take a few minutes to give the prepared Google Colab demo a try.
It allows you to run the project in the cloud, free of charge.

Please refer to bug.md for known bugs and their workarounds. You are welcome to submit issues for any related bugs.

Installation

Please refer to install.md for installation.

Demo

Currently, the released code reproduces the demo results. Only 1-2 GB of GPU memory is needed.

To do this you just need to run

cd ROMP/src
sh run.sh
# if the shell script fails, run the following command instead:
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/single_image.yml

Results will be saved in ROMP/demo/images_results.

Internet images

You can also run the code on arbitrary internet images by putting them under ROMP/demo/images.

Please refer to config_guide.md for saving the estimated mesh/Center maps/parameters dict.
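Once results have been saved to disk, a quick way to see what a file contains is to load it with NumPy and list its keys. The sketch below is only an illustration: it assumes the file is either an .npz archive or an .npy file holding a pickled Python dict, and the helper names load_results and summarize are hypothetical, not part of ROMP. The keys and array shapes that ROMP actually writes are described in config_guide.md.

```python
import numpy as np


def load_results(path):
    """Load a saved results file and return its content as a plain dict.

    Assumption: the file is an .npz archive or an .npy file containing a
    pickled dict. allow_pickle=True executes pickled data, so only load
    files you trust.
    """
    loaded = np.load(path, allow_pickle=True)
    # .npz archives expose their member names through the `files` attribute.
    if hasattr(loaded, "files"):
        with loaded:
            return {key: loaded[key] for key in loaded.files}
    # A dict stored with np.save comes back as a 0-d object array.
    if loaded.dtype == object and loaded.shape == ():
        item = loaded.item()
        return item if isinstance(item, dict) else {"value": item}
    # Anything else is a plain array.
    return {"array": loaded}


def summarize(results):
    """Map every key to the shape of its value, or to its type name."""
    return {key: getattr(value, "shape", type(value).__name__)
            for key, value in results.items()}
```

Calling summarize(load_results(path)) on one of the saved files prints a compact overview without assuming any particular key names.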

Internet videos

You can also run the code on arbitrary internet videos.

To do this, first change the input_video_path in src/configs/video.yml to /path/to/your/video. For example, set

 video_or_frame: True
 input_video_path: '../demo/videos/sample_video.mp4' # None
 output_dir: '../demo/videos/sample_video_results/'

then run

cd ROMP/src
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/video.yml

Results will be saved to ../demo/videos/sample_video_results.
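If you process many videos one at a time, editing video.yml by hand gets tedious. The sketch below shows one way to rewrite the two path entries from Python using only the standard library. It is a plain text substitution rather than a YAML parser: it assumes each key sits alone on its own line, as in the snippet above, and it drops any trailing comment on the rewritten line. The helper names set_yaml_value and point_config_at_video are hypothetical.

```python
import re


def set_yaml_value(yaml_text, key, value):
    """Replace the value on the first `key:` line, keeping its indentation.

    Assumes `value` contains no single quote, since it is written back
    inside single quotes.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*:).*$", re.MULTILINE)
    # A function replacement avoids backslash escapes being interpreted,
    # which matters for Windows-style paths.
    new_text, count = pattern.subn(
        lambda match: f"{match.group(1)} '{value}'", yaml_text, count=1
    )
    if count == 0:
        raise KeyError(f"{key!r} not found in config text")
    return new_text


def point_config_at_video(yaml_text, video_path, output_dir):
    """Return the config text with both path entries updated."""
    text = set_yaml_value(yaml_text, "input_video_path", video_path)
    return set_yaml_value(text, "output_dir", output_dir)
```

To use it, read src/configs/video.yml into a string, pass it through point_config_at_video, and write the result back (or to a copy) before running core/test.py as shown above.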

Batch Videos

You can also batch process a directory of videos. Please refer to batch_videos.md for more info.

Unix

python lib/utils/batch_videos.py --input=/home/user/Animations/mocap/cleaned --output=/home/user/Animations/mocap/cleaned/processed --extension mp4 --run_conversion --yaml_template=configs/video-batch.yml

Windows

python lib/utils/batch_videos.py --input=M:/Animations/mocap/cleaned --output=M:/Animations/mocap/cleaned/processed --extension mp4 --windows --run_conversion --yaml_template=configs/video-batch.yml
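The two commands above differ only in the paths and the --windows flag, so a small wrapper can pick the right form automatically. The sketch below builds the argument list using only the flags shown above. The function name batch_videos_command is hypothetical, and the default values simply mirror the examples.

```python
import platform
import sys


def batch_videos_command(input_dir, output_dir, extension="mp4",
                         yaml_template="configs/video-batch.yml",
                         windows=None):
    """Return the argv list for lib/utils/batch_videos.py.

    When `windows` is None, the --windows flag is added only if the
    current platform reports itself as Windows.
    """
    if windows is None:
        windows = platform.system() == "Windows"
    argv = [
        sys.executable, "lib/utils/batch_videos.py",
        f"--input={input_dir}",
        f"--output={output_dir}",
        "--extension", extension,
    ]
    if windows:
        argv.append("--windows")
    argv += ["--run_conversion", f"--yaml_template={yaml_template}"]
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True). The relative script path assumes the working directory is ROMP/src, as in the other demo commands.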

Webcam

We also provide webcam demo code, which can run in real time on a 1070Ti GPU / remote server.
Currently, limited by the visualization pipeline, the webcam visualization code only supports single-person meshes.

To do this you just need to run:

cd ROMP/src
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/webcam.yml
# or try to use the model with ResNet-50 as backbone.
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/webcam_resnet.yml

Press Up/Down to end the demo. Please refer to config_guide.md for running the webcam demo on a remote server, or for setting the mesh color or camera id.

Blender

Export to Blender FBX

Please refer to export.md to export the results to fbx files for use in Blender. Currently, this function only supports single-person videos. Therefore, please test it with ../demo/videos/sample_video2_results/sample_video2.mp4, whose results will be saved to ../demo/videos/sample_video2_results.

Blender Addons

vltmedia/QuickMocap-BlenderAddon: Use this Blender Addon to import & clean Mocap Pose data from .npz or .pkl files. These files may have been created using Numpy, ROMP, or other motion capture processes that package their files accordingly.
- Reads the .npz file created by ROMP. Cleans & smooths the resulting keyframes.

@InProceedings{ROMP,
author = {Sun, Yu and Bao, Qian and Liu, Wu and Fu, Yili and Black, Michael J. and Mei, Tao},
title = {Monocular, One-stage, Regression of Multiple 3D People},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021}
}

Acknowledgement

We thank Peng Cheng for his constructive comments on Center map training.

Thanks to Marco Musy for his help in the textured SMPL visualization.

Thanks to Gavin Gray for adding support for an elegant context manager to run code in a notebook via this pull request.

Thanks to VLT Media for adding support for running on Windows & batch_videos.py.

Here are some great resources we benefit from:

The SMPL models and layer are borrowed from the MPII SMPL-X model.
Webcam pipeline is borrowed from minimal-hand.
Some functions are borrowed from HMR-pytorch.
Some functions for data augmentation are borrowed from SPIN.
Synthetic occlusion is borrowed from synthetic-occlusion.
The evaluation code for the 3DPW dataset is taken from 3dpw-eval.
For fair comparison, the GT annotations of the 3DPW dataset are taken from VIBE.
3D mesh visualization is supported by vedo and Open3D.
