APeiZou/ROMP

Monocular, One-stage, Regression of Multiple 3D People

ROMP is a concise one-stage network for multi-person 3D mesh recovery from a single image.

Simple. Concise one-stage framework for simultaneous person detection and 3D body mesh recovery.
Fast. ROMP can run at over 30 FPS on a 1070Ti GPU.
Strong. ROMP achieves superior performance on multiple challenging multi-person/occlusion benchmarks.
Easy to use. We provide a user-friendly testing API and webcam demos.

Contact: yusun@stu.hit.edu.cn. Feel free to contact me for related questions or discussions!

Features
News
Getting Started
Citation
Contributor
Acknowledgement

Monocular, One-stage, Regression of Multiple 3D People,
Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei,
arXiv paper (arXiv 2008.12272)

Features

Running the examples on Google Colab.
Real-time online webcam demo for driving a textured SMPL model with single-person motion. We also provide a wardrobe for changing clothes.
Batch processing images/videos via command line / jupyter notebook / calling ROMP as a python lib.
Exporting the captured single-person motion to an FBX file for Blender/Unity usage.
Training and evaluation code for reproducing the results presented in the paper.
Convenient API for 2D / 3D visualization and parsed datasets.

News

2021/9/13: Fixed the low FPS and args parsing bugs. Added support for calling ROMP as a python lib.
2021/9/10: Training code release. API optimization.
Old logs

Getting Started

Try on Google Colab

Google Colab allows you to run the project in the cloud, free of charge.
Give the prepared Google Colab demo a try.

Installation

Please refer to install.md for installation.

Inference

Currently, we support processing images, videos, or a real-time webcam stream.
Please refer to config_guide.md for configurations.

ROMP can be called as a python lib inside python code or a jupyter notebook, or run from the command line / scripts. Please refer to the Google Colab demo for examples.

Processing images

To reproduce the demo results, please run

cd ROMP
# change the `inputs` in configs/image.yml to /path/to/your/image folder, then run 
sh scripts/image.sh
# or run the command like
python -m romp.predict.image --inputs=demo/images --output_dir=demo/image_results

Please refer to config_guide.md for saving the estimated mesh/Center maps/parameters dict.

Here, we show an example of calling ROMP as a python lib.

# set the absolute path to ROMP
path_to_romp = '/path/to/ROMP'
import os,sys
sys.path.append(path_to_romp)
# set the detailed configurations
from romp.lib.config import ConfigContext, parse_args, args
ConfigContext.parsed_args = parse_args(["--configs_yml=configs/image.yml",'--inputs=/path/to/images_folder', '--output_dir=/path/to/save/image_results', '--save_centermap', False]) # Note that setting a bool config needs two list elements, ['--config', True/False]
# import the ROMP image processor
from romp.predict.image import Image_processor
processor = Image_processor(args_set=args())
results_dict = processor.run(args().inputs) # you can change the args().inputs to other /path/to/images_folder
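
The two-element convention for bool configs is easy to get wrong when the argument list is written by hand. The following is a minimal sketch of a helper that assembles a list in the same shape as the one passed to parse_args above. The function name build_romp_args and its keyword interface are hypothetical and not part of ROMP; only the flag names come from the example above.

```python
def build_romp_args(configs_yml, inputs, output_dir, **bool_flags):
    """Hypothetical helper (not part of ROMP): build an argument list
    in the shape used in the example above."""
    arg_list = [
        f"--configs_yml={configs_yml}",
        f"--inputs={inputs}",
        f"--output_dir={output_dir}",
    ]
    for name, value in bool_flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be True or False, got {value!r}")
        # A bool config takes two list elements: the flag, then the value.
        arg_list.extend([f"--{name}", value])
    return arg_list


image_args = build_romp_args(
    "configs/image.yml",
    "/path/to/images_folder",
    "/path/to/save/image_results",
    save_centermap=False,
)
print(image_args)
```

Under these assumptions, the resulting list could be handed to parse_args in place of the handwritten one.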

Processing videos

cd ROMP
# change the `inputs` in configs/video.yml to /path/to/your/video file or a folder containing video frames, then run 
sh scripts/video.sh
# or run the command like
python -u -m romp.predict.video --inputs=demo/videos/sample_video.mp4 --output_dir=demo/sample_video_results

Here, we show an example of calling ROMP as a python lib.

# set the absolute path to ROMP
path_to_romp = '/path/to/ROMP'
import os,sys
sys.path.append(path_to_romp)
# set the detailed configurations
from romp.lib.config import ConfigContext, parse_args, args
ConfigContext.parsed_args = parse_args(["--configs_yml=configs/video.yml",'--inputs=/path/to/video', '--output_dir=/path/to/save/video_results', '--save_visualization_on_img',False]) # Note that setting a bool config needs two list elements, ['--config', True/False]
# import the ROMP video processor
from romp.predict.video import Video_processor
processor = Video_processor(args_set=args())
results_dict = processor.run(args().inputs) # you can change the args().inputs to other /path/to/video
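
To process several videos in one go, the command-line form shown above can be generated per file. The sketch below only builds the commands and prints them. It is not the batch_videos.py script mentioned under Contributor, and the helper name, the set of video extensions, and the "<name>_results" output naming are assumptions made for illustration.

```python
import sys
from pathlib import Path

# Assumption: which file extensions count as videos.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}


def build_video_commands(video_dir, results_root):
    """Hypothetical helper: one `python -u -m romp.predict.video` command per video."""
    video_dir = Path(video_dir)
    if not video_dir.is_dir():
        return []
    commands = []
    for video_path in sorted(video_dir.iterdir()):
        if video_path.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        # Assumed naming scheme: results go to <results_root>/<video name>_results
        output_dir = Path(results_root) / f"{video_path.stem}_results"
        commands.append([
            sys.executable, "-u", "-m", "romp.predict.video",
            f"--inputs={video_path}",
            f"--output_dir={output_dir}",
        ])
    return commands


for command in build_video_commands("demo/videos", "demo"):
    print(" ".join(command))
    # To actually run it from the ROMP root directory:
    # import subprocess; subprocess.run(command, check=True)
```

Printing the commands first makes it easy to check the paths before launching any long-running job.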

Webcam

To do this you just need to run:

cd ROMP
sh scripts/webcam.sh

Currently, limited by the visualization pipeline, the real-time webcam demo only visualizes the results of the largest person in each frame.
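
How "largest" is measured is not specified here. As an illustration only, the sketch below picks the person whose 2D bounding box covers the most pixels. The function name, the (x_min, y_min, x_max, y_max) box format, and the area criterion are all assumptions and may differ from what the webcam demo actually does.

```python
def select_largest_person(boxes):
    """Hypothetical helper: index of the box with the largest area, or None.

    boxes: list of (x_min, y_min, x_max, y_max) tuples in pixels (assumed format).
    """
    if not boxes:
        return None

    def area(box):
        x_min, y_min, x_max, y_max = box
        # Clamp to zero so malformed boxes never win with a negative product.
        return max(0, x_max - x_min) * max(0, y_max - y_min)

    return max(range(len(boxes)), key=lambda index: area(boxes[index]))


detections = [(10, 20, 60, 120), (100, 40, 300, 400), (5, 5, 15, 15)]
largest_index = select_largest_person(detections)
print(largest_index)
```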

Export

Export to Blender FBX

Please refer to export.md to export the results to FBX files for Blender usage. Currently, this function only supports single-person videos. Therefore, please test it with demo/videos/sample_video2_results/sample_video2.mp4, whose results will be saved to demo/videos/sample_video2_results.

Blender Addons

VLT Media has created QuickMocap-BlenderAddon, which reads the .npz file created by ROMP and cleans and smooths the resulting keyframes.

Train

Please prepare the training datasets following dataset.md, and then refer to train.md for training.

Evaluation

Please refer to evaluation.md for evaluation on benchmarks.

Bug report

Please refer to bug.md for solutions. Feel free to submit issues for related bugs. I will solve them as soon as possible.

Citation

Please consider citing

@InProceedings{ROMP,
author = {Sun, Yu and Bao, Qian and Liu, Wu and Fu, Yili and Black, Michael J. and Mei, Tao},
title = {Monocular, One-stage, Regression of Multiple 3D People},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021}
}

Contributor

This repository is currently maintained by Yu Sun.

ROMP has also benefited from many developers, including

Marco Musy : helped with the textured SMPL visualization.
Gavin Gray : added support for an elegant context manager to run code in a notebook.
VLT Media : added support for running on Windows & batch_videos.py.

Acknowledgement

We thank Peng Cheng for his constructive comments on Center map training.

Here are some great resources we benefit from:

The SMPL models and layer are borrowed from the MPII SMPL-X model.
Some functions are borrowed from HMR-pytorch and SPIN.
The evaluation code and GT annotations of the 3DPW dataset are taken from 3dpw-eval and VIBE.
3D mesh visualization is supported by vedo, EasyMocap, minimal-hand and Open3D.

Please consider citing their papers.
