Official implementation of the paper "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset"

Home Page: https://motion-x-dataset.github.io

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset

Project Page | Paper | Data (coming soon!)

This repository contains the implementation of the following paper:

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset
Jing Lin∗1,2, Ailing Zeng∗1, Shunlin Lu∗1,3, Yuanhao Cai2, Ruimao Zhang3, Haoqian Wang2, Lei Zhang1
∗Equal contribution. 1International Digital Economy Academy, 2Tsinghua University, 3The Chinese University of Hong Kong, Shenzhen


Figure 1. Motion samples from our dataset

Table of Contents

  1. General Description
  2. Dataset Download
  3. Experiments
  4. Citing

General Description

We propose a high-accuracy and efficient annotation pipeline for whole-body motions and the corresponding text labels. Based on this pipeline, we build a large-scale 3D expressive whole-body human motion dataset from massive online videos and eight existing motion datasets. We unify them into the same format, providing whole-body motion (i.e., SMPL-X) and corresponding text labels.

Labels from Motion-X:

  • Motion label: 13.7M whole-body poses and 96K motion clip annotations, represented as SMPL-X parameters.
  • Text label: (1) 13.7M frame-level whole-body pose descriptions and (2) 96K sequence-level semantic labels.
  • Other modalities: RGB videos, audio, and music information.

Supported Tasks:

  • Text-driven 3D whole-body human motion generation
  • 3D whole-body human mesh recovery
  • Others: motion pre-training, multi-modality pre-trained models for motion understanding and generation, etc.



Figure 2. Example of the RGB video and annotated motion. RGB videos are from: website1, website2, website3

Dataset Download

We hope to disseminate Motion-X in a manner that aligns with the original data sources and complies with the necessary protocols. Here are the instructions:

  • Fill out this form to request authorization to use Motion-X for non-commercial purposes. After you submit the form, an email with the dataset will be sent to you as soon as the dataset is released. We plan to release Motion-X by Sept. 2023.

  • For the motion capture datasets (i.e., AMASS, GRAB, EgoBody),

    • We will not distribute the original motion data, so please download it from the original websites.
    • We will provide the text labels and facial expressions annotated by our team.
  • For the other datasets (i.e., NTU-RGBD120, AIST++, HAA500, HuMMan),

    • please read and acknowledge the licenses and terms of use on the original websites.
    • Once users have obtained necessary approvals from the original institutions, we will provide the motion and text labels annotated by our team.

| Dataset | Clip Number | Frame Number | Body Motion | Hand Motion | Facial Motion | Semantic Text | Pose Text | Website |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMASS | 26K | 3.5M | AMASS | AMASS | Ours | HumanML3D | Ours | amass |
| NTU-RGBD120 | 38K | 2.6M | Ours | Ours | Ours | NTU | Ours | rose1 |
| AIST++ | 1.4K | 1.1M | Ours | Ours | Ours | AIST++ | Ours | aist |
| HAA500 | 9.9K | 0.6M | Ours | Ours | Ours | HAA500 | Ours | cse.ust.hk |
| HuMMan | 0.9K | 0.2M | Ours | Ours | Ours | Ours | Ours | HuMMan |
| GRAB | 1.3K | 1.6M | GRAB | GRAB | Ours | GRAB | Ours | grab |
| EgoBody | 1.0K | 0.4M | EgoBody | EgoBody | Ours | Ours | Ours | sanweiliti |
| BAUM | 1.4K | 0.2M | Ours | Ours | Ours | BAUM | Ours | mimoza |
| Online Videos | 15K | 3.4M | Ours | Ours | Ours | Ours | Ours | online |
| Motion-X (Ours) | 96K | 13.7M | Ours | Ours | Ours | Ours | Ours | motion-x |
  • To parse the motion and text labels, you can simply do the following:

    import numpy as np
    import torch
    
    # read motion and save as smplx representation
    motion = np.load('motion_data/000001.npy')
    motion = torch.tensor(motion).float()
    motion_parms = {
                'root_orient': motion[:, :3],  # controls the global root orientation
                'pose_body': motion[:, 3:3+63],  # controls the body
                'pose_hand': motion[:, 66:66+90],  # controls the finger articulation
                'pose_jaw': motion[:, 66+90:66+93],  # controls the jaw pose
                'face_expr': motion[:, 159:159+50],  # controls the face expression
                'face_shape': motion[:, 209:209+100],  # controls the face shape
                'trans': motion[:, 309:309+3],  # controls the global body position
                'betas': motion[:, 312:],  # controls the body shape. Body shape is static
            }
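
    # sanity check: the slices above span a 322-D vector per frame
    # (3+63+90+3+50+100+3 pose/face/translation dims, plus 10 shape betas --
    # the 10-D betas size is an assumption; adjust if the released format differs)
    assert motion.shape[1] == 322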
    
    # read text labels (plain-text files; np.loadtxt cannot parse free-form text)
    semantic_text = open('texts/semantic_texts/000001.txt').read()   # sequence-level semantic labels
    body_text = open('texts/body_texts/000001.txt').read()           # body pose description
    hand_text = open('texts/hand_texts/000001.txt').read()           # hand pose description
    face_text = open('texts/face_texts/000001.txt').read()           # facial expression description
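
  • To turn these parameters into a posed mesh, a minimal sketch along the following lines should work. This is not part of the official release: it assumes the smplx pip package and the official SMPL-X model files downloaded separately to a local models/ folder, reuses the motion and motion_parms variables from the snippet above, and infers the 50 expression and 10 beta dimensions from the slicing there.

    import smplx

    # build a neutral SMPL-X layer matching Motion-X's parameterization
    model = smplx.create(
        'models', model_type='smplx',
        gender='neutral',
        use_pca=False,                 # full 45-D axis-angle pose per hand
        num_expression_coeffs=50,      # match the 50 expression dims above
        batch_size=motion.shape[0],    # one batch element per frame
    )

    # forward pass: returns posed mesh vertices and joints for every frame
    output = model(
        global_orient=motion_parms['root_orient'],
        body_pose=motion_parms['pose_body'],
        left_hand_pose=motion_parms['pose_hand'][:, :45],
        right_hand_pose=motion_parms['pose_hand'][:, 45:],
        jaw_pose=motion_parms['pose_jaw'],
        expression=motion_parms['face_expr'],
        betas=motion_parms['betas'],
        transl=motion_parms['trans'],
    )
    vertices = output.vertices   # (num_frames, 10475, 3) mesh vertices
    joints = output.joints       # whole-body joint positions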

Experiments

Validation of the motion annotation pipeline

Our annotation pipeline significantly surpasses existing SOTA 2D whole-body keypoint estimation models and mesh recovery methods.


Benchmarking Text-driven Whole-body Human Motion Generation


Comparison with HumanML3D on Whole-body Human Motion Generation Task


Impact on 3D Whole-Body Human Mesh Recovery


Citing

If you find this repository useful for your work, please consider citing it as follows:

@article{lin2023motionx,
  title={Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset},
  author={Lin, Jing and Zeng, Ailing and Lu, Shunlin and Cai, Yuanhao and Zhang, Ruimao and Wang, Haoqian and Zhang, Lei},
  journal={arXiv preprint arXiv:2307.00818},
  year={2023}
}
