Official implementation of the paper "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset"

Home Page: https://motion-x-dataset.github.io

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset

Project Page | Paper | Data (coming soon!)

This repository contains the implementation of the following paper:

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset
Jing Lin∗1,2, Ailing Zeng∗1, Shunlin Lu∗1,3, Yuanhao Cai2, Ruimao Zhang3, Haoqian Wang2, Lei Zhang1
∗Equal contribution. 1International Digital Economy Academy, 2Tsinghua University, 3The Chinese University of Hong Kong, Shenzhen


Figure 1. Motion samples from our dataset

Table of Contents

  1. General Description
  2. Dataset Download
  3. Experiments
  4. Citing

General Description

We propose a high-accuracy and efficient annotation pipeline for whole-body motions and the corresponding text labels. Based on this pipeline, we build a large-scale 3D expressive whole-body human motion dataset from massive online videos and eight existing motion datasets. We unify them into the same format, providing whole-body motion (i.e., SMPL-X) and corresponding text labels.

Labels from Motion-X:

  • Motion label: 13.7M whole-body poses and 96K motion clip annotations, represented as SMPL-X parameters.
  • Text label: (1) 13.7M frame-level whole-body pose descriptions and (2) 96K sequence-level semantic labels.
  • Other modalities: RGB videos, audio, and music information.

Supported Tasks:

  • Text-driven 3D whole-body human motion generation
  • 3D whole-body human mesh recovery
  • Others: motion pre-training, multi-modality pre-trained models for motion understanding and generation, etc.



Figure 2. Example of the RGB video and annotated motion. RGB videos are from: website1, website2, website3

Dataset Download

We hope to disseminate Motion-X in a manner that aligns with the original data sources and complies with the necessary protocols. Here are the instructions:

  • Fill out this form to request authorization to use Motion-X for non-commercial purposes. After you submit the form, an email with the dataset will be sent to you as soon as the dataset is released. We plan to release Motion-X by Sept. 2023.

  • For the motion capture datasets (i.e., AMASS, GRAB, EgoBody),

    • We will not distribute the original motion data, so please download it from the original websites.
    • We will provide the text labels and facial expressions annotated by our team.
  • For the other datasets (i.e., NTU-RGBD120, AIST++, HAA500, HuMMan),

    • please read and acknowledge the licenses and terms of use on the original websites.
    • Once users have obtained necessary approvals from the original institutions, we will provide the motion and text labels annotated by our team.

| Dataset | Clip Number | Frame Number | Body Motion | Hand Motion | Facial Motion | Semantic Text | Pose Text | Website |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMASS | 26K | 3.5M | AMASS | AMASS | Ours | HumanML3D | Ours | amass |
| NTU-RGBD120 | 38K | 2.6M | Ours | Ours | Ours | NTU | Ours | rose1 |
| AIST++ | 1.4K | 1.1M | Ours | Ours | Ours | AIST++ | Ours | aist |
| HAA500 | 9.9K | 0.6M | Ours | Ours | Ours | HAA500 | Ours | cse.ust.hk |
| HuMMan | 0.9K | 0.2M | Ours | Ours | Ours | Ours | Ours | HuMMan |
| GRAB | 1.3K | 1.6M | GRAB | GRAB | Ours | GRAB | Ours | grab |
| EgoBody | 1.0K | 0.4M | EgoBody | EgoBody | Ours | Ours | Ours | sanweiliti |
| BAUM | 1.4K | 0.2M | Ours | Ours | Ours | BAUM | Ours | mimoza |
| Online Videos | 15K | 3.4M | Ours | Ours | Ours | Ours | Ours | online |
| Motion-X (Ours) | 96K | 13.7M | Ours | Ours | Ours | Ours | Ours | motion-x |
  • To parse the motion and text labels, you can simply do the following:

    import numpy as np
    import torch
    
    # read motion and save as smplx representation
    motion = np.load('motion_data/000001.npy')
    motion = torch.tensor(motion).float()
    motion_parms = {
                'root_orient': motion[:, :3],  # controls the global root orientation
                'pose_body': motion[:, 3:3+63],  # controls the body
                'pose_hand': motion[:, 66:66+90],  # controls the finger articulation
                'pose_jaw': motion[:, 66+90:66+93],  # controls the jaw pose
                'face_expr': motion[:, 159:159+50],  # controls the face expression
                'face_shape': motion[:, 209:209+100],  # controls the face shape
                'trans': motion[:, 309:309+3],  # controls the global body position
                'betas': motion[:, 312:],  # controls the body shape. Body shape is static
            }
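
    # sanity check: the slices above span a 322-D vector per frame
    # (3+63+90+3+50+100+3 pose/face/translation dims, plus 10 shape betas --
    # the 10-D betas size is an assumption; adjust if the released format differs)
    assert motion.shape[1] == 322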
    
    # read text labels (plain-text files; np.loadtxt cannot parse free-form text)
    semantic_text = open('texts/semantic_texts/000001.txt').read()   # sequence-level semantic labels
    body_text = open('texts/body_texts/000001.txt').read()           # body pose description
    hand_text = open('texts/hand_texts/000001.txt').read()           # hand pose description
    face_text = open('texts/face_texts/000001.txt').read()           # facial expression description
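
  • To turn these parameters into a posed mesh, a minimal sketch along the following lines should work. This is not part of the official release: it assumes the smplx pip package and the official SMPL-X model files downloaded separately to a local models/ folder, reuses the motion and motion_parms variables from the snippet above, and infers the 50 expression and 10 beta dimensions from the slicing there.

    import smplx

    # build a neutral SMPL-X layer matching Motion-X's parameterization
    model = smplx.create(
        'models', model_type='smplx',
        gender='neutral',
        use_pca=False,                 # full 45-D axis-angle pose per hand
        num_expression_coeffs=50,      # match the 50 expression dims above
        batch_size=motion.shape[0],    # one batch element per frame
    )

    # forward pass: returns posed mesh vertices and joints for every frame
    output = model(
        global_orient=motion_parms['root_orient'],
        body_pose=motion_parms['pose_body'],
        left_hand_pose=motion_parms['pose_hand'][:, :45],
        right_hand_pose=motion_parms['pose_hand'][:, 45:],
        jaw_pose=motion_parms['pose_jaw'],
        expression=motion_parms['face_expr'],
        betas=motion_parms['betas'],
        transl=motion_parms['trans'],
    )
    vertices = output.vertices   # (num_frames, 10475, 3) mesh vertices
    joints = output.joints       # whole-body joint positions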

Experiments

Validation of the motion annotation pipeline

Our annotation pipeline significantly surpasses existing SOTA 2D whole-body keypoint estimation models and mesh recovery methods.


Benchmarking Text-driven Whole-body Human Motion Generation


Comparison with HumanML3D on Whole-body Human Motion Generation Task


Impact on 3D Whole-Body Human Mesh Recovery


Citing

If you find this repository useful for your work, please consider citing it as follows:

@article{lin2023motionx,
  title={Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset},
  author={Lin, Jing and Zeng, Ailing and Lu, Shunlin and Cai, Yuanhao and Zhang, Ruimao and Wang, Haoqian and Zhang, Lei},
  journal={arXiv preprint arXiv:2307.00818},
  year={2023}
}
