This is an implementation of *Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots* (Paper, Project Page).
The original paper used the TED dataset, but in this repository we modified the code to use the Trinity Speech-Gesture Dataset for the GENEA Challenge 2020. The model was also changed to estimate rotation matrices for the upper-body joints instead of Cartesian coordinates.
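Because the network regresses raw 3x3 matrices, its outputs are not guaranteed to be valid rotations. A common post-processing step (shown here as an illustrative sketch, not necessarily what this repository does; `to_nearest_rotation` is a hypothetical helper) is to project each predicted matrix onto the closest true rotation via SVD:

```python
import numpy as np

def to_nearest_rotation(m: np.ndarray) -> np.ndarray:
    """Project an arbitrary 3x3 matrix onto the nearest rotation matrix
    (in Frobenius norm) using SVD: R = U @ V^T, with a sign fix on det."""
    u, _, vt = np.linalg.svd(m)
    r = u @ vt
    if np.linalg.det(r) < 0:  # avoid reflections (det = -1)
        u[:, -1] *= -1
        r = u @ vt
    return r

# Example: a noisy near-rotation, as a network prediction might be
noisy = np.eye(3) + 0.05 * np.random.default_rng(0).standard_normal((3, 3))
rot = to_nearest_rotation(noisy)
assert np.allclose(rot @ rot.T, np.eye(3), atol=1e-8)  # orthonormal
assert np.isclose(np.linalg.det(rot), 1.0)             # proper rotation
```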
The code was developed with Python 3.6 on Ubuntu 18.04. PyTorch 1.3.1 was used, but more recent versions should also work.
- Install dependencies

  ```
  pip install -r requirements.txt
  ```
- Download the FastText vectors from here and put `crawl-300d-2M-subword.bin` into the resource folder (`PROJECT_ROOT/resource/crawl-300d-2M-subword.bin`). Alternatively, you may use the cache file instead of downloading the full FastText vectors (> 5 GB); put the cache file into the LMDB folder that will be created in the next step. The code automatically loads the cache file when it exists (see the `build_vocab` function).
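The cache mechanism can be sketched as follows. This is a simplified stand-in for the repo's `build_vocab` logic; the pickle format and function name here are assumptions for illustration, not the actual implementation:

```python
import os
import pickle
import tempfile

def load_or_build_vocab(cache_path, build_fn):
    """Load a pre-built vocab cache if present; otherwise build it
    (e.g. from the 5 GB FastText vectors) and cache it for next time."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    vocab = build_fn()  # expensive: reads the full word-vector file
    with open(cache_path, 'wb') as f:
        pickle.dump(vocab, f)
    return vocab

# Toy usage with a dummy builder instead of real FastText vectors
cache = os.path.join(tempfile.mkdtemp(), 'vocab_cache.pkl')
vocab = load_or_build_vocab(cache, lambda: {'hello': [0.1, 0.2]})
```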
- Make LMDB

  ```
  cd scripts
  python trinity_data_to_lmdb.py [PATH_TO_TRINITY_DATASET]
  ```
- Update paths and parameters in `PROJECT_ROOT/config/seq2seq.yml` and run `train.py`

  ```
  python train.py --config=../config/seq2seq.yml
  ```
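The entries to update are typically the dataset and output paths. The fragment below is purely illustrative; the key names are hypothetical, so check the actual `seq2seq.yml` for the real schema:

```yaml
# Hypothetical keys for illustration only -- use the names that
# actually appear in PROJECT_ROOT/config/seq2seq.yml
train_data_path: ../data/trinity/lmdb_train   # LMDB created in the previous step
model_save_path: ../output/train_seq2seq      # where checkpoints are written
```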
- Inference

  ```
  python inference.py [PATH_TO_MODEL] [PATH_TO_TRANSCRIPT]
  ```
We share the model trained on the training set of the GENEA Challenge 2020. Click here to download it.
Please see `LICENSE.md` for license details.
```
@INPROCEEDINGS{yoonICRA19,
  title={Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots},
  author={Yoon, Youngwoo and Ko, Woo-Ri and Jang, Minsu and Lee, Jaeyeon and Kim, Jaehong and Lee, Geehyuk},
  booktitle={Proc. of the International Conference on Robotics and Automation (ICRA)},
  year={2019}
}
```