Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation
Yifeng Zhu, Peter Stone, Yuke Zhu
Introduction
We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations and use these skills to synthesize prolonged robot behaviors. Our method starts with constructing a hierarchical task structure from each demonstration through agglomerative clustering. From the task structures of multi-task demonstrations, we identify skills based on the recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning. Finally, we train a meta controller to compose these skills to solve long-horizon manipulation tasks. The entire model can be trained on a small set of human demonstrations collected within 30 minutes without further annotations, making it amendable to real-world deployment. We systematically evaluated our method in simulation environments and on a real robot. Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks. Furthermore, skills discovered from multi-task demonstrations boost the average task success by 8% compared to those discovered from individual tasks.
Dependencies
- robosuite
- pytorch
- sklearn
Simulation Environments
We implemented four simulated environments for evaluation, including three single-task ones and a multi-task one. They are Hammer-Place, Tool-Use, Kitchen, Multitask-Kitchen. The implementations of these environments can be found in robosuite-task-zoo.
Demonstration Data
We provide the demonstration data we used for our experiments. Please download the datasets here, and extract them in the root folder of this repo, and name the folder datasets
.
Instructions on running scripts
Demonstration collection
We collect demonstrations using 3D spacemouse. The following commands
show how we can collect data in a single-task environment or a
multi-task environment. For multi-task environment, you need to
specify another variable, task-id
, which is mapped to indiividual
task variant.
- Single-task
python data_collection/collect_demonstration_script.py --num-demonstration 100 --pos-sensitivity 1.0 --controller OSC_POSITION --environment ToolUseEnv
- Multi-task
python multitask/collect_demonstration_script.py --num-demonstration 40 --pos-sensitivity 1.0 --controller OSC_POSITION --environment MultitaskKitchenDomain --task-id 1
Create dataset
After demonstrations are collected, we want to create a dataset in hdf5 file format for skill discovery and policy learning.
python multitask/create_demonstration_dataset.py --folder demonstration_data/MultitaskKitchenDomain_training_set_0 --dataset-name training_set_0 --use-actions --use-camera-obs
- Important keys
- states
- actions
- task-id for every ep
Training multi-sensory representation
python multisensory_repr/train_multimodal.py
Skill segmentations
Hierarchical Agglomerative Clustering
- Single Task
python skill_discovery/hierarchical_agglomoration.py data=kitchen
- Multi-task
python multitask/hierarchical_agglomoration.py data=multitask_kitchen_domain
Spectral Clustering for K partitions
- Single Task
python skill_disocvery/agglomoration_script.py data=kitchen agglomoration.visualization=true repr.modalities="[agentview, eye_in_hand, proprio]" agglomoration.min_len_thresh=30 agglomoration.K=8 agglomoration.segment_scale=2 agglomoration.scale=0.01
- Multi-task
python skill_discovery/agglomoration_script.py data=kitchen agglomoration.visualization=true repr.modalities="[agentview, eye_in_hand, proprio]" agglomoration.min_len_thresh=30 agglomoration.K=8 agglomoration.segment_scale=2 agglomoration.scale=0.01
Policy Learning
Skill policies
- Single-task
python policy_learning/train_subskills.py skill_training.agglomoration.K=$2 skill_training.run_idx=2 data=kitchen skill_training.batch_size=128 skill_subgoal_cfg.visual_feature_dimension=32 skill_training.subtask_id="[$3]" skill_training.data_modality="[image, proprio]" skill_training.lr=1e-4 skill_training.num_epochs=2001 repr.z_dim=32
- Multi-task
python multitask/train_subskills.py skill_training.agglomoration.K=$2 skill_training.run_idx=0 data=multitask_kitchen_domain skill_training.batch_size=128 skill_subgoal_cfg.visual_feature_dimension=32 skill_training.subtask_id="[$3]" skill_training.data_modality="[image, proprio]" skill_training.lr=1e-4 skill_training.num_epochs=2001 repr.z_dim=32 skill_training.policy_layer_dims="[300, 300, 400]" skill_subgoal_cfg.horizon=20
Meta controllers
- Single Task
python policy_learning/train_meta_controller.py skill_training.agglomoration.K=5 skill_training.run_idx=2 data=hammer_place skill_training.batch_size=128 skill_subgoal_cfg.visual_feature_dimension=32 skill_training.data_modality="[image, proprio]" skill_training.lr=1e-4 skill_training.num_epochs=2001 meta_cvae_cfg.latent_dim=64 meta.use_eye_in_hand=False meta.random_affine=True meta_cvae_cfg.kl_coeff=0.005 repr.z_dim=32
- Multi-task
python multitask/train_meta_controller.py skill_training.agglomoration.K=$2 skill_training.run_idx=0 data=multitask_kitchen_domain skill_training.batch_size=128 skill_subgoal_cfg.visual_feature_dimension=32 skill_training.subtask_id="[-1]" skill_training.data_modality="[image, proprio]" skill_training.lr=1e-4 skill_training.num_epochs=4001 repr.z_dim=32 meta_cvae_cfg.latent_dim=64 meta.use_eye_in_hand=False meta.random_affine=True meta_cvae_cfg.kl_coeff=$4 meta.num_epochs=2001 multitask.task_id=$3 skill_training.policy_layer_dims="[300, 300, 400]" skill_subgoal_cfg.horizon=20
Evaluation
- ToolUse
python eval_scripts/eval_task.py skill_training.data_modality="[image, proprio]" data=tool_use skill_training.agglomoration.K=4 meta_cvae_cfg.latent_dim=64 repr.z_dim=32 skill_training.run_idx=0 eval.meta_freq=5 eval.max_steps=1500 meta_cvae_cfg.kl_coeff=0.005 meta.use_spatial_softmax=false meta.random_affine=true eval.testing=true eval.mode="BUDS"
- HammerPlace
python eval_scripts/eval_task.py skill_training.data_modality="[image, proprio]" data=hammer_place skill_training.agglomoration.K=4 meta_cvae_cfg.latent_dim=64 repr.z_dim=32 skill_training.run_idx=0 eval.meta_freq=5 eval.max_steps=1500 meta_cvae_cfg.kl_coeff=0.005 meta.use_spatial_softmax=false meta.random_affine=true eval.testing=true eval.mode="BUDS"
- Kitchen
python eval_scripts/eval_task.py skill_training.data_modality="[image, proprio]" data=kitchen skill_training.agglomoration.K=6 meta_cvae_cfg.latent_dim=64 repr.z_dim=32 skill_training.run_idx=0 eval.meta_freq=5 eval.max_steps=5000 meta_cvae_cfg.kl_coeff=0.01 meta.use_spatial_softmax=false meta.random_affine=true eval.testing=true eval.mode="BUDS"
- Multitask Kitchen
python multitask/eval_task.py skill_training.data_modality="[image, proprio]" data=multitask_kitchen_domain skill_training.agglomoration.K=8 meta_cvae_cfg.latent_dim=64 repr.z_dim=32 skill_training.run_idx=0 eval.meta_freq=5 eval.max_steps=2500 meta_cvae_cfg.kl_coeff=0.01 meta.use_spatial_softmax=false meta.random_affine=true eval.testing=true multitask.task_id=0 skill_subgoal_cfg.horizon=20 multitask.training_task_id=0 verbose=false
Implementation Details
Please see this page for more information about our implementation details including training procedures and hyperparameters.