language_to_reward_2023

This repository contains code to reproduce the results in the paper "Language to Rewards for Robotic Skill Synthesis".

Project website | arXiv paper | video

Abstract

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions are shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs.

In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm, where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
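
To make the idea concrete, below is a minimal, self-contained Python sketch of the reward-as-interface loop described above. It is not the repository's API: query_llm, RewardParams, and MpcController are hypothetical stand-ins for the LLM-based reward translator and the MuJoCo MPC motion controller.

from dataclasses import dataclass


@dataclass
class RewardParams:
    """Example reward terms an LLM might set for a quadruped task."""
    torso_height: float   # desired torso height, in meters
    forward_speed: float  # desired forward velocity, in m/s


def query_llm(instruction: str) -> RewardParams:
    """Stand-in for prompting an LLM to map language to reward parameters."""
    # A real system would prompt the model and parse the code/values it writes.
    if "stand still" in instruction:
        return RewardParams(torso_height=0.26, forward_speed=0.0)
    return RewardParams(torso_height=0.26, forward_speed=0.5)


class MpcController:
    """Stand-in for a real-time optimizer such as MuJoCo MPC."""

    def set_rewards(self, params: RewardParams) -> None:
        # The optimizer would synthesize low-level actions that maximize
        # these rewards; here we only show the interface.
        print(f"optimizing behavior for rewards: {params}")


if __name__ == "__main__":
    controller = MpcController()
    # Reward parameters, not low-level actions, are the interface between
    # the language model and the robot controller.
    controller.set_rewards(query_llm("make the robot trot forward"))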

Installation and Usage

# Create a Python venv; make sure to use Python version >= 3.9
python3 --version
python3 -m venv /tmp/l2r
. /tmp/l2r/bin/activate

# Build and install mujoco_mpc
git clone https://github.com/google-deepmind/mujoco_mpc.git
cd mujoco_mpc
# Latest MJPC commit at the time of release. Using `main` might work too.
git checkout c5c7ead065b7f4034ab265a13023231900dbfaa7

# Compile mujoco_mpc from source and install it; this step can take a few minutes.
pip install ./python

cd ..

# Build and install language_to_reward_2023
git clone https://github.com/google-deepmind/language_to_reward_2023.git
cd language_to_reward_2023

# Compile language_to_reward_2023 from source and install it; this step can take a few minutes.
pip install .
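
# Optional sanity check (the import names below are assumed from the demo
# command and the mujoco_mpc Python bindings): both packages should import
# cleanly inside the venv.
python -c "import mujoco_mpc; import language_to_reward_2023; print('install looks OK')"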

# Run the demo
python -m language_to_reward_2023.user_interaction --api_key=<OpenAI API key>

Notes

In the published paper, we showed the robot moonwalking. This was done with an older version of the MuJoCo model, which had unrealistically strong actuators. Regrettably, moonwalking is not possible with the model used in this repository.

Citing this work

@article{yu2023language,
  title={Language to Rewards for Robotic Skill Synthesis},
  author={Yu, Wenhao and Gileadi, Nimrod and Fu, Chuyuan and Kirmani, Sean and Lee, Kuang-Huei and Gonzalez Arenas, Montse and Lewis Chiang, Hao-Tien and Erez, Tom and Hasenclever, Leonard and Humplik, Jan and Ichter, Brian and Xiao, Ted and Xu, Peng and Zeng, Andy and Zhang, Tingnan and Heess, Nicolas and Sadigh, Dorsa and Tan, Jie and Tassa, Yuval and Xia, Fei},
  year={2023},
  journal={Conference on Robot Learning 2023},
}

License and disclaimer

Copyright 2023 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.
