awesome-cross-domain-policy-transfer-for-embodied-agents

This is a collection of research and review papers on cross-domain policy transfer for embodied agents. Feel free to star and fork. Original paper: A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents, IJCAI 2024.

Maintainers

Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan. (Tsinghua University)

Architecture of the survey

The main architecture of the survey: domain gap taxonomy, overarching insights on methodologies, and future trends.

Approaches categorized by handling different domain gaps

Cross-Appearance Policy Transfer

Appearance gaps arise when observations in the source domain (e.g., simulation) differ from those in the target domain (e.g., reality) in colors, background objects, illumination conditions, or rendering textures, for example coarse versus fine rendering or high versus low resolution.
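As a rough illustration of the kind of variation described above, the sketch below perturbs the brightness and color tint of a tiny RGB image. It is a toy example under stated assumptions, not a method from any paper listed here: the function name `randomize_appearance`, its parameters, and the nested-list image format are all hypothetical.

```python
import random


def randomize_appearance(image, rng, brightness=0.3, tint=0.2):
    """Return a copy of an RGB image with random brightness and per-channel tint.

    `image` is a list of rows; each row is a list of (r, g, b) tuples in [0, 1].
    All names and ranges here are illustrative assumptions.
    """
    # One global brightness scale and one gain per color channel.
    scale = 1.0 + rng.uniform(-brightness, brightness)
    gains = [1.0 + rng.uniform(-tint, tint) for _ in range(3)]
    return [
        [
            # Clip each channel back into [0, 1] after scaling.
            tuple(min(1.0, max(0.0, c * scale * g)) for c, g in zip(pixel, gains))
            for pixel in row
        ]
        for row in image
    ]


rng = random.Random(0)
# A flat gray 4x4 "source domain" observation.
source_image = [[(0.5, 0.5, 0.5) for _ in range(4)] for _ in range(4)]
# Three appearance variants of the same underlying scene.
variants = [randomize_appearance(source_image, rng) for _ in range(3)]
```

Each variant shows the same scene content with a different look, which is the situation a policy faces when only appearance differs between domains.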

Beyond pick-and-place: Tackling robotic stacking of diverse shapes
- Alex X. Lee, Coline Manon Devin, Yuxiang Zhou, Thomas Lampe, Konstantinos Bousmalis, Jost Tobias Springenberg, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori. CoRL 2021.
A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving
- Guan Wang*, Haoyi Niu*, Desheng Zhu, Jianming Hu, Xianyuan Zhan, Guyue Zhou. NeurIPS ML4AD Workshop, 2022.
Reinforcement Learning with Augmented Data
- Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas. NeurIPS 2020.
RL-CycleGAN: Reinforcement learning aware simulation-to-real
- Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, Mohi Khansari. CVPR 2020.
Learning to Drive from Simulation without Real World Labels
- Alex Bewley, Jessica Rigley, Yuxuan Liu, Jeffrey Hawke, Richard Shen, Vinh-Dieu Lam, Alex Kendall. ICRA 2019.
VR-goggles for robots: Real-to-sim domain adaptation for visual control
- Jingwei Zhang, Lei Tai, Peng Yun, Yufeng Xiong, Ming Liu, Joschka Boedecker, Wolfram Burgard. RAL 2019.
Domain randomization and pyramid consistency: Simulation-to-real generalization without accessing target domain data
- Xiangyu Yue, Yang Zhang, Sicheng Zhao, Alberto Sangiovanni-Vincentelli, Kurt Keutzer, Boqing Gong. ICCV 2019.
Meta-Sim: Learning to generate synthetic datasets
- Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler. ICCV 2019.
Driving Policy Transfer via Modularity and Abstraction
- Matthias Mueller, Alexey Dosovitskiy, Bernard Ghanem, Vladlen Koltun. CoRL 2018.
Virtual to Real Reinforcement Learning for Autonomous Driving
- Xinlei Pan, Yurong You, Ziyan Wang, Cewu Lu. BMVC 2017.
Unpaired image-to-image translation using cycle-consistent adversarial networks
- Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros. ICCV 2017.
Domain randomization for transferring deep neural networks from simulation to the real world
- Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel. IROS 2017.

Cross-Viewpoint Policy Transfer

Viewpoint gaps arise when sensor configurations (e.g., camera positions and angles) differ between the source and target domains, which can significantly influence the downstream policy learning of embodied agents.
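To make the effect of camera placement concrete, the following sketch projects one fixed 3D point through a simple pinhole camera at two different poses. It is only an illustration under stated assumptions: the function `project`, the yaw-only rotation, and the unit focal length are hypothetical simplifications and are not taken from the papers below.

```python
import math


def project(point, cam_pos, yaw, focal=1.0):
    """Project a world point into a pinhole camera.

    The camera sits at `cam_pos` and is rotated by `yaw` radians about the
    vertical (y) axis; at yaw = 0 it looks along +z. Returns image-plane
    coordinates (u, v), or None if the point is behind the camera.
    """
    # Express the point relative to the camera position.
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    # Rotate into the camera frame (inverse of the camera's yaw).
    c, s = math.cos(yaw), math.sin(yaw)
    xc = c * x - s * z
    zc = s * x + c * z
    if zc <= 0:
        return None
    # Perspective division.
    return (focal * xc / zc, focal * y / zc)


target = (0.0, 0.0, 5.0)
# Same object, two camera placements: the observation changes, the scene does not.
view_a = project(target, cam_pos=(0.0, 0.0, 0.0), yaw=0.0)
view_b = project(target, cam_pos=(1.0, 0.0, 0.0), yaw=0.0)
```

The same object lands at different image coordinates in `view_a` and `view_b`, so a policy trained on one viewpoint receives shifted inputs under the other.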

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller
- Pratyusha Sharma, Deepak Pathak, Abhinav Gupta. NeurIPS 2019.
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
- Tianhe Yu, Chelsea Finn, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine. RSS 2018.
Time-Contrastive Networks: Self-Supervised Learning from Video
- Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine. ICRA 2018.
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
- YuXuan Liu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. ICRA 2018.
Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control
- Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine. CVPR 2018.
Third Person Imitation Learning
- Bradly C Stadie, Pieter Abbeel, Ilya Sutskever. ICLR 2017.

Cross-Dynamics Policy Transfer

Dynamics gaps occur when interactions between embodiments and their deployment environments, or among different parts of the embodiment itself, follow different transition dynamics across domains, for example due to differences in stiffness, gear dead zones, body mass, or friction.
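A minimal sketch of such a gap, assuming a one-dimensional point mass with viscous friction: the same sequence of forces is rolled out under two parameter settings, and a helper samples random parameters in the spirit of dynamics randomization. The `Dynamics` class, the parameter ranges, and the integrator are illustrative assumptions, not the setup of any listed paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Dynamics:
    mass: float      # body mass
    friction: float  # viscous friction coefficient


def rollout(dyn, forces, dt=0.05):
    """Integrate a 1D point mass under the given forces; return the final position."""
    pos, vel = 0.0, 0.0
    for f in forces:
        acc = (f - dyn.friction * vel) / dyn.mass
        vel += acc * dt  # semi-implicit Euler: update velocity first
        pos += vel * dt
    return pos


def sample_dynamics(rng):
    """Draw random physical parameters (hypothetical ranges)."""
    return Dynamics(mass=rng.uniform(0.5, 2.0), friction=rng.uniform(0.0, 1.0))


forces = [1.0] * 40                         # identical actions in both domains
sim = Dynamics(mass=1.0, friction=0.1)      # stand-in "source" dynamics
real = Dynamics(mass=1.5, friction=0.4)     # stand-in "target" dynamics
gap = rollout(sim, forces) - rollout(real, forces)

rng = random.Random(0)
randomized = [sample_dynamics(rng) for _ in range(5)]
```

Here `gap` is the difference in final position caused purely by the mismatch in mass and friction, since the actions are identical in both rollouts.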

H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
- Haoyi Niu, Tianying Ji, Bingqi Liu, Haocheng Zhao, Xiangyu Zhu, Jianying Zheng, Pengfei Huang, Guyue Zhou, Jianming Hu, Xianyuan Zhan. DMLR@ICLR 2024.
Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States
- Zidan Wang, Takeru Oba, Takuma Yoneda, Rui Shen, Matthew Walter, Bradly C. Stadie. CoRL 2023.
State Regularized Policy Optimization on Data with Dynamics Shift
- Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An. NeurIPS 2023.
Cross-Domain Policy Adaptation via Value-Guided Data Filtering
- Kang Xu, Chenjia Bai, Xiaoteng Ma, Dong Wang, Bin Zhao, Zhen Wang, Xuelong Li, Wei Li. NeurIPS 2023.
When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
- Haoyi Niu, Shubham Sharma, Yiwen Qiu, Ming Li, Guyue Zhou, Jianming Hu, Xianyuan Zhan. NeurIPS 2022.
DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
- Jinxin Liu, Hongyin Zhang, Donglin Wang. ICLR 2022.
Learning Feasibility to Imitate Demonstrators with Different Dynamics
- Zhangjie Cao, Yilun Hao, Mengxi Li, Dorsa Sadigh. CoRL 2021.
Learning From Imperfect Demonstrations From Agents With Varying Dynamics
- Zhangjie Cao, Dorsa Sadigh. RAL 2021.
Auto-Tuned Sim-to-Real Transfer
- Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak. ICRA 2021.
Data-Efficient Domain Randomization With Bayesian Optimization
- Fabio Muratore, Christian Eilers, Michael Gienger, Jan Peters. RAL 2021.
State-Only Imitation Learning for Dexterous Manipulation
- Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik. IROS 2021.
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
- Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov. ICLR 2021.
An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
- Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone. NeurIPS 2020.
Offline Imitation Learning with a Misspecified Simulator
- Shengyi Jiang, Jingcheng Pang, Yang Yu. NeurIPS 2020.
Active Domain Randomization
- Bhairav Mehta, Manfred Diaz, Florian Golemo, Christopher J. Pal, Liam Paull. CoRL 2020.
State-only Imitation with Transition Dynamics Mismatch
- Tanmay Gangwani, Jian Peng. ICLR 2020.
State Alignment-based Imitation Learning
- Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su. ICLR 2020.
Learning dexterous in-hand manipulation
- Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba. IJRR, 2020.
BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators
- Fabio Ramos, Rafael Carvalhaes Possas, Dieter Fox. RSS 2019.
How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?
- Quan Vuong, Sharad Vikram, Hao Su, Sicun Gao, Henrik I. Christensen. arXiv 2019.
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
- Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel. ICRA 2018.
Grounded Action Transformation for Robot Learning in Simulation
- Josiah Hanna, Peter Stone. AAAI 2017.
EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
- Aravind Rajeswaran, Sarvjeet Ghotra, Balaraman Ravindran, Sergey Levine. ICLR 2017.
Preparing for the Unknown: Learning a Universal Policy with Online System Identification
- Wenhao Yu, Jie Tan, C. Karen Liu, Greg Turk. arXiv 2017.
Simulation-based design of dynamic controllers for humanoid balancing
- Jie Tan, Zhaoming Xie, Byron Boots, C. Karen Liu. IROS 2016.
Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
- Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba. arXiv 2016.
Physically consistent state estimation and system identification for contacts
- Svetoslav Kolev, Emanuel Todorov. Humanoids 2015.

Cross-Morphology Policy Transfer

Morphology gaps arise when target embodiments exhibit different morphological designs compared to the source domain agents, e.g., variations in joint types, module shapes, and lengths, which may ultimately lead to a dynamics mismatch. Morphology gaps may also encompass variations in the dimensions and semantic meanings of state and action spaces, such as the number of observational sensors, limbs, and end effectors.
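The mismatch in state dimensions mentioned above can be sketched as follows. One simple workaround, assumed here purely for illustration and not attributed to any paper in this list, is to pad per-limb observations to a fixed maximum number of limbs and carry a mask marking which entries are real. The function name and the feature layout are hypothetical.

```python
def pad_limb_observations(limb_obs, max_limbs, feat_dim):
    """Pad a list of per-limb feature vectors to `max_limbs` entries.

    Returns (padded, mask), where mask[i] is 1 for a real limb and 0 for padding.
    """
    if len(limb_obs) > max_limbs:
        raise ValueError("embodiment has more limbs than max_limbs")
    if any(len(obs) != feat_dim for obs in limb_obs):
        raise ValueError("every limb must have feat_dim features")
    padded = [list(obs) for obs in limb_obs]
    mask = [1] * len(limb_obs)
    # Fill the remaining slots with zeros so every embodiment has the same shape.
    while len(padded) < max_limbs:
        padded.append([0.0] * feat_dim)
        mask.append(0)
    return padded, mask


# A two-limbed and a four-limbed embodiment, each limb described by 2 features.
biped = [[0.1, 0.2], [0.3, 0.4]]
quadruped = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

biped_obs, biped_mask = pad_limb_observations(biped, max_limbs=4, feat_dim=2)
quad_obs, quad_mask = pad_limb_observations(quadruped, max_limbs=4, feat_dim=2)
```

Both embodiments now share one input shape, although padding alone does not resolve differences in what each dimension means or in the resulting dynamics.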

Cross Domain Policy Transfer with Effect Cycle-Consistency
- Ruiqi Zhu, Tianhong Dai, Oya Celiktutan. ICRA 2024.
Polybot: Training One Policy Across Robots While Embracing Variability
- Jonathan Heewon Yang, Dorsa Sadigh, Chelsea Finn. CoRL 2023.
Learning Robot Manipulation from Cross-Morphology Demonstration
- Gautam Salhotra, I-Chun Arthur Liu, Gaurav S. Sukhatme. CoRL 2023.
Multi-embodiment Legged Robot Control as a Sequence Modeling Problem
- Chen Yu, Weinan Zhang, Hang Lai, Zheng Tian, Laurent Kneip, Jun Wang. ICRA 2023.
MetaMorph: Learning Universal Controllers with Transformers
- Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei. ICLR 2022.
Embodied intelligence via learning and evolution
- Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei. Nature Communications, 2021.
Task-agnostic morphology evolution
- Donald J Hejna III, Pieter Abbeel, Lerrel Pinto. ICLR 2021.
Hierarchically Decoupled Imitation For Morphological Transfer
- Donald Hejna, Lerrel Pinto, Pieter Abbeel. ICML 2020.

Cross-Modality Policy Transfer

Cross-Modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning
- Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu. NeurIPS 2021.

In Progress...

Cross-Multi-Gap Policy Transfer

In many complex tasks, we might simultaneously encounter multiple types of domain gaps due to substantially different embodiments and deployed environments.

Citations

If the insights, categorizations, analyses, and encapsulations in this survey paper or GitHub collection are helpful for your project, please cite the following paper:

@inproceedings{niu2024comprehensive,
    title={A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents},
    author={Niu, Haoyi and Hu, Jianming and Zhou, Guyue and Zhan, Xianyuan},
    booktitle={International Joint Conference on Artificial Intelligence},
    year={2024}
}
