Awesome Multi-Modal Reinforcement Learning

This is a collection of research papers on Multi-Modal Reinforcement Learning (MMRL). The repository will be continuously updated to track the frontier of MMRL. Some papers may not be strictly RL papers, but we include them because they may be useful for MMRL research.

Feel free to follow and star!

Introduction

Multi-Modal RL agents learn from video (images), language (text), or both, much as humans do. We believe it is important for intelligent agents to learn directly from images and text, since such data can be easily obtained from the Internet.
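
As a loose illustration of the idea (not the method of any specific paper listed below), a multi-modal agent typically encodes each modality separately and fuses the embeddings before choosing an action. The following PyTorch sketch is a minimal, assumption-laden example; all module names, layer sizes, and the fusion-by-concatenation choice are illustrative.

import torch
import torch.nn as nn

class MultiModalPolicy(nn.Module):
    """Toy policy that fuses an image observation and a tokenized instruction."""
    def __init__(self, vocab_size=1000, embed_dim=64, n_actions=6):
        super().__init__()
        # Visual branch: a small CNN over 64x64 RGB observations.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, embed_dim), nn.ReLU(),
        )
        # Language branch: mean-pooled token embeddings.
        self.text = nn.Embedding(vocab_size, embed_dim)
        # Fusion head: concatenate both embeddings, then map to action logits.
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, image, tokens):
        v = self.vision(image)             # (B, embed_dim)
        t = self.text(tokens).mean(dim=1)  # (B, embed_dim)
        return self.head(torch.cat([v, t], dim=-1))

policy = MultiModalPolicy()
obs = torch.randn(1, 3, 64, 64)               # one RGB frame
instruction = torch.randint(0, 1000, (1, 8))  # eight token ids
action = policy(obs, instruction).argmax(dim=-1)

Concatenation is the simplest possible fusion strategy; the papers collected below explore richer ones, such as the multimodal prompting of VIMA.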


Table of Contents

  • Introduction
  • Papers
    • ICLR 2023
    • ICLR 2022
    • ICLR 2021
    • ICLR 2019
    • NeurIPS 2022
    • NeurIPS 2021
    • NeurIPS 2018
    • ICML 2022
    • ICML 2019
    • ICML 2017
    • CVPR 2022
    • CoRL 2022
    • Other
      • ArXiv
  • Contributing
  • License

Papers

format:
- [title](paper link) [links]
  - authors.
  - key words.
  - experiment environment.
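
For instance, the VIMA entry below would be written in this format as follows (the paper link is left as a placeholder):

- [VIMA: General Robot Manipulation with Multimodal Prompts](paper link) [links]
  - Yunfan Jiang, Agrim Gupta, Zichen Zhang, et al.
  - multimodal prompts, transformer-based generalist agent model, large-scale benchmark.
  - VIMA-Bench, VIMA-Data.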

ICLR 2023

  • PaLI: A Jointly-Scaled Multilingual Language-Image Model (notable top 5%)

    • Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
    • Key Words: strong zero-shot performance, language component and visual component
    • ExpEnv: None
  • VIMA: General Robot Manipulation with Multimodal Prompts

    • Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan. NeurIPS Workshop 2022
    • Key Words: multimodal prompts, transformer-based generalist agent model, large-scale benchmark
    • ExpEnv: VIMA-Bench, VIMA-Data
  • Mind's Eye: Grounded Language Model Reasoning Through Simulation

    • Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai
    • Key Words: language2physical-world, reasoning ability
    • ExpEnv: MuJoCo

ICLR 2022

ICLR 2021

ICLR 2019

NeurIPS 2022

NeurIPS 2021

NeurIPS 2018

ICML 2022

ICML 2019

ICML 2017

CVPR 2022

CoRL 2022

Other

ArXiv

  • Multimodal Reinforcement Learning for Robots Collaborating with Humans

    • Afagh Mehri Shervedani, Siyu Li, Natawut Monaikul, Bahareh Abbasi, Barbara Di Eugenio, Milos Zefran
    • Key Words: robust and deliberate decisions, end-to-end training, importance enhancement, similarity, improving IRL training in multimodal RL domains
    • ExpEnv: None
  • See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction

    • Maria Attarian, Advaya Gupta, Ziyi Zhou, Wei Yu, Igor Gilitschenski, Animesh Garg
    • Key Words: cognitive planning, language-guided video prediction
    • ExpEnv: None
  • Open-vocabulary Queryable Scene Representations for Real World Planning

    • Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler
    • Key Words: Target Detection, Real World, Robotic Tasks
    • ExpEnv: SayCan
  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    • Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, Andy Zeng
    • Key Words: real world, natural language
    • ExpEnv: SayCan

Contributing

Our goal is to make this repo even better. If you are interested in contributing, please refer to HERE for instructions.

License

Awesome Multi-Modal Reinforcement Learning is released under the Apache 2.0 license.
