
Awesome-LLM-Decision-Making

An up-to-date (2023) list of PAPERS, CODEBASES, and BENCHMARKS on Decision Making using Foundation Models, including LLMs and VLMs.

Please feel free to send me pull requests or contact me to correct any mistakes.


Table of Contents

  • Paper
    • Survey
    • World Models
    • Reward Models
    • Agent Models
    • Representation
  • Benchmark
    • Manipulation
    • Navigation-and-Manipulation
    • Game
  • Tools
  • Citation

Paper

Survey

  • "A survey of reinforcement learning informed by natural language." arXiv, 2019. [paper]
  • "A Survey on Transformers in Reinforcement Learning." arXiv, 2023. [paper]
  • "Foundation models for decision making: Problems, methods, and opportunities." arXiv, 2023. [paper]
  • "A Survey of Large Language Models." arXiv, June 2023. [paper][code]
  • "A Survey on Large Language Model based Autonomous Agents." arXiv, Aug 2023. [paper][code]
  • "Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security." arXiv, Jan 2024. [paper][code]

World Models

  • IRIS: "Transformers are sample efficient world models." ICLR, 2023. [paper][code]
  • UniPi: "Learning Universal Policies via Text-Guided Video Generation." arXiv, 2023. [paper][website]
  • Dynalang: "Learning to Model the World with Language." arXiv, July 2023. [paper][website][code]

Reward Models

  • EAGER: "EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL." NeurIPS, 2022. [paper][code]
  • "Reward design with language models." ICLR, 2023. [paper][code]
  • ELLM: "Guiding Pretraining in Reinforcement Learning with Large Language Models." arXiv, 2023. [paper]
  • "Language to Rewards for Robotic Skill Synthesis." arXiv, June 2023. [paper][website]
  • "Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning." arXiv, Oct 2023. [paper]
  • Eureka: "Eureka: Human-Level Reward Design via Coding Large Language Models." arXiv, Oct 2023. [paper][website][code]

Agent Models

  • Generative Agent

    • FILM: "FILM: Following Instructions in Language with Modular Methods." ICLR, 2022. [paper][code][website]
    • "Grounding large language models in interactive environments with online reinforcement learning." arXiv, 2023. [paper][code]
    • Inner Monologue: "Inner monologue: Embodied reasoning through planning with language models." arXiv, 2022. [paper][website]
    • Plan4MC: "Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks." arXiv, 2023. [paper][code][website]
    • ProgPrompt: "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models." ICRA, 2023. [paper][website]
    • Text2Motion: "Text2Motion: From Natural Language Instructions to Feasible Plans." arXiv, Mar 2023. [paper][website]
    • Voyager: "Voyager: An Open-Ended Embodied Agent with Large Language Models." arXiv, May 2023. [paper][code][website]
    • Reflexion: "Reflexion: Language Agents with Verbal Reinforcement Learning." arXiv, Mar 2023. [paper][code]
    • ReAct: "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR, 2023. [paper][code][website]
    • "Generative Agents: Interactive Simulacra of Human Behavior." arXiv, Apr 2023. [paper][code]
    • "Cognitive Architectures for Language Agents." arXiv, Sep 2023. [paper][code]
    • Retroformer: "Retroformer: Retrospective large language agents with policy gradient optimization." arXiv, Aug 2023. [paper]
    • SayCanPay: "Heuristic Planning with Large Language Models using Learnable Domain Knowledge." AAAI, 2024. [paper][code][website]
  • Embodied AI

    • SayCan: "Do as I Can, Not as I Say: Grounding language in robotic affordances." arXiv, 2022. [paper][code][website]
    • PaLM-E: "PaLM-E: An embodied multimodal language model." arXiv, 2023. [paper][website]
    • LM-Nav: "LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action." CoRL, 2022. [paper][code][website]
    • ZSP: "Language models as zero-shot planners: Extracting actionable knowledge for embodied agents." ICML, 2022. [paper][code][website]
    • DEPS: "Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents." arXiv, 2023. [paper][code]
    • TidyBot: "TidyBot: Personalized Robot Assistance with Large Language Models." arXiv, 2023. [paper][website]
    • ChatGPT for Robotics: "ChatGPT for Robotics: Design principles and model abilities." Microsoft Autonomous Systems and Robotics Research, 2023. [paper]
    • KNOWNO: "Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners." arXiv, July 2023. [paper]
    • VoxPoser: "VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models." arXiv, July 2023. [paper][website]
    • RT-1: "RT-1: Robotics Transformer for Real-World Control at Scale." arXiv, Dec 2022. [paper][code]
    • RT-2: "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." DeepMind, July 2023. [paper][website]
    • MOO: "Open-World Object Manipulation using Pre-trained Vision-Language Models." arXiv, Mar 2023. [paper][website]
    • EmbodiedGPT: "EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought." arXiv, May 2023. [paper][code][website]
    • RoboCat: "RoboCat: A self-improving robotic agent." arXiv, Jun 2023. [paper][website]
    • RT-X: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." arXiv, Oct 2023. [paper][code][website]
    • GenSim: "GenSim: Generating Robotic Simulation Tasks via Large Language Models." arXiv, Oct 2023. [paper][code][website]
    • "Language Models as Zero-Shot Trajectory Generators." arXiv, Oct 2023. [paper][code][website]
    • LLaRP: "Large Language Models as Generalizable Policies for Embodied Tasks." arXiv, Oct 2023. [paper][website]
    • CLARA: "CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents." arXiv, June 2023. [paper]
    • Ada: "Learning adaptive planning representations with natural language guidance." arXiv, Dec 2023. [paper]
    • "Demonstrating Large Language Models on Robots." RSS, 2023. [paper]
    • "Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents." NIPS, 2023. [paper]
    • "Learning to Learn Faster from Human Feedback with Language Model Predictive Control." arXiv, Feb 2024. [paper]

Representation

  • CLIPort: "CLIPort: What and where pathways for robotic manipulation." CoRL, 2021. [paper][code][website]
  • VIMA: "VIMA: General robot manipulation with multimodal prompts." ICML, 2023. [paper][code][website]
  • Perceiver-actor: "Perceiver-actor: A multi-task transformer for robotic manipulation." CoRL, 2022. [paper][code][website]
  • InstructRL: "Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models." arXiv, 2022. [paper]
  • Hiveformer: "Instruction-driven history-aware policies for robotic manipulations." CoRL, 2022. [paper][code][website]
  • LID: "Pre-trained language models for interactive decision-making." NeurIPS, 2022. [paper][code][website]
  • LISA: "LISA: Learning Interpretable Skill Abstractions from Language." NeurIPS, 2022. [paper][code]
  • LoReL: "Learning language-conditioned robot behavior from offline data and crowd-sourced annotation." CoRL, 2021. [paper][code][website]
  • GRIF: "Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control." arXiv, 2023. [paper][website]

Benchmark

Manipulation

  • Meta-World: "Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning." CoRL, 2019. [paper][code][website]
  • RLBench: "RLBench: The robot learning benchmark & learning environment." IEEE Robotics and Automation Letters, 2020. [paper][code][website]
  • VLMbench: "VLMbench: A compositional benchmark for vision-and-language manipulation." NeurIPS, 2022. [paper][code][website]
  • CALVIN: "CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks." IEEE Robotics and Automation Letters, 2022. [paper][code][website]

Navigation-and-Manipulation

  • AI2-THOR: "AI2-THOR: An interactive 3D environment for visual AI." arXiv, 2017. [paper][code][website]
  • Alfred: "Alfred: A benchmark for interpreting grounded instructions for everyday tasks." CVPR, 2020. [paper][code][website]
  • VirtualHome: "Watch-And-Help: A challenge for social perception and human-AI collaboration." arXiv, 2020. [paper][code][website]
  • Ravens: "Transporter networks: Rearranging the visual world for robotic manipulation." CoRL, 2020. [paper][code][website]
  • Housekeep: "Housekeep: Tidying virtual households using commonsense reasoning." ECCV, 2022. [paper][code][website]
  • BEHAVIOR-1K: "BEHAVIOR-1K: A benchmark for embodied AI with 1,000 everyday activities and realistic simulation." CoRL, 2022. [paper][code][website]
  • Habitat 2.0: "Habitat 2.0: Training home assistants to rearrange their habitat." NeurIPS, 2021. [paper][code][website]
  • EgoTV 📺: "Egocentric Task Verification from Natural Language Task Descriptions." ICCV, 2023 (Meta AI). [paper][code][website]

Game

  • MineDojo: "MineDojo: Building open-ended embodied agents with internet-scale knowledge." arXiv, 2022. [paper][code][website]
  • BabyAI: "BabyAI: A platform to study the sample efficiency of grounded language learning." ICLR, 2019. [paper][code]
  • Generative Agents: "Generative Agents: Interactive Simulacra of Human Behavior." arXiv, Apr 2023. [paper][website][code]
  • AgentBench: "AgentBench: Evaluating LLMs as Agents." arXiv, Aug 2023. [paper][website][code]

Tools

  • Toolformer: "Toolformer: Language Models Can Teach Themselves to Use Tools." arXiv, Feb 2023. [paper][code]

Citation
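
If this list is useful for your research, you can adapt a BibTeX entry along the following lines (a placeholder sketch: the repository name comes from this page, while the author field, year, and URL are illustrative assumptions to adjust before use):

    % Hypothetical citation entry for this repository; verify fields before use.
    @misc{awesome-llm-decision-making,
      title        = {Awesome-LLM-Decision-Making: Papers, Codebases, and Benchmarks on Decision Making with Foundation Models},
      author       = {{123penny123 and contributors}},
      year         = {2023},
      howpublished = {\url{https://github.com/123penny123/Awesome-LLM-RL}},
      note         = {GitHub repository}
    }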
