
Awesome-Reasoning-Foundation-Models


A curated list of awesome large AI models, or foundation models, for reasoning. We organize current foundation models into three categories: language foundation models, vision foundation models, and multimodal foundation models. We further cover how foundation models are applied to reasoning tasks, including commonsense, mathematical, logical, causal, visual, audio, multimodal, and embodied reasoning, and we summarize the reasoning techniques involved.

We welcome contributions of additional resources to this repository. If you would like to contribute, please submit a pull request!

Table of Contents

0 Survey


This repository is primarily based on the following paper:

Reasoning with Foundation Models: Concepts, Methodologies, and Outlook

Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Jirong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, and Zhenguo Li

If you find this repository helpful, please consider citing:

@article{sun2023reasoning,
  title={Reasoning with Foundation Models: Concepts, Methodologies, and Outlook},
  author={Sun, Jiankai and Zheng, Chuanyang and Xie, Enze and Liu, Zhengying and others},
  journal={arXiv preprint},
  year={2023}
}

1 Relevant Surveys and Links

  • The Rise and Potential of Large Language Model Based Agents: A Survey - [arXiv] [Link]

  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants - [arXiv]

  • A Survey on Multimodal Large Language Models - [arXiv] [Link]

  • Interactive Natural Language Processing - [arXiv] [Link]

  • A Survey of Large Language Models - [arXiv] [Link]

  • Self-Supervised Multimodal Learning: A Survey - [arXiv] [Link]

  • Large AI Models in Health Informatics: Applications, Challenges, and the Future - [arXiv] [Paper] [Link]

  • Towards Reasoning in Large Language Models: A Survey - [arXiv] [Paper] [Link]

  • Reasoning with Language Model Prompting: A Survey - [arXiv] [Paper] [Link]

  • Awesome Multimodal Reasoning - [Link]

2 Foundation Models


2.1 Language Foundation Models

2.2 Vision Foundation Models

2.3 Multimodal Foundation Models

2.4 Reasoning Applications

3 Reasoning Tasks

3.1 Commonsense Reasoning

3.1.1 Commonsense Question and Answering (QA)

3.1.2 Physical Commonsense Reasoning

3.1.3 Spatial Commonsense Reasoning

Benchmarks, Datasets, and Metrics

3.2 Mathematical Reasoning

3.2.1 Arithmetic Reasoning

3.2.2 Geometry Reasoning

3.2.3 Theorem Proving
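
Formal theorem proving asks a model to produce a machine-checkable proof of a given statement. As a toy illustration (not drawn from any specific benchmark), here is the kind of statement/proof pair a Lean 4 prover is expected to emit; the statement is given, and the model must supply the proof:

-- Toy Lean 4 examples of the statement/proof pairs that proving tasks target.
example : 2 + 2 = 4 := rfl                 -- closed by computation

theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b                         -- closed by appealing to a library lemma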

3.2.4 Scientific Reasoning

Benchmarks, Datasets, and Metrics

3.3 Logical Reasoning

3.3.1 Propositional Logic

  • 2022/09 | Propositional Reasoning via Neural Transformer Language Models - [Paper]
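
To make the task concrete, propositional reasoning amounts to deciding entailment over truth assignments. A brute-force checker sketch follows; it is a toy reference for what the task asks of a model, with formulas encoded as Python lambdas rather than a real formula language:

# Toy brute-force propositional entailment checker.
from itertools import product

def entails(premises, conclusion, n_vars):
    """True iff every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

# Modus ponens: from (p -> q) and p, infer q.
print(entails([lambda p, q: (not p) or q, lambda p, q: p],
              lambda p, q: q, n_vars=2))  # True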

3.3.2 Predicate Logic

Benchmarks, Datasets, and Metrics

3.4 Causal Reasoning

3.4.1 Counterfactual Reasoning

Benchmarks, Datasets, and Metrics

3.5 Visual Reasoning

3.5.1 3D Reasoning

Benchmarks, Datasets, and Metrics

3.6 Audio Reasoning

3.6.1 Speech

Benchmarks, Datasets, and Metrics

3.7 Multimodal Reasoning

3.7.1 Alignment

3.7.2 Generation

3.7.3 Multimodal Understanding

Benchmarks, Datasets, and Metrics

3.8 Embodied Reasoning

3.8.1 Introspective Reasoning

3.8.2 Extrospective Reasoning

3.8.3 Multi-agent Reasoning

3.8.4 Driving Reasoning

Benchmarks, Datasets, and Metrics

3.9 Other Tasks and Applications

3.9.1 Theory of Mind (ToM)

3.9.2 LLMs for Weather Prediction

  • 2022/09 | MetNet-2 | Deep learning for twelve hour precipitation forecasts - [Paper]

  • 2023/07 | Pangu-Weather | Accurate medium-range global weather forecasting with 3D neural networks - [Paper]

3.9.3 Abstract Reasoning

3.9.4 Defeasible Reasoning

3.9.5 Medical Reasoning

3.9.6 Bioinformatics Reasoning

3.9.7 Long-Chain Reasoning

4 Reasoning Techniques

4.1 Pre-Training

4.1.1 Data

a. Data - Text
b. Data - Image
c. Data - Multimodality

4.1.2 Network Architecture

a. Encoder-Decoder
b. Decoder-Only (see the sketch after this list)
c. CLIP Variants
d. Others
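
Decoder-only architectures (item b) dominate current language foundation models. A minimal sketch of one pre-norm decoder block, assuming PyTorch; the sizes are illustrative, and real models stack dozens of such blocks with positional information and tied embeddings:

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    # One pre-norm block: causal self-attention then an MLP, each with a residual.
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):  # x: (batch, seq, dim)
        seq = x.size(1)
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)  # True = masked (future tokens)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))

out = DecoderBlock()(torch.randn(2, 10, 256))  # -> (2, 10, 256)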

4.2 Fine-Tuning

4.2.1 Data

4.2.2 Parameter-Efficient Fine-tuning

a. Adapter Tuning
b. Low-Rank Adaptation (see the sketch after this list)
c. Prompt Tuning
d. Partial Parameter Tuning
e. Mixture-of-Modality Adaption
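
Among these, Low-Rank Adaptation (LoRA, item b) is probably the most widely used. A minimal sketch, assuming PyTorch; the rank r and scaling alpha are illustrative hyperparameters, and real implementations (e.g., the peft library) add dropout and weight merging:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen dense layer plus a trainable low-rank update: W + (alpha / r) * B @ A.
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_features, r))        # up-projection, zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

y = LoRALinear(768, 768)(torch.randn(2, 768))  # only A and B receive gradients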

4.3 Alignment Training

4.3.1 Data

a. Data - Human
b. Data - Synthesis

4.3.2 Training Pipeline

a. Online Human Preference Training
b. Offline Human Preference Training
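
As one representative offline method, Direct Preference Optimization (DPO) trains on preference pairs without a separate reward model. A minimal sketch of the loss, assuming per-sequence log-probabilities have already been computed under the policy and a frozen reference model; beta is an illustrative hyperparameter:

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Push the policy to widen its chosen-vs-rejected log-prob margin
    # relative to the frozen reference model.
    policy_margin = policy_chosen - policy_rejected
    ref_margin = ref_chosen - ref_rejected
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

loss = dpo_loss(torch.tensor([-4.0]), torch.tensor([-6.0]),
                torch.tensor([-5.0]), torch.tensor([-5.5]))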

4.4 Mixture of Experts (MoE)
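
A sparse MoE layer routes each token to a small subset of expert networks, so model capacity grows without a matching increase in per-token compute. A toy top-k router sketch, assuming PyTorch; the dense Python loop is for clarity rather than efficiency, and real systems add load-balancing losses:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=256, n_experts=4, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)  # learned router
        self.k = k

    def forward(self, x):                      # x: (tokens, dim)
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

out = TopKMoE()(torch.randn(8, 256))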

4.5 In-Context Learning

4.5.1 Demonstration Example Selection

a. Prior-Knowledge Approach
b. Retrieval Approach
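
A retrieval approach (item b) picks the demonstrations most similar to the query. A minimal sketch; the embed function here is a hypothetical bag-of-words stand-in for a real sentence encoder such as SBERT:

import numpy as np

def embed(text):
    # Hypothetical stand-in for a real sentence encoder.
    vocab = ["sum", "difference", "product", "speed", "price", "area"]
    return np.array([text.lower().count(w) for w in vocab], dtype=float)

def select_demonstrations(query, pool, k=2):
    """Return the k pool examples most cosine-similar to the query."""
    q = embed(query)
    def cosine(v):
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
    return sorted(pool, key=lambda ex: cosine(embed(ex)), reverse=True)[:k]

pool = ["What is the sum of 3 and 5?", "Find the area of a 2x4 rectangle.",
        "A car's speed is 60 km/h; how far does it go in 2 h?"]
print(select_demonstrations("Compute the sum of 7 and 9.", pool))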

4.5.2 Chain-of-Thought

a. Zero-Shot CoT
b. Few-Shot CoT
c. Multiple Paths Aggregation
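
A sketch combining the zero-shot CoT trigger (item a) with self-consistency-style aggregation over multiple sampled paths (item c). The generate function is a hypothetical stand-in for a sampling-enabled LLM call; it returns a canned output here so the example runs:

from collections import Counter
import re

def generate(prompt, temperature=0.7):
    # Hypothetical model call; replace with a real LLM API.
    return "There are 3 boxes with 4 pens each, so 3 * 4 = 12. The answer is 12."

def self_consistency(question, n_paths=5):
    prompt = f"Q: {question}\nA: Let's think step by step."   # zero-shot CoT trigger
    answers = []
    for _ in range(n_paths):                                  # sample diverse paths
        path = generate(prompt, temperature=0.7)
        m = re.search(r"answer is\s*(-?\d+)", path)
        if m:
            answers.append(m.group(1))
    return Counter(answers).most_common(1)[0][0]              # majority vote

print(self_consistency("A shop has 3 boxes of 4 pens. How many pens?"))  # 12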

4.5.3 Multi-Round Prompting

a. Learned Refiners
b. Prompted Refiners
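
A prompted-refiner loop in the spirit of Self-Refine (item b): the same model drafts, critiques, and revises. call_llm is a hypothetical stand-in for a real model API and returns canned text here so the sketch runs:

def call_llm(prompt):
    # Hypothetical model call; replace with a real LLM API.
    return "Looks correct." if "feedback" in prompt.lower() else "Draft solution."

def refine(task, max_rounds=3):
    draft = call_llm(f"Solve the task:\n{task}")
    for _ in range(max_rounds):
        feedback = call_llm(f"Task: {task}\nDraft: {draft}\nGive concise feedback.")
        if "looks correct" in feedback.lower():
            break                              # stop once the critic is satisfied
        draft = call_llm(f"Task: {task}\nDraft: {draft}\nFeedback: {feedback}\nRevise.")
    return draft

print(refine("Summarize the benefits of unit tests."))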

4.6 Autonomous Agent
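
Agent frameworks typically interleave model reasoning with tool calls, ReAct-style, until a final answer is produced. A toy loop sketch; llm and the tool registry are hypothetical stand-ins for a real model and real tools:

def llm(transcript):
    # Hypothetical policy; a real agent would call an LLM here.
    return ('Action: calculator["2+2"]' if "calculator" not in transcript
            else "Final Answer: 4")

TOOLS = {"calculator": lambda expr: str(eval(expr))}  # toy tool registry

def run_agent(task, max_steps=5):
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        name, arg = step.removeprefix("Action: ").split("[", 1)
        observation = TOOLS[name](arg.strip('"]'))        # execute the tool call
        transcript += f"{step}\nObservation: {observation}\n"
    return "no answer"

print(run_agent("What is 2+2?"))  # 4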

License

This repository is released under the MIT License.