RAG-Survey

We are working hard to improve and update this repository; more content will be added soon. 😊 Stay tuned!

🚀 (New) [18 Dec 2023] We released the RAG survey on arXiv: "Retrieval-Augmented Generation for Large Language Models: A Survey"

If you find our survey useful for your research, please cite the following paper:

@article{RAGSurvey,
  title={Retrieval-Augmented Generation for Large Language Models: A Survey},
  author={Yunfan Gao and Yun Xiong and Xinyu Gao and Kangxiang Jia and Jinliu Pan and Yuxi Bi and Yi Dai and Jiawei Sun and Meng Wang and Haofen Wang},
  year={2023},
  journal={arXiv preprint arXiv:2312.10997},
  url={http://arxiv.org/abs/2312.10997}
}

Timeline of RAG

RAG vs Fine-tuning

Paradigm of RAG

Taxonomy of Core Components

Augmentation Stage

Pre-training

1. Improving language models by retrieving from trillions of tokens [paper][code]

2. Few-shot Learning with Retrieval Augmented Language Models [paper]

3. Toolformer: Language Models Can Teach Themselves to Use Tools [paper]

4. Copy is all you need [paper]

5. In-context learning with retrieval augmented encoder-decoder language models [paper]

6. Shall we pretrain autoregressive language models with retrieval? [paper]

7. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP [paper]

Fine-tuning

1. Dense Passage Retrieval for Open-Domain Question Answering [paper]

2. UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation [paper][code]

3. Distilling knowledge from reader to retriever for question answering [paper]

4. RA-DIT: Retrieval-Augmented Dual Instruction Tuning [paper]

5. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [paper]

6. Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation [paper]

7. Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data [paper][code]

8. Replug: Retrieval-augmented black-box language models [paper]

9. Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In [paper][code]

Inference

1. Generalization through Memorization: Nearest Neighbor Language Models [paper]

2. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP [paper][code]

3. Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface [paper]

4. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions [paper][code]

5. Generate rather than Retrieve: Large Language Models are Strong Context Generators [paper][code]

6. In-Context Retrieval-Augmented Language Models [paper]

Augmentation Data

Unstructured Data

1. UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation [paper][code]

2. From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL [paper]

3. Copy is all you need [paper]

Structured Data

1. FABULA: Intelligence Report Generation Using Retrieval-Augmented Narrative Construction [paper]

2. Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation [paper]

3. KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases [paper]

4. Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT [paper]

LLM Generated Content

1. Lift Yourself Up: Retrieval-augmented Text Generation with Self-Memory [paper]

2. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP [paper]

3. Recitation-augmented language models [paper]

4. Generate rather than Retrieve: Large Language Models are Strong Context Generators [paper]

5. Self-Knowledge Guided Retrieval Augmentation for Large Language Models [paper]

Augmentation Process

Once Retrieval

1. Retrieval-augmented generation for knowledge-intensive NLP tasks [paper]

2. UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation [paper]

3. Augmented Large Language Models with Parametric Knowledge Guiding [paper]

4. Learning to Retrieve In-Context Examples for Large Language Models [paper]

5. Few-shot Learning with Retrieval Augmented Language Models [paper]

6. Replug: Retrieval-augmented black-box language models [paper]

7. Recitation-augmented language models [paper]

Iterative Retrieval

1. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP [paper][code]

2. Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation [paper]

3. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy [paper]

4. Retrieval-Generation Synergy Augmented Large Language Models [paper]

Adaptive Retrieval

1. Active Retrieval Augmented Generation [paper][code]

2. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [paper]

3. In-context learning with retrieval augmented encoder-decoder language models [paper]

Acknowledgments

We would like to extend our deepest gratitude to the following authors and researchers. Their exceptional contributions to the field of RAG, along with their willingness to share their findings, have been truly commendable. Without their insightful research, invaluable experience, and generous sharing, we would not have been able to present the material on RAG as extensively as we do in our survey. We reiterate our profound appreciation to all the researchers, industry professionals, and knowledge sharers, and we thank everyone who has provided us with invaluable insights.
