Papers on long-context language modeling

Awesome Long-context Language Modeling papers


Introduction (Draft by ChatGPT😄)

Most language models, including those behind ChatGPT, have a fixed context window: they can only take a limited number of tokens (words or subwords) as input when generating the next word in a sequence. For example, the original GPT-3 has a maximum context window of 2,048 tokens. This limitation poses challenges for longer pieces of text, since relevant information beyond the context window is simply cut off.

Figure taken from Longformer

To overcome this limitation and better model long-range dependencies in text, researchers have explored various techniques and architectures. The following are some approaches that might be considered "long-context language models".

Paper List

Memory/Cache-Augmented Models

Some language models incorporate external memory mechanisms, allowing them to store information from past tokens and retrieve it when necessary. These memories enable the model to maintain context over longer segments of text.
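
To make this concrete, below is a minimal sketch in the spirit of Transformer-XL's segment-level cache: the keys and values of earlier segments are kept around, and the current segment attends over them. All names, shapes, and the memory size are illustrative (not taken from any specific paper in this list), and causal masking is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def attend_with_memory(q, k, v, mem_k=None, mem_v=None, max_mem=512):
    """Single-head attention over the current segment plus a cached memory.

    q, k, v: (seg_len, d) projections for the current segment.
    mem_k, mem_v: (mem_len, d) keys/values cached from earlier segments.
    Returns the attention output and the updated memory.
    """
    if mem_k is not None:
        k = torch.cat([mem_k, k], dim=0)  # current queries also see the cache
        v = torch.cat([mem_v, v], dim=0)
    scores = (q @ k.t()) / (k.shape[-1] ** 0.5)  # (seg_len, mem_len + seg_len)
    out = F.softmax(scores, dim=-1) @ v
    # Keep only the most recent `max_mem` entries as the next segment's memory;
    # detach() stops gradients from flowing back into past segments.
    return out, k[-max_mem:].detach(), v[-max_mem:].detach()

# Process a long sequence segment by segment, carrying the cache forward.
d, seg_len = 64, 128
mem_k = mem_v = None
for _ in range(4):               # 4 segments => effective context of 4 * seg_len
    x = torch.randn(seg_len, d)  # stand-in for projected hidden states
    out, mem_k, mem_v = attend_with_memory(x, x, x, mem_k, mem_v)
```

The trade-off is memory: the effective context grows beyond a single segment at the cost of storing past keys and values.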

Hierarchical Models / Context Compression (Data-Centric or Key-Value)

These are mostly transformer variants that keep the overall transformer architecture but compress earlier context before it is consumed again, either at the data level (e.g., summarizing previous chunks) or inside the model (e.g., shrinking the cached key-value representations).
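
As a toy illustration of the key-value compression idea, the sketch below average-pools blocks of cached keys/values into fewer slots before they are attended over. Real methods typically learn this compression; everything here, including the pooling ratio, is invented for the example.

```python
import torch

def compress_kv(k, v, ratio=4):
    """Compress cached keys/values by average-pooling blocks of `ratio` tokens:
    (n, d) -> (n // ratio, d). A toy stand-in for a learned compression module."""
    n, d = k.shape
    n = (n // ratio) * ratio  # drop the remainder for simplicity
    k_c = k[:n].reshape(-1, ratio, d).mean(dim=1)
    v_c = v[:n].reshape(-1, ratio, d).mean(dim=1)
    return k_c, v_c

# 1024 cached tokens shrink to 256 slots the model still attends over.
k, v = torch.randn(1024, 64), torch.randn(1024, 64)
k_c, v_c = compress_kv(k, v)
print(k_c.shape)  # torch.Size([256, 64])
```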

Transformer Variants (fundamentally changing the KV computation or position embeddings of the transformer)

Researchers have explored transformer variants that handle long context better by using techniques such as sparse attention, axial attention, and reformulations of the self-attention mechanism.
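
For a concrete picture of sparse attention, the sketch below builds the causal sliding-window mask used by local-attention models such as Longformer (without its extra global tokens); the sequence length and window size are arbitrary placeholders.

```python
import torch

def sliding_window_mask(seq_len, window):
    """Boolean mask: position i may attend to position j only when j <= i
    (causal) and i - j <= window (local), so each row costs O(window), not O(n)."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j <= window)

mask = sliding_window_mask(seq_len=8, window=2)
scores = torch.randn(8, 8)                         # raw attention scores
scores = scores.masked_fill(~mask, float("-inf"))  # block disallowed positions
probs = torch.softmax(scores, dim=-1)              # rows sum to 1 over the window
```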

Window-Based/On-the-fly Methods

Rather than relying on a fixed context window, some models use a sliding window approach. They process the text in smaller chunks, capturing local dependencies within each window and passing relevant information between adjacent windows.
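
A minimal sketch of the chunking step: split a long token sequence into overlapping windows so that consecutive chunks share context near their boundary. The window and stride sizes are placeholders, not values from any particular model.

```python
def sliding_windows(tokens, window=2048, stride=1536):
    """Split `tokens` into overlapping chunks of length <= `window`.
    With stride < window, consecutive chunks share `window - stride` tokens,
    so information near a chunk boundary appears in both chunks."""
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the sequence
        start += stride
    return chunks

tokens = list(range(5000))        # stand-in for tokenizer output
chunks = sliding_windows(tokens)  # 3 chunks, each within the model's window
```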

Analysis

Reinforcement Learning

Some approaches use reinforcement learning to guide the model's attention to focus on important parts of the input text while considering the context.
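
Schematically, this can be framed as learning a policy over which parts of the context to keep. The toy REINFORCE sketch below learns to select informative context chunks; the reward function, features, and dimensions are all placeholders invented for the example.

```python
import torch

torch.manual_seed(0)
n_chunks, d = 10, 32
chunk_feats = torch.randn(n_chunks, d)  # fixed stand-in chunk embeddings
scorer = torch.nn.Linear(d, 1)          # policy: a keep-probability per chunk
opt = torch.optim.Adam(scorer.parameters(), lr=0.05)

def task_reward(kept):
    # Placeholder: pretend chunks 2 and 7 hold the answer; penalize keeping many.
    return float(2 in kept) + float(7 in kept) - 0.1 * len(kept)

for step in range(200):
    probs = torch.sigmoid(scorer(chunk_feats)).squeeze(-1)
    keep = torch.bernoulli(probs)       # sample a keep/drop decision per chunk
    kept = keep.nonzero().flatten().tolist()
    reward = task_reward(kept)
    # REINFORCE: scale the log-prob of the sampled decisions by the reward.
    log_prob = (keep * probs.clamp_min(1e-8).log()
                + (1 - keep) * (1 - probs).clamp_min(1e-8).log()).sum()
    loss = -reward * log_prob
    opt.zero_grad(); loss.backward(); opt.step()
```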

Benchmark

CV-Inspired

Contact Me

If you have any questions or comments, please feel free to let us know: 📧 Cheng Deng.

License: Apache License 2.0