GAIR-NLP / Entropy-ABF

Official implementation for 'Extending LLMs’ Context Window with 100 Samples'

Extending LLMs' Context Window with 100 Samples

This is the official repository for the paper "Extending LLMs' Context Window with 100 Samples" (preprint).

Introduction

We introduce 'Entropy-Aware ABF', a method that efficiently extends the context window of RoPE-based LLMs with only 100 training samples. This repository contains the code and data needed to reproduce our models.

Model and Data

We release long-context versions of Llama-2-7b-chat, extended with our method and trained on different amounts of data, on 🤗Hugging Face:

Data     Link
0.1k     🤗eabf-llama2-7b-chat-0.1k
1k       🤗eabf-llama2-7b-chat-1k
3.5k     🤗eabf-llama2-7b-chat-3.5k

We also release our training data on 🤗Hugging Face Datasets.

Quick Guide

Use Entropy-Aware ABF

To use our code, your transformers library should be version 4.31 or higher.

We adopt the paper summarization test proposed in the NTK-Aware scaling blog post as a sanity check.

In short, to load the LLaMA model with our method, you should first import the required packages:

from transformers.models.llama.modeling_llama import LlamaForCausalLM
import patch.eabf as eabf

Then, you can load the model by passing the appropriate rope_scaling argument and applying our monkey-patching function:

model = LlamaForCausalLM.from_pretrained(MODEL_NAME_OR_PATH, ..., rope_scaling={"type": "eabf", "factor": 4})
eabf.apply_eabf(model)
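
As a quick end-to-end check, the sketch below runs the summarization-style sanity check mentioned above with the patched model. It is a minimal illustration, not the repository's evaluation script: the checkpoint name, the paper.txt file, and the generation settings are placeholder assumptions.

import torch
from transformers import AutoTokenizer
from transformers.models.llama.modeling_llama import LlamaForCausalLM
import patch.eabf as eabf

# Placeholder: any Llama-2-7b-chat checkpoint (local path or Hugging Face id).
MODEL_NAME_OR_PATH = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME_OR_PATH)
model = LlamaForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    torch_dtype=torch.float16,
    device_map="auto",
    rope_scaling={"type": "eabf", "factor": 4},
)
eabf.apply_eabf(model)

# Hypothetical long document for the summarization sanity check.
long_paper_text = open("paper.txt").read()
prompt = long_paper_text + "\n\nSummarize the paper above."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))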

Reproduce Observation of Attention Scores

Other RoPE-based LLMs may or may not follow the same attention-score pattern as Llama-2-7b-chat. We therefore release our code for retrieving attention scores and computing the 'attention entropy', so that users can tailor our method to their own model. A minimal sketch of the computation is given below.
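
Here, 'attention entropy' refers to the Shannon entropy of each attention distribution. The sketch below shows one way to compute it from the attention matrices that transformers returns when a forward pass is run with output_attentions=True; the helper name and the averaging choices are illustrative assumptions, not the repository's implementation.

import torch

def attention_entropy(attentions):
    """Shannon entropy of attention distributions.

    `attentions` is the tuple returned by a forward pass with
    output_attentions=True: one tensor per layer of shape
    (batch, num_heads, seq_len, seq_len), with each row summing to 1
    over the key dimension. Returns a tensor of shape
    (num_layers, num_heads) with the entropy averaged over the batch
    and query positions.
    """
    per_layer = []
    for attn in attentions:
        # -sum_j p_ij * log p_ij over keys; a small epsilon avoids log(0)
        # on positions zeroed out by the causal mask.
        ent = -(attn * (attn + 1e-12).log()).sum(dim=-1)  # (batch, heads, seq_len)
        per_layer.append(ent.mean(dim=(0, 2)))            # average over batch and queries
    return torch.stack(per_layer)                          # (num_layers, num_heads)

# Example usage with any Hugging Face causal LM:
#   outputs = model(input_ids, output_attentions=True)
#   per_head_entropy = attention_entropy(outputs.attentions)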
