Jeffrey28/Llama-X

Llama-X: Open Academic Research on Improving LLaMA to SOTA LLM

This is the repo for Llama-X, which aims to:

Progressively improve the performance of LLaMA to a SOTA LLM together with the open-source community.
Conduct Llama-X as open academic research that is long-term, systematic, and rigorous.
Save the community from repetitive work, so that we can work together to create more and faster increments.

The project will follow these principles:

We will publish all code, models, data, and experiment details.
We will continuously improve the model version by version and open-source the newest methods.
We will summarize the methods of each main version in academic papers.
We are announcing a complete research plan. Contributors are welcome to collaborate with each other to progressively improve Llama-X through iterations of the target versions.
Each newly checked-in model must achieve a significant improvement over the current version on automatic evaluation.

📣 Please join if you are interested in Llama-X. Let's Make AI Open Again.

Ten main research areas
Llama-X Model Version
Llama-X Evaluation
Llama-X Paper List
Usage
How to contribute

Ten main research areas

[1]. Research on Instruction Tuning

instruction-following tuning

[2]. Research on RLHF & RLAIF

fundamental RLHF
AI learning from AI

[3]. Research on Data Quality

high-quality data for pre-training, fine-tuning, user feedback, multi-modality, etc.

[4]. Research on Long Context Transformer

enable efficient transformers for long sequences (>30k tokens)

[5]. Research on Multi-modal (text + image) Modeling

text + image in; text out

[6]. Research on Multilingual

multilingual performance comparable to English

[7]. Research on Efficient infrastructure and optimization

improve training and inference speed
build a deep learning stack that scales predictably

[8]. Research on Evaluation

comprehensive evaluation of model capabilities

[9]. Research on Interpretability

interpret the source of each capability of an LLM

[10]. Research on LLMs with Actions

combine LLM with search, recommendation and other plugins

Llama-X Model Version

Llama-X	Baseline	Performance
3.0.0 (LLaMA)	GPT-3	Outperform
3.1.0	text-davinci-001	Comparable
3.2.0	text-davinci-002	Comparable
3.3.0	text-davinci-003	Comparable
3.5.0	gpt-35-turbo	Comparable
3.6.0	GPT-4	80% Avg.Gap
3.7.0	GPT-4	60% Avg.Gap
3.8.0	GPT-4	40% Avg.Gap
3.9.0	GPT-4	20% Avg.Gap
4.0.0	GPT-4	Comparable

We are currently focusing on research areas [1] and [3] above, and will publish our first model version (Llama-X 3.0.1) and its paper before 4/9/2023.

Llama-X Evaluation

Each new version of the Llama-X model should significantly outperform the current version (by more than 1%) on the automatic evaluation of every Type-A benchmark below. From version 3.6.0 onward, evaluation on the Type-B benchmarks should be added as well:

Type	Benchmarks
A	MMLU
A	HumanEval
A	GSM-8K
A	NaturalQuestions
A	TruthfulQA
B	Leetcode
B	GRE
B	AP
B	MMLU-Multilingual
B	Visual Inputs (TBD)
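The check-in rule above amounts to a per-benchmark comparison. The sketch below is an illustration only: it assumes scores are reported as percentages and reads "more than 1%" as a gain of more than 1 absolute percentage point on every Type-A benchmark. The function name and the example scores are hypothetical, not official Llama-X tooling or results.

```python
from typing import Dict

# Type-A benchmarks from the table above; every one must improve.
TYPE_A_BENCHMARKS = ("MMLU", "HumanEval", "GSM-8K", "NaturalQuestions", "TruthfulQA")


def passes_check_in(current: Dict[str, float], candidate: Dict[str, float],
                    min_gain: float = 1.0) -> bool:
    """Return True if `candidate` beats `current` by more than `min_gain`
    percentage points on every Type-A benchmark.

    Assumption: scores are percentages and the threshold is an absolute
    gain, not a relative one.
    """
    for name in TYPE_A_BENCHMARKS:
        if name not in current or name not in candidate:
            raise KeyError(f"missing score for Type-A benchmark {name!r}")
        if candidate[name] - current[name] <= min_gain:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    current = {"MMLU": 35.0, "HumanEval": 10.0, "GSM-8K": 11.0,
               "NaturalQuestions": 24.0, "TruthfulQA": 30.0}
    candidate = {"MMLU": 37.5, "HumanEval": 12.0, "GSM-8K": 13.0,
                 "NaturalQuestions": 25.5, "TruthfulQA": 30.5}
    print(passes_check_in(current, candidate))  # False: TruthfulQA gains only 0.5
```

A single benchmark that fails to clear the threshold blocks the check-in, matching the "all of the following Type-A benchmarks" requirement.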

Llama-X Paper List

LLaMA: Open and Efficient Foundation Language Models.

Usage

Setup. Install the conda environment and dependencies:

conda create -n llamax python=3.10
conda activate llamax
git clone https://github.com/AetherCortex/Llama-X.git
cd Llama-X/src
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
cd ../..
pip install -r requirements.txt
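After installation, a quick sanity check can confirm that the expected Python version and packages are visible in the active environment. This is a minimal sketch using only the standard library; the package list mirrors the commands above (`torch`, `torchvision`, `torchaudio`, `transformers`) and is an assumption, not a script shipped with Llama-X.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Packages installed by the setup commands above (assumed list).
EXPECTED_PACKAGES = ("torch", "torchvision", "torchaudio", "transformers")


def installed_versions(packages: Iterable[str] = EXPECTED_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {py} (the environment above was created with python=3.10)")
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED indicates a step above that did not complete in the active environment.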

Training data example (e.g., Stanford Alpaca):

Llama-X/src/data/alpaca_data.json
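Stanford Alpaca's `alpaca_data.json` is a JSON list of records with `instruction`, `input`, and `output` fields, and Alpaca's training code turns each record into a prompt using a fixed template. The sketch below loads such a file and builds prompts in that style. It assumes Llama-X's `train.py` keeps the Alpaca template, which should be verified against the source; `load_records` and `build_prompt` are hypothetical helper names.

```python
import json
from typing import Dict, List

# Prompt templates in the style of Stanford Alpaca's training code.
# Assumption: Llama-X's train.py uses the same wording; check the source.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}"
    "\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}"
    "\n\n### Response:"
)
REQUIRED_KEYS = ("instruction", "input", "output")


def load_records(path: str) -> List[Dict[str, str]]:
    """Load an Alpaca-style JSON list and check that each record has all keys."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
    return records


def build_prompt(record: Dict[str, str]) -> str:
    """Format one record; an empty `input` selects the shorter template."""
    template = PROMPT_WITH_INPUT if record["input"] else PROMPT_NO_INPUT
    # str.format ignores the unused `input` keyword for PROMPT_NO_INPUT.
    return template.format(instruction=record["instruction"], input=record["input"])
```

In Alpaca-style supervised fine-tuning, the model is trained on the prompt followed by the record's `output`, typically with the loss masked on the prompt tokens.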

Convert the LLaMA checkpoint to Hugging Face format:

cd Llama-X/src
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama-7B/ \
    --model_size 7B \
    --output_dir /path/to/llama-7B/hf

Train LLaMA-7B with DeepSpeed ZeRO-3:

deepspeed train.py \
    --model_name_or_path /path/to/llama-7B/hf \
    --data_path /path/to/example_data.json \
    --output_dir /path/to/llama-7B/hf/ft \
    --num_train_epochs 3 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --warmup_steps 2 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True
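The `--deepspeed configs/deepspeed_config.json` flag points at a DeepSpeed configuration file. The repository's actual file is not reproduced here; the sketch below writes a minimal ZeRO stage-3 config of the kind the Hugging Face `Trainer` integration accepts, where `"auto"` values are filled in from the matching command-line arguments (such as `--fp16`, `--per_device_train_batch_size`, and `--gradient_accumulation_steps`). The chosen options and the `write_config` helper are assumptions for illustration.

```python
import json
from pathlib import Path

# A minimal ZeRO stage-3 config for the Hugging Face Trainer integration.
# "auto" values are resolved by the Trainer from its own arguments.
# Illustrative assumption only, not the file shipped with Llama-X.
ZERO3_CONFIG = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather full 16-bit weights when saving, so the checkpoint is usable
        # outside DeepSpeed. Older DeepSpeed releases name this key
        # "stage3_gather_fp16_weights_on_model_save".
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def write_config(path: str) -> None:
    """Write the config as JSON, creating parent directories if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(ZERO3_CONFIG, indent=2), encoding="utf-8")
```

For example, `write_config("configs/zero3_example.json")` writes a file that can be passed to `--deepspeed`, using a new file name so the repository's own config is not overwritten. If GPU memory is tight, ZeRO can also offload optimizer state or parameters to CPU through the `offload_optimizer` and `offload_param` sections.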

The current Llama-X code supports:
- Full fine-tuning: optimizes the full LLaMA checkpoint rather than using Low-Rank Adaptation (LoRA).
- High efficiency: trains the 7B model on 50k examples per epoch with batch_size=64 within 1 hour on 8 x V100 GPUs.

LLaMA	Batch Size	V100s	Time (h)
7B	64	8	1.00
13B	32	8	1.75
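When comparing these numbers with the training command, note that data-parallel `Trainer`/DeepSpeed runs use an effective (global) batch size of per-device batch × number of GPUs × gradient-accumulation steps. The helper below does that arithmetic as an illustrative sketch; it makes no assumption about whether the "Batch Size" column above refers to the per-device or the global value, and `plan_training` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingPlan:
    global_batch_size: int
    steps_per_epoch: int
    total_steps: int


def plan_training(num_examples: int, per_device_batch: int, num_gpus: int,
                  grad_accum_steps: int = 1, epochs: int = 1) -> TrainingPlan:
    """Effective batch size and optimizer-step counts for data-parallel training.

    Steps are rounded up because the last partial batch is still processed;
    exact counts in a real run can differ slightly depending on how the
    sampler pads or drops samples.
    """
    global_batch = per_device_batch * num_gpus * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / global_batch)
    return TrainingPlan(global_batch, steps_per_epoch, steps_per_epoch * epochs)


# Settings from the training command above, with 50k examples on 8 GPUs:
print(plan_training(50_000, per_device_batch=64, num_gpus=8, epochs=3))
# TrainingPlan(global_batch_size=512, steps_per_epoch=98, total_steps=294)
```

With these settings, `--save_steps 100` lands close to one checkpoint per epoch.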

Inference

# web demo inference
python generate.py

# batch inference
# TODO: not yet available

How to contribute

Developers can become contributors by contributing useful code, data, papers, computing resources, etc.

Code: Including algorithm implementation, training optimization, inference optimization, and model deployment.
Data: Every research area and version iteration requires high-quality data, including instruction-answer, pre-training, multi-modal, multilingual, and user feedback data.
Paper: We will maintain the Llama-X Paper List. Academic papers that use Llama-X as the base model and are well optimized, fully tested, and significantly improved can be checked in to the Llama-X Paper List.
Computing resource: We hope to accelerate model iteration by coordinating spare computing power from developers and non-profit sponsorship from universities and enterprises.

How to communicate with us

GitHub Issues
Email: llama-x@mail.com
Discord:

Acknowledgements

This project has been inspired by multiple open-source projects:

Meta AI LLaMA

Huggingface Transformers Llama

Alpaca and Alpaca-LoRA

Non-commercial Use:

Llama-X is currently for academic purposes only; please do not use it in commercial scenarios or products.
