
ppllama

This repo is the PaddlePaddle implementation of Meta's LLaMA.

Update

3/3: add cpu example.

3/4: add gpu example.

3/5: add a simple chatbot demo with ipywidgets: chatbot.

Quickstart

Try it on AI Studio: ppllama

Setup

git clone https://github.com/jiaohuix/ppllama.git
cd ppllama && pip install -r requirements.txt
pip install -e ./

To download the checkpoints, fill out this Google form (the tokenizer is already included in the ckpt directory).

# download ckpt
bash scripts/download.sh <MODEL_SIZE>(7B/13B/30B/65B) <TARGET_FOLDER> <PRESIGNED_URL> 
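After the download finishes, the bundled checklist files can be used to verify integrity (the official LLaMA release ships md5 checksums as checklist.chk). A minimal Python equivalent of `md5sum -c`, assuming the directory layout shown below (the `verify` helper is illustrative, not part of this repo):

```python
import hashlib
import pathlib

# Hypothetical integrity check mirroring `md5sum -c checklist.chk`:
# each line of checklist.chk is "<md5>  <filename>", with filenames
# relative to the directory containing the checklist.
def verify(checklist):
    base = pathlib.Path(checklist).parent
    ok = True
    for line in pathlib.Path(checklist).read_text().splitlines():
        digest, name = line.split()
        actual = hashlib.md5((base / name).read_bytes()).hexdigest()
        print(name, "OK" if actual == digest else "FAILED")
        ok = ok and actual == digest
    return ok

# Example usage (paths assume the layout shown in this README):
# verify("ckpt/7B/checklist.chk")
# verify("ckpt/tokenizer_checklist.chk")
```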

The checkpoint directory should look like this:

ckpt
├── 13B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   └── params.json
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── model0.pdparams
│   └── params.json
├── tokenizer_checklist.chk
└── tokenizer.model
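The model0.pdparams file under 7B above is the converted Paddle weight file. A minimal sketch of the usual torch-to-paddle conversion step (a hypothetical helper, not this repo's actual script): torch.nn.Linear stores weights as [out_features, in_features] while paddle.nn.Linear expects [in_features, out_features], so 2-D linear weights are transposed and everything else (embeddings, norms) is copied as-is.

```python
import numpy as np

# Hypothetical conversion rule, shown on numpy arrays extracted from the
# torch state dict. The name-based heuristic is illustrative only.
def convert_weight(name, array):
    arr = np.asarray(array)
    if name.endswith(".weight") and arr.ndim == 2 and "embed" not in name:
        return arr.T  # linear layer: transpose to paddle's [in, out] layout
    return arr  # embeddings and 1-D norm weights keep their shape

w = np.zeros((4096, 11008))  # torch layout: [out_features, in_features]
print(convert_weight("layers.0.feed_forward.w1.weight", w).shape)  # (11008, 4096)
```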

Alignment

This repository contains scripts for converting checkpoints from PyTorch to Paddle. A 2-layer model is used for inference to verify that ppllama and the original llama are aligned, see: align
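The alignment check above boils down to running both implementations on the same prompt and comparing outputs elementwise. A sketch of such a comparison (function and tolerance are illustrative; the repo's own check lives under the align link):

```python
import numpy as np

# Hypothetical alignment check: compare logits produced by the torch LLaMA
# and the paddle port on the same input tokens.
def aligned(logits_a, logits_b, atol=1e-4):
    """Return True when the two implementations agree within tolerance."""
    return np.allclose(np.asarray(logits_a), np.asarray(logits_b), atol=atol)

a = np.array([0.1, 0.2, 0.3])
print(aligned(a, a + 1e-6))  # True: small numerical drift is tolerated
print(aligned(a, a + 1.0))   # False: a real mismatch is flagged
```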

Inference

Environment configuration and speed:

Device  Memory  Load speed  Inference speed
cpu     32G     6 min       20 min (1 prompt)
cuda    32G     -           15 sec (4 prompts)

cpu:

python -m paddle.distributed.launch  scripts/example_cpu.py --prompt "The capital of Germany is the city of" --mp 1 --ckpt_dir ckpt/7B/ --tokenizer_path  ckpt/tokenizer.model

[screenshot: ppllama CPU inference]

gpu:

python -m paddle.distributed.launch  scripts/example.py --mp 1 --ckpt_dir ckpt/7B/ --tokenizer_path  ckpt/tokenizer.model


If you like the project, please show your support by leaving a star ⭐.

License

See the LICENSE file.

About

The Paddle implementation of Meta's LLaMA.

License: GNU General Public License v3.0


Languages

Python 96.1%, Shell 3.9%