
ppllama

This repo is the PaddlePaddle implementation of Meta's LLaMA.

Update

3/3: add cpu example.

3/4: add gpu example.

3/5: add a simple chatbot demo with ipywidgets: chatbot.

Quickstart

Try it on AI Studio: ppllama

Setup

git clone https://github.com/jiaohuix/ppllama.git
cd ppllama && pip install -r requirements.txt
pip install -e ./

To download the checkpoints, fill out this Google form (the tokenizer is already included in the ckpt directory).

# download ckpt
bash scripts/download.sh <MODEL_SIZE>(7B/13B/30B/65B) <TARGET_FOLDER> <PRESIGNED_URL> 
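After the download finishes, the bundled checklist files can be used to verify integrity (the official LLaMA release ships md5 checksums as checklist.chk). A minimal Python equivalent of `md5sum -c`, assuming the directory layout shown below (the `verify` helper is illustrative, not part of this repo):

```python
import hashlib
import pathlib

# Hypothetical integrity check mirroring `md5sum -c checklist.chk`:
# each line of checklist.chk is "<md5>  <filename>", with filenames
# relative to the directory containing the checklist.
def verify(checklist):
    base = pathlib.Path(checklist).parent
    ok = True
    for line in pathlib.Path(checklist).read_text().splitlines():
        digest, name = line.split()
        actual = hashlib.md5((base / name).read_bytes()).hexdigest()
        print(name, "OK" if actual == digest else "FAILED")
        ok = ok and actual == digest
    return ok

# Example usage (paths assume the layout shown in this README):
# verify("ckpt/7B/checklist.chk")
# verify("ckpt/tokenizer_checklist.chk")
```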

The checkpoint directory should look like this:

ckpt
├── 13B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   └── params.json
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── model0.pdparams
│   └── params.json
├── tokenizer_checklist.chk
└── tokenizer.model
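The model0.pdparams file under 7B above is the converted Paddle weight file. A minimal sketch of the usual torch-to-paddle conversion step (a hypothetical helper, not this repo's actual script): torch.nn.Linear stores weights as [out_features, in_features] while paddle.nn.Linear expects [in_features, out_features], so 2-D linear weights are transposed and everything else (embeddings, norms) is copied as-is.

```python
import numpy as np

# Hypothetical conversion rule, shown on numpy arrays extracted from the
# torch state dict. The name-based heuristic is illustrative only.
def convert_weight(name, array):
    arr = np.asarray(array)
    if name.endswith(".weight") and arr.ndim == 2 and "embed" not in name:
        return arr.T  # linear layer: transpose to paddle's [in, out] layout
    return arr  # embeddings and 1-D norm weights keep their shape

w = np.zeros((4096, 11008))  # torch layout: [out_features, in_features]
print(convert_weight("layers.0.feed_forward.w1.weight", w).shape)  # (11008, 4096)
```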

Alignment

This repository contains scripts for converting checkpoints from PyTorch to Paddle. A 2-layer model is used for inference to verify that ppllama and the original llama are aligned, see: align
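The alignment check above boils down to running both implementations on the same prompt and comparing outputs elementwise. A sketch of such a comparison (function and tolerance are illustrative; the repo's own check lives under the align link):

```python
import numpy as np

# Hypothetical alignment check: compare logits produced by the torch LLaMA
# and the paddle port on the same input tokens.
def aligned(logits_a, logits_b, atol=1e-4):
    """Return True when the two implementations agree within tolerance."""
    return np.allclose(np.asarray(logits_a), np.asarray(logits_b), atol=atol)

a = np.array([0.1, 0.2, 0.3])
print(aligned(a, a + 1e-6))  # True: small numerical drift is tolerated
print(aligned(a, a + 1.0))   # False: a real mismatch is flagged
```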

Inference

Environment configuration and speed:

Device  Memory  Load speed  Inference speed
cpu     32G     6 min       20 min (1 prompt)
cuda    32G     -           15 sec (4 prompts)

cpu:

python -m paddle.distributed.launch  scripts/example_cpu.py --prompt "The capital of Germany is the city of" --mp 1 --ckpt_dir ckpt/7B/ --tokenizer_path  ckpt/tokenizer.model

[screenshot: ppllama CPU inference]

gpu:

python -m paddle.distributed.launch  scripts/example.py --mp 1 --ckpt_dir ckpt/7B/ --tokenizer_path  ckpt/tokenizer.model


If you like the project, please show your support by leaving a star ⭐.

License

See the LICENSE file.

About

The Paddle implementation of Meta's LLaMA.

License: GNU General Public License v3.0


Languages

Python 96.1%, Shell 3.9%