Llama 2 on CPU and Mac M1/M2 GPU

This is a fork of https://github.com/facebookresearch/llama that runs on the CPU, and on the Mac M1/M2 GPU (mps) when it is available.
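
The "mps if available" behaviour can be pictured with a minimal sketch like the one below. It is an illustration, not the code used in this fork: the helper name pick_device and its mps_available override are hypothetical, and the only PyTorch call it relies on is torch.backends.mps.is_available().

```python
def pick_device(mps_available=None):
    """Return "mps" when the Apple GPU backend is usable, otherwise "cpu".

    `mps_available` is a hypothetical override for this sketch; when it is
    left as None, PyTorch is asked whether the MPS backend is available.
    """
    if mps_available is None:
        try:
            import torch
        except ImportError:
            # Without PyTorch there is nothing to probe, so fall back to CPU.
            return "cpu"
        # Older PyTorch builds have no `torch.backends.mps` attribute at all.
        mps_backend = getattr(torch.backends, "mps", None)
        mps_available = bool(mps_backend is not None and mps_backend.is_available())
    return "mps" if mps_available else "cpu"
```

The returned string can be handed to torch.device(...), so the same script would run unchanged on machines with and without an M1/M2 GPU.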

Installation and usage are exactly the same as for the official repository, so please refer to its instructions.

MacBook Pro M1 with 7B model:

An extra message is also shown during text generation, reporting how many tokens have been generated and at what speed.
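
A throughput message of this kind could be produced roughly as follows. This is a sketch under assumptions: the function names and the exact message wording are hypothetical and are not taken from this fork, and the generate callable stands in for whatever produces the tokens.

```python
import time


def format_generation_stats(num_tokens, elapsed_seconds):
    """Build a one-line report of token count and tokens per second."""
    # Guard against a zero elapsed time to avoid dividing by zero.
    rate = num_tokens / elapsed_seconds if elapsed_seconds > 0 else 0.0
    return (
        f"generated {num_tokens} tokens in {elapsed_seconds:.2f}s "
        f"({rate:.2f} tokens/s)"
    )


def timed_generation(generate, *args, **kwargs):
    """Call `generate`, time it, and return (tokens, stats message).

    `generate` is any callable returning a sequence of tokens; it is a
    placeholder for the real model call.
    """
    start = time.perf_counter()
    tokens = generate(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return tokens, format_generation_stats(len(tokens), elapsed)
```

For example, format_generation_stats(50, 10.0) yields "generated 50 tokens in 10.00s (5.00 tokens/s)".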
