krychu / llama

Inference code for LLaMA models on CPU and Mac M1/M2 GPU

Llama 2 on CPU and Mac M1/M2 GPU

This is a fork of https://github.com/facebookresearch/llama that runs on the CPU and, when available, on the Mac M1/M2 GPU (mps).

Installation and usage are identical to the official repository, so please refer to its instructions.
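The fork's main difference is picking the mps backend when it is available. A minimal sketch of that device selection in PyTorch might look like the following; the function name `select_device` is illustrative, and the actual fork may handle this differently:

```python
import torch

def select_device() -> torch.device:
    # Prefer Apple's Metal Performance Shaders (mps) backend when the
    # running PyTorch build and hardware support it; otherwise fall
    # back to the CPU. getattr guards against older torch versions
    # that lack the mps backend entirely.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = select_device()
print(f"running inference on: {device}")
```

Model weights and input tensors would then be moved to `device` before generation.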


Throughput on a MacBook Pro M1 with the 7B model:

  • MPS (default): ~4.3 words per second
  • CPU: ~0.67 words per second

During text generation, an extra message reports how many tokens have been generated and at what speed.
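A tokens-per-second message like the one described above could be computed as in this sketch; the function name `generation_stats` and the message format are assumptions, not the fork's actual output:

```python
import time

def generation_stats(n_tokens: int, start: float, end: float) -> str:
    # Report token count and generation speed over a timed interval.
    elapsed = end - start
    rate = n_tokens / elapsed if elapsed > 0 else float("inf")
    return f"generated {n_tokens} tokens in {elapsed:.1f}s ({rate:.2f} tokens/s)"

start = time.time()
# ... the generation loop would run here ...
end = start + 4.0  # pretend 4 seconds elapsed, for illustration
print(generation_stats(20, start, end))  # → generated 20 tokens in 4.0s (5.00 tokens/s)
```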

License: Other


Languages

  • Python 93.1%
  • Shell 6.9%