This is a fork of https://github.com/facebookresearch/llama that runs on CPU and, when available, on the Mac M1/M2 GPU (MPS).
Please refer to the official installation and usage instructions as they are exactly the same.
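
For readers curious how a CPU/MPS fallback is typically done in PyTorch, here is a minimal sketch. It is not taken from this fork's code: the helper name `pick_device` and its `prefer_mps` parameter are hypothetical, and the only PyTorch call assumed is `torch.backends.mps.is_available()` (PyTorch 1.12 or later).

```python
def pick_device(prefer_mps: bool = True) -> str:
    """Return "mps" when the Apple GPU backend is usable, otherwise "cpu".

    Hypothetical helper for illustration only; the fork may select
    its device differently.
    """
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"

    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps_backend = getattr(torch.backends, "mps", None)
    if prefer_mps and mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # The returned string can be passed to torch.device(...) or tensor.to(...).
    print(f"Using device: {pick_device()}")
```

Passing `prefer_mps=False` forces the CPU path, which is handy for comparing the two backends on the same machine.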
![image](https://private-user-images.githubusercontent.com/947457/254658419-8a7bd5c8-1d45-4835-8463-64e12486d0e9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg3NzY5NzcsIm5iZiI6MTcxODc3NjY3NywicGF0aCI6Ii85NDc0NTcvMjU0NjU4NDE5LThhN2JkNWM4LTFkNDUtNDgzNS04NDYzLTY0ZTEyNDg2ZDBlOS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYxOVQwNTU3NTdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01N2FiN2I3NWQ4YTRiN2Y3YWIxZDgxY2NkYjYwZWJmYTZhNGRiZTZlZWI5ODM5NjYwNWY4ZTlmNWJkNmEwMzVkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.T1yaqg63sXHS2aGAfmr1l-M57qVawdHncSoTT-FyPBA)
Generation speed on a MacBook Pro M1 with the 7B model:
- MPS (default): ~4.3 words per second
- CPU: ~0.67 words per second
During text generation, an extra message reports how many tokens have been generated and the generation speed.
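
A rough sketch of how such a token count and speed report can be computed is shown below. The function names (`generate_with_stats`, `format_stats`) and the message format are assumptions made for illustration, not the fork's actual implementation or output.

```python
import time


def generate_with_stats(token_iter, clock=time.perf_counter):
    """Consume a token iterator and measure throughput.

    Returns (tokens, count, tokens_per_second). `clock` is injectable
    so the timing can be replaced with a fake clock in tests.
    """
    start = clock()
    tokens = list(token_iter)  # stands in for the real decoding loop
    elapsed = clock() - start
    count = len(tokens)
    # Guard against a zero interval on very fast or empty runs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return tokens, count, rate


def format_stats(count, rate):
    """Build a one-line status message (hypothetical format)."""
    return f"generated {count} tokens ({rate:.2f} tokens/s)"


if __name__ == "__main__":
    def slow_tokens():
        # Simulated model that yields one token every 10 ms.
        for word in ["Hello", ",", " world", "!"]:
            time.sleep(0.01)
            yield word

    _, n, tps = generate_with_stats(slow_tokens())
    print(format_stats(n, tps))
```

Note that this reports tokens per second, while the figures above are in words per second; the two differ because one word is often split into several tokens.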