jllllll / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Llama v2 70b

dred0n opened this issue a year ago

dred0n commented a year ago

Can you bring in the main exllama's support for 70b and cut a release, please?

jllllll commented a year ago

Give me a sec, will be done soon.

jllllll commented a year ago

Wheels are uploaded: https://github.com/jllllll/exllama/releases/tag/0.0.7