jllllll / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Llama v2 70b

dred0n opened this issue · comments

Can you bring in the main exllama's support for 70b and cut a release, please?

Give me a sec, will be done soon.