jllllll / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

jllllll/exllama Issues

Run on CPU without AVX2
Updated 4 months ago
Strange output
Closed 7 months ago, 2 comments
Can't import exllama
Closed a year ago, 5 comments
Running Llama2 on multiple GPUs outputs gibberish
Closed a year ago, 1 comment
Llama2 70B: can't use more than 2048 tokens context
Closed a year ago, 13 comments
Error checking compiler version for cl: [WinError 2]
Closed a year ago, 4 comments
ImportError: cannot import name 'make_q4' from 'exllama_ext' (unknown location)
Closed a year ago, 2 comments
How do I enable FlashAttention 2 exllama in oobabooga webui?
Closed a year ago, 2 comments
Support wheels for CUDA 12.1 and 12.2
Closed a year ago, 2 comments
Llama v2 70b
Closed a year ago, 2 comments
how to install this module via requirement.txt file
Closed a year ago, 2 comments
Can you provide the whl of CUDAv11.8?
Closed a year ago, 1 comment
nvidia jetson orin wheel?
Updated a year ago, 1 comment