wanghz18's repositories
exllama — A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. (Language: Python · License: MIT)
GPTQ-triton — GPTQ inference Triton kernel. (Language: Jupyter Notebook · License: Apache-2.0)