nyunAI's repositories
AQLM
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf
License: Apache-2.0
FLAP
Patch for Grouped Query Attention
Language: Python | License: Apache-2.0
nyuntam-docs
This is the official documentation for nyuntam
Language: Python
qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Language: Python | License: Apache-2.0