neuralmagic/nm-vllm Issues

- [Usage]: (closed)
- [Feature]: Support LLama3 (closed)
- How to get the sparsed model? (closed)
- Sparsity benchmarks (closed)
- [Doc]: Support Mixtral? (closed)
A high-throughput and memory-efficient inference and serving engine for LLMs