okwinds's repositories
GPTQModel
GPTQ-based LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
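As context for what a quantization toolkit does, here is a conceptual sketch of symmetric round-to-nearest weight quantization, the basic quantize/dequantize step that GPTQ-style methods refine with per-layer error compensation. This is an illustration of the idea only, not GPTQModel's actual algorithm or API.

```python
# Conceptual sketch: symmetric round-to-nearest quantization of a weight
# vector to low-bit signed integers, plus dequantization back to floats.
# GPTQ additionally minimizes layer output error; this omits that step.

def quantize(weights, bits=4):
    """Map float weights to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and a scale."""
    return [x * scale for x in q]

weights = [0.12, -0.7, 0.35, 0.02]
q, scale = quantize(weights)        # integers plus one shared scale
approx = dequantize(q, scale)       # lossy reconstruction of the weights
```

The storage win is that each weight becomes a 4-bit integer plus one shared float scale per group, which is what enables the accelerated low-bit inference kernels mentioned above.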
inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need: run inference with any open-source language, speech recognition, or multimodal model, whether in the cloud, on-premises, or on your laptop.
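The "single line of code" claim works because Xinference exposes an OpenAI-compatible HTTP API, so only the base URL your client points at changes. A minimal stdlib sketch of that idea (the local host/port below are illustrative assumptions, not a guaranteed default):

```python
# Sketch of the single-line swap: the request path is the OpenAI-style
# one either way; only the base URL differs between the hosted API and
# a local OpenAI-compatible server such as Xinference.

def chat_completions_url(base_url: str) -> str:
    """Build the OpenAI-style chat-completions endpoint for a given server."""
    return base_url.rstrip("/") + "/chat/completions"

openai_url = chat_completions_url("https://api.openai.com/v1")
local_url = chat_completions_url("http://localhost:9997/v1")  # assumed local port
```

In practice this means an existing OpenAI client can be redirected at a local server by changing its configured base URL, with the rest of the application code untouched.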
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
xoscar
Python actor framework for heterogeneous computing.
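To illustrate the actor model that a framework like xoscar is built on, here is a minimal stdlib sketch: each actor owns private state and processes messages one at a time from a mailbox. xoscar's real API (async, distributed, heterogeneous-hardware aware) is far richer; this shows only the underlying pattern, and all names here are invented for the example.

```python
# Minimal actor-model sketch: a dedicated thread drains a FIFO mailbox,
# so the actor's state is only ever touched by one thread at a time.

import queue
import threading

class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0                       # private state, actor thread only
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg, reply = self._mailbox.get()  # messages handled strictly in order
            if msg == "stop":
                break
            if msg == "incr":
                self._count += 1
            if reply is not None:
                reply.put(self._count)

    def tell(self, msg):
        """Fire-and-forget message."""
        self._mailbox.put((msg, None))

    def ask(self, msg):
        """Send a message and block until the actor replies."""
        reply = queue.Queue()
        self._mailbox.put((msg, reply))
        return reply.get()

actor = CounterActor()
actor.tell("incr")
actor.tell("incr")
total = actor.ask("get")   # -> 2, since the mailbox preserves message order
actor.tell("stop")
```

The FIFO mailbox is what removes the need for locks around `_count`: ordering and mutual exclusion both fall out of the single consumer thread, which is the property actor frameworks scale up across processes and devices.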