okwinds's repositories

GPTQModel

GPTQ-based LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

License: Apache-2.0

inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

License: Apache-2.0
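The "single line of code" swap the Xinference description advertises works because its server speaks an OpenAI-compatible HTTP API, so only the endpoint (and model name) changes. A minimal sketch of that idea, with the port and model names below as illustrative assumptions rather than guarantees:

```python
# Sketch of the provider swap: both requests share the same OpenAI-style
# payload shape; only the URL and model name differ.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"
XINFERENCE_URL = "http://localhost:9997/v1/chat/completions"  # local Xinference server (assumed default port)


def build_request(base_url: str, model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request payload."""
    return {
        "url": base_url,
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


# Switching from the cloud API to a locally served open-source model is
# just changing the arguments passed in:
cloud = build_request(OPENAI_URL, "gpt-4o-mini", "Hello")
local = build_request(XINFERENCE_URL, "llama-3-instruct", "Hello")
```

In practice the same trick works through any OpenAI-compatible client by pointing its base URL at the Xinference server instead of `api.openai.com`.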

llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Language: Python · License: Apache-2.0

mmengine

OpenMMLab Foundational Library for Training Deep Learning Models

License: Apache-2.0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

License: Apache-2.0

xoscar

Python actor framework for heterogeneous computing.

License: Apache-2.0