okwinds's repositories
GPTQModel
GPTQ-based LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
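As context for what a quantization toolkit does, here is a conceptual sketch of symmetric round-to-nearest weight quantization, the basic quantize/dequantize step that GPTQ-style methods refine with per-layer error compensation. This is an illustration of the idea only, not GPTQModel's actual algorithm or API.

```python
# Conceptual sketch: symmetric round-to-nearest quantization of a weight
# vector to low-bit signed integers, plus dequantization back to floats.
# GPTQ additionally minimizes layer output error; this omits that step.

def quantize(weights, bits=4):
    """Map float weights to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and a scale."""
    return [x * scale for x in q]

weights = [0.12, -0.7, 0.35, 0.02]
q, scale = quantize(weights)        # integers plus one shared scale
approx = dequantize(q, scale)       # lossy reconstruction of the weights
```

The storage win is that each weight becomes a 4-bit integer plus one shared float scale per group, which is what enables the accelerated low-bit inference kernels mentioned above.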
inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need: run inference with any open-source language, speech recognition, or multimodal model, whether in the cloud, on-premises, or on your laptop.
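The "single line of code" claim works because Xinference exposes an OpenAI-compatible HTTP API, so only the base URL your client points at changes. A minimal stdlib sketch of that idea (the local host/port below are illustrative assumptions, not a guaranteed default):

```python
# Sketch of the single-line swap: the request path is the OpenAI-style
# one either way; only the base URL differs between the hosted API and
# a local OpenAI-compatible server such as Xinference.

def chat_completions_url(base_url: str) -> str:
    """Build the OpenAI-style chat-completions endpoint for a given server."""
    return base_url.rstrip("/") + "/chat/completions"

openai_url = chat_completions_url("https://api.openai.com/v1")
local_url = chat_completions_url("http://localhost:9997/v1")  # assumed local port
```

In practice this means an existing OpenAI client can be redirected at a local server by changing its configured base URL, with the rest of the application code untouched.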
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
xoscar
Python actor framework for heterogeneous computing.
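To illustrate the actor model that a framework like xoscar is built on, here is a minimal stdlib sketch: each actor owns private state and processes messages one at a time from a mailbox. xoscar's real API (async, distributed, heterogeneous-hardware aware) is far richer; this shows only the underlying pattern, and all names here are invented for the example.

```python
# Minimal actor-model sketch: a dedicated thread drains a FIFO mailbox,
# so the actor's state is only ever touched by one thread at a time.

import queue
import threading

class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0                       # private state, actor thread only
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg, reply = self._mailbox.get()  # messages handled strictly in order
            if msg == "stop":
                break
            if msg == "incr":
                self._count += 1
            if reply is not None:
                reply.put(self._count)

    def tell(self, msg):
        """Fire-and-forget message."""
        self._mailbox.put((msg, None))

    def ask(self, msg):
        """Send a message and block until the actor replies."""
        reply = queue.Queue()
        self._mailbox.put((msg, reply))
        return reply.get()

actor = CounterActor()
actor.tell("incr")
actor.tell("incr")
total = actor.ask("get")   # -> 2, since the mailbox preserves message order
actor.tell("stop")
```

The FIFO mailbox is what removes the need for locks around `_count`: ordering and mutual exclusion both fall out of the single consumer thread, which is the property actor frameworks scale up across processes and devices.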