Xu-Chen's starred repositories

ollama

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python · License: Apache-2.0 · Stars: 25,409 · Issues: 169 · Issues: 4,113

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stars: 21,983 · Issues: 198 · Issues: 3,254

Perplexica

Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.

Language: TypeScript · License: MIT · Stars: 10,693 · Issues: 75 · Issues: 158

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 9,124 · Issues: 117 · Issues: 124

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

gpt-fast

Simple and efficient PyTorch-native transformer text generation in under 1,000 lines of Python.

Language: Python · License: BSD-3-Clause · Stars: 5,333 · Issues: 63 · Issues: 92

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stars: 4,057 · Issues: 34 · Issues: 433

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python · License: MIT · Stars: 3,192 · Issues: 33 · Issues: 361

xtuner

An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python · License: Apache-2.0 · Stars: 3,167 · Issues: 31 · Issues: 407

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

aspoem

Learn Chinese poetry with AsPoem.com

Language: TypeScript · License: AGPL-3.0 · Stars: 2,361 · Issues: 9 · Issues: 112

farfalle

🔍 AI search engine - self-host with local or cloud LLMs

Language: TypeScript · License: Apache-2.0 · Stars: 2,187 · Issues: 18 · Issues: 51

swift

ms-swift: use PEFT or full-parameter training to fine-tune 250+ LLMs or 40+ MLLMs. (Qwen2, GLM4, InternLM2, Yi, Llama3, LLaVA, MiniCPM-V, DeepSeek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python · License: Apache-2.0 · Stars: 2,165 · Issues: 18 · Issues: 598

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference.

Language: Python · License: MIT · Stars: 1,400 · Issues: 11 · Issues: 330

clarity-ai

A simple Perplexity AI clone.

Language: TypeScript · License: MIT · Stars: 1,092 · Issues: 22 · Issues: 7

aphrodite-engine

PygmalionAI's large-scale inference engine

Language: Python · License: AGPL-3.0 · Stars: 751 · Issues: 12 · Issues: 135

EAGLE

Official Implementation of EAGLE

Language: Python · License: Apache-2.0 · Stars: 622 · Issues: 12 · Issues: 73

BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models

Language: Python · License: Apache-2.0 · Stars: 531 · Issues: 11 · Issues: 85

marlin

An FP16×INT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stars: 434 · Issues: 14 · Issues: 21

ocular

AI-powered search and chat for orgs - think ChatGPT meets Google Search, but powered by your data.

Language: TypeScript · License: NOASSERTION · Stars: 424 · Issues: 3 · Issues: 4

nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: NOASSERTION · Stars: 223 · Issues: 8 · Issues: 17

openai-scala-client

Scala client for OpenAI API

Language: Scala · License: MIT · Stars: 165 · Issues: 11 · Issues: 33

QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution.

Language: Python · License: Apache-2.0 · Stars: 19 · Issues: 2 · Issues: 20