Xu-Chen's starred repositories

ollama

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python · License: Apache-2.0 · Stars: 25,409 · Issues: 169 · Issues: 4,113

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stars: 21,983 · Issues: 198 · Issues: 3,254

Perplexica

Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.

Language: TypeScript · License: MIT · Stars: 10,693 · Issues: 75 · Issues: 158

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 9,124 · Issues: 117 · Issues: 124

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

gpt-fast

Simple and efficient PyTorch-native transformer text generation in under 1,000 lines of Python.

Language: Python · License: BSD-3-Clause · Stars: 5,333 · Issues: 63 · Issues: 92

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stars: 4,057 · Issues: 34 · Issues: 433

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python · License: MIT · Stars: 3,192 · Issues: 33 · Issues: 361

xtuner

An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python · License: Apache-2.0 · Stars: 3,167 · Issues: 31 · Issues: 407

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

aspoem

Learn Chinese poetry with AsPoem.com

Language: TypeScript · License: AGPL-3.0 · Stars: 2,361 · Issues: 9 · Issues: 112

farfalle

🔍 AI search engine - self-host with local or cloud LLMs

Language: TypeScript · License: Apache-2.0 · Stars: 2,187 · Issues: 18 · Issues: 51

swift

ms-swift: use PEFT or full-parameter training to fine-tune 250+ LLMs or 40+ MLLMs. (Qwen2, GLM4, InternLM2, Yi, Llama3, LLaVA, MiniCPM-V, DeepSeek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python · License: Apache-2.0 · Stars: 2,165 · Issues: 18 · Issues: 598

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference.

Language: Python · License: MIT · Stars: 1,400 · Issues: 11 · Issues: 330

clarity-ai

A simple Perplexity AI clone.

Language: TypeScript · License: MIT · Stars: 1,092 · Issues: 22 · Issues: 7

aphrodite-engine

PygmalionAI's large-scale inference engine

Language: Python · License: AGPL-3.0 · Stars: 751 · Issues: 12 · Issues: 135

EAGLE

Official Implementation of EAGLE

Language: Python · License: Apache-2.0 · Stars: 622 · Issues: 12 · Issues: 73

BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models

Language: Python · License: Apache-2.0 · Stars: 531 · Issues: 11 · Issues: 85

marlin

An FP16×INT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stars: 434 · Issues: 14 · Issues: 21

ocular

AI-powered search and chat for orgs - think ChatGPT meets Google Search, but powered by your data.

Language: TypeScript · License: NOASSERTION · Stars: 424 · Issues: 3 · Issues: 4

nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: NOASSERTION · Stars: 223 · Issues: 8 · Issues: 17

openai-scala-client

Scala client for OpenAI API

Language: Scala · License: MIT · Stars: 165 · Issues: 11 · Issues: 33

QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution.

Language: Python · License: Apache-2.0 · Stars: 19 · Issues: 2 · Issues: 20