Xu-Chen's starred repositories

flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Language: Cuda | License: Apache-2.0 | Stars: 18

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | License: Apache-2.0 | Stars: 821

Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Language: Cuda | Stars: 93

TLLM_QMM

TLLM_QMM strips the quantized-kernel implementation out of NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.

Language: C++ | License: Apache-2.0 | Stars: 9

MobileLLM

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024).

Language: Python | License: NOASSERTION | Stars: 784

GPTModels.nvim

GPTModels - a multi-model, window-based LLM AI plugin for Neovim, with an emphasis on stability and clean code

Language: Lua | License: MIT | Stars: 34

GPTQModel

An easy-to-use LLM quantization and inference toolkit based on the GPTQ algorithm (weight-only quantization).

Language: Python | License: Apache-2.0 | Stars: 27
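To make "weight-only quantization" concrete, here is a minimal NumPy sketch of the storage format GPTQ-based toolkits produce: INT4 weights plus one floating-point scale per group of columns. It uses naive round-to-nearest, not the GPTQ algorithm itself (which adds second-order error correction); all names are illustrative, not from any of these libraries.

```python
import numpy as np

def quantize_dequantize(W, bits=4, group_size=128):
    """Round-to-nearest weight-only quantization with per-group scales.

    Illustrates the INT4 + per-group-scale format only; real GPTQ
    chooses the rounding using second-order (Hessian) information.
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    groups = W.reshape(-1, group_size)              # one scale per group
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax)
    W_dq = (q * scales).reshape(W.shape)            # dequantized weights
    return W_dq, q.astype(np.int8)

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
W_dq, q = quantize_dequantize(W)
err = np.abs(W - W_dq).max()                        # bounded by scale / 2
```

The per-element error is bounded by half a quantization step, which is why larger groups (coarser scales) trade accuracy for a smaller memory footprint.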

QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution.

Language: Python | Stars: 31

ocular

AI-powered search and chat for orgs: think ChatGPT meets Google Search, but powered by your data.

Language: TypeScript | License: NOASSERTION | Stars: 431

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook | License: Apache-2.0 | Stars: 9279

farfalle

🔍 AI search engine - self-host with local or cloud LLMs

Language: TypeScript | License: Apache-2.0 | Stars: 2323

EAGLE

Official Implementation of EAGLE-1 and EAGLE-2

Language: Python | License: Apache-2.0 | Stars: 671

aspoem

Learn Chinese poetry with AsPoem.com

Language: TypeScript | License: AGPL-3.0 | Stars: 2415

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT | Stars: 3075

Perplexica

Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.

Language: TypeScript | License: MIT | Stars: 11235

marlin

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python | License: Apache-2.0 | Stars: 463
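To unpack the "FP16xINT4" idea, here is a pure-NumPy sketch of what such a kernel computes logically: weights are stored two 4-bit values per byte, then unpacked, scaled, and multiplied by FP16 activations. A fused GPU kernel like Marlin does the dequantization and matmul in a single pass so the memory traffic stays at INT4 size; the packing layout and function names below are illustrative (not Marlin's actual format), and the arithmetic is done in float32 here for simplicity.

```python
import numpy as np

def pack_int4(q):
    # pack two signed 4-bit values per byte (even column -> high nibble)
    u = (q.astype(np.int16) + 8).astype(np.uint8)    # shift to [0, 15]
    return (u[..., 0::2] << 4) | u[..., 1::2]

def unpack_int4(packed):
    # inverse of pack_int4: recover signed values in [-8, 7]
    hi = (packed >> 4).astype(np.int16) - 8
    lo = (packed & 0x0F).astype(np.int16) - 8
    out = np.empty(packed.shape[:-1] + (packed.shape[-1] * 2,), np.int16)
    out[..., 0::2] = hi
    out[..., 1::2] = lo
    return out

def fp16_int4_matmul(x, packed_w, scales):
    # logically what a fused FP16xINT4 kernel computes: dequantize the
    # INT4 weights with per-output-column scales, then matmul
    w = unpack_int4(packed_w).astype(np.float32) * scales
    return x.astype(np.float32) @ w

rng = np.random.default_rng(0)
q = rng.integers(-8, 8, size=(64, 64))               # fake INT4 weights
scales = np.full(64, 0.01, dtype=np.float32)         # one scale per column
x = rng.standard_normal((4, 64)).astype(np.float32)  # activations
y = fp16_int4_matmul(x, pack_int4(q), scales)
```

Because LLM decoding at small batch sizes is bound by reading the weight matrix, halving (or quartering) its bytes is where the ~4x speedup comes from.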

ollama

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

Language: Go | License: MIT | Stars: 79365

clarity-ai

A simple Perplexity AI clone.

Language: TypeScript | License: MIT | Stars: 1112

aphrodite-engine

PygmalionAI's large-scale inference engine

Language: Python | License: AGPL-3.0 | Stars: 802

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python | License: MIT | Stars: 3275

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference.

Language: Python | License: MIT | Stars: 1459

nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: NOASSERTION | Stars: 241

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python | License: MIT | Stars: 4135

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell | Stars: 6370

gpt-fast

Simple and efficient PyTorch-native transformer text generation in under 1000 lines of Python.

Language: Python | License: BSD-3-Clause | Stars: 5372

openai-scala-client

Scala client for OpenAI API

Language: Scala | License: MIT | Stars: 169