Xu-Chen's starred repositories
flashinfer
FlashInfer: Kernel Library for LLM Serving
TLLM_QMM
TLLM_QMM strips the quantized-kernel implementation out of Nvidia's TensorRT-LLM, removing the NVInfer dependency and exposing an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
GPTModels.nvim
GPTModels - a multi-model, window-based LLM AI plugin for Neovim, with an emphasis on stability and clean code
mistral-inference
Official inference library for Mistral models
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Perplexica
Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI
clarity-ai
A simple Perplexity AI clone.
aphrodite-engine
PygmalionAI's large-scale inference engine
openai-scala-client
Scala client for OpenAI API