Xu Zhang's repositories
llama.cpp
LLM inference in C/C++
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
note
Personal learning notes
python_backend
Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
triton
Development repository for the Triton language and compiler
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs