char-1ee

Nanyang Technological University

Singapore

Li Xingjian's starred repositories

Mooncake

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python | Apache-2.0 | 122400

cudf

cuDF - GPU DataFrame Library

Language: C++ | Apache-2.0 | 798600

duckdb

DuckDB is an analytical in-process SQL database management system

Language: C++ | MIT | 2041400

Serving

A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)

Language: C++ | Apache-2.0 | 88700

desktop

Replit Desktop App

Language: TypeScript | 10800

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python | Apache-2.0 | 58100

paddler

Stateful load balancer custom-tailored for llama.cpp

Language: Go | MIT | 39200

NPKit

NCCL Profiling Kit

Language: Python | MIT | 8500

dlrover

DLRover: An Automatic Distributed Deep Learning System

Language: Python | NOASSERTION | 105500

DistServe

Disaggregated serving system for Large Language Models (LLMs).

Language: Jupyter Notebook | Apache-2.0 | 11900

mindcraft

Language: JavaScript | MIT | 49300

inference

Reference implementations of MLPerf™ inference benchmarks

Language: Python | Apache-2.0 | 113000

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda | MIT | 133800

AFFiNE

There can be more than Notion and Miro. AFFiNE (pronounced [ə'fain]) is a next-gen knowledge base that brings planning, sorting and creating all together. Privacy first, open-source, customizable and ready to use.

Language: TypeScript | NOASSERTION | 3526400

core

MoonBit's Core library

Language: Shell | Apache-2.0 | 46400

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

MIT | 281200

Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

GPL-3.0 | 184500

llm-scheduling-artifact

Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving"

Language: Python | Apache-2.0 | 3700

transformer-debugger

Language: Python | MIT | 395800

llama3

The official Meta Llama 3 GitHub site

Language: Python | NOASSERTION | 2264600

experiments

My exploration on new technologies.

Language: Python | MIT | 600

attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Language: Python | MIT | 41200

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | NOASSERTION | 796000

AISystem

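AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks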

Language: Jupyter Notebook | Apache-2.0 | 930200

triton

Development repository for the Triton language and compiler

Language: C++ | MIT | 1183400

attention_with_linear_biases

Code for the ALiBi method for transformer language models (ICLR 2022)

Language: Python | MIT | 48900

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | MIT | 2125500

public-apis

A collective list of free APIs

Language: Python | MIT | 29711400

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook | Apache-2.0 | 912000