Tiancheng Chen's starred repositories
ThunderKittens
Tile primitives for speedy kernels
float8_experimental
This repository contains the experimental PyTorch native float8 training UX
multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
torchtitan
A native PyTorch library for large model training
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
long-context-attention
Sequence-parallel attention for long-context LLM training and inference
ring-flash-attention
Ring attention implementation built on FlashAttention
ml-engineering
Machine Learning Engineering Open Book
superbenchmark
A validation and profiling tool for AI infrastructure
Megatron-LM
Ongoing research training transformer models at scale