yy-space (imisszxq)

yy-space's starred repositories

mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

Language: C++ · License: MIT · Stargazers: 204 · Issues: 0

ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Language: Python · License: Apache-2.0 · Stargazers: 289 · Issues: 0

QuaRot

Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.

Language: Python · License: Apache-2.0 · Stargazers: 236 · Issues: 0

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stargazers: 509 · Issues: 0

llama.cpp

LLM inference in C/C++

Language: C++ · License: MIT · Stargazers: 63623 · Issues: 0

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 143 · Issues: 0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stargazers: 3891 · Issues: 0
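
For a sense of the API surface, a minimal sketch of LMDeploy's high-level pipeline; the model name is just an illustrative Hugging Face path:

```python
from lmdeploy import pipeline

# Illustrative model path; any chat model supported by LMDeploy should work
pipe = pipeline("internlm/internlm2-chat-1_8b")

# Batch inference: a list of prompts in, a list of Response objects out
responses = pipe(["Hi, please introduce yourself."])
print(responses[0].text)
```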

cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Language: Cuda · Stargazers: 500 · Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stargazers: 11253 · Issues: 0
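
A minimal ASR sketch using NeMo's pretrained-model loader; the checkpoint name and audio path are placeholder assumptions:

```python
import nemo.collections.asr as nemo_asr

# Load a pretrained CTC model (checkpoint name is illustrative)
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("stt_en_conformer_ctc_small")

# Transcribe a local audio file (path is a placeholder)
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```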

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stargazers: 980 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 24874 · Issues: 0
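
vLLM's offline batch-inference entry point, as a minimal sketch (the model name is illustrative):

```python
from vllm import LLM, SamplingParams

# Any Hugging Face model path works here; opt-125m is just a small example
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```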

DistServe

Disaggregated serving system for Large Language Models (LLMs).

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 241 · Issues: 0

lectures

Material for cuda-mode lectures

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2157 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 1747 · Issues: 0
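
A minimal sketch of FP8 execution with Transformer Engine's PyTorch API, assuming a Hopper/Ada GPU; the layer sizes and recipe values are arbitrary:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe; margin/format values are illustrative
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(32, 768, device="cuda")

# GEMMs inside this context run in FP8
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```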

xcpc-algorithm-templates

XCPC/ICPC/CCPC algorithm templates

Language: C++ · License: MIT · Stargazers: 478 · Issues: 0

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Language: Python · License: Apache-2.0 · Stargazers: 4172 · Issues: 0
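
SGLang's frontend DSL in a minimal sketch, assuming a local SGLang server is already running; the port and question are placeholders:

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Build the prompt, then let the model fill in the answer
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("answer", max_tokens=64)

# Assumes a server launched beforehand, e.g.:
#   python -m sglang.launch_server --model-path <model> --port 30000
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa.run(question="What is the capital of France?")
print(state["answer"])
```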

CUDATutorial

A CUDA tutorial for learning CUDA programming from scratch

Language: Cuda · Stargazers: 164 · Issues: 0

maxas

Assembler for NVIDIA Maxwell architecture

Language: Sass · License: MIT · Stargazers: 936 · Issues: 0

How_to_optimize_in_GPU

A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernel optimizations (elementwise, reduce, sgemv, sgemm, etc.) whose performance is at or near the theoretical limit.

Language: Cuda · License: Apache-2.0 · Stargazers: 783 · Issues: 0

Skywork-MoE

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

Stargazers: 120 · Issues: 0

flash-llm

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Language: Cuda · License: Apache-2.0 · Stargazers: 161 · Issues: 0

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Language: Cuda · License: Apache-2.0 · Stargazers: 164 · Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stargazers: 983 · Issues: 0
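
A minimal sketch of FlashInfer's single-request decode attention; shapes follow its q [num_heads, head_dim] / kv [kv_len, num_heads, head_dim] convention, and all sizes are arbitrary:

```python
import torch
import flashinfer

num_heads, head_dim, kv_len = 32, 128, 1024
q = torch.randn(num_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn(kv_len, num_heads, head_dim, device="cuda", dtype=torch.float16)
v = torch.randn(kv_len, num_heads, head_dim, device="cuda", dtype=torch.float16)

# Decode-phase attention for one query token against the full KV cache
o = flashinfer.single_decode_with_kv_cache(q, k, v)
print(o.shape)  # (num_heads, head_dim)
```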

qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Language: Python · License: Apache-2.0 · Stargazers: 374 · Issues: 0

Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Language: Cuda · Stargazers: 239 · Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · Stargazers: 22663 · Issues: 0

MatmulTutorial

An easy-to-understand TensorOp matmul tutorial

Language: C++ · License: Apache-2.0 · Stargazers: 240 · Issues: 0

glake

GLake: optimizing GPU memory management and IO transmission.

Language: Python · License: Apache-2.0 · Stargazers: 334 · Issues: 0

CUDATutorial

A self-study tutorial for CUDA high-performance programming.

Language: JavaScript · License: Apache-2.0 · Stargazers: 100 · Issues: 0