YP's starred repositories
LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.
ThunderKittens
Tile primitives for speedy kernels
tiny-universe
"A White-Box Guide to Building Large Models": a fully hand-built Tiny-Universe
torchtitan
A native PyTorch Library for large model training
llm-reasoners
A library for advanced large language model reasoning
Chinese-Resume-in-Typst
A Chinese résumé written in Typst: concise syntax, clean styling, ready to use out of the box, with an optional photo
scattermoe
Triton-based implementation of Sparse Mixture of Experts.
how-to-optim-algorithm-in-cuda
How to optimize various algorithms in CUDA.
text-clustering
Easily embed, cluster and semantically label text datasets
LLMs-from-scratch
Implementing a ChatGPT-like LLM from scratch, step by step
so-large-lm
Fundamentals of large models: understand the basics of large models in one article
How_to_optimize_in_GPU
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, SGEMV, and SGEMM; the performance of these kernels is at or near the theoretical limit.
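As an illustration of the kind of elementwise optimization such a series typically covers, here is a minimal sketch (not taken from the repo; kernel names are illustrative) comparing a naive one-element-per-thread kernel with a float4-vectorized variant that reduces the number of memory transactions:

```cuda
#include <cuda_runtime.h>

// Naive elementwise add: one float per thread.
__global__ void add_naive(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Vectorized variant: each thread loads/stores a float4, so the kernel
// issues one 128-bit transaction where the naive version issues four
// 32-bit ones (assumes n is a multiple of 4 and pointers are aligned).
__global__ void add_vec4(const float4* a, const float4* b, float4* c, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 x = a[i], y = b[i];
        c[i] = make_float4(x.x + y.x, x.y + y.y, x.z + y.z, x.w + y.w);
    }
}
```

For memory-bound kernels like elementwise add, this style of vectorized access is often enough to approach the device's theoretical memory bandwidth.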