Cao Ying's repositories
DLFrameworkTest
My tests and experiments with some popular deep learning frameworks.
LearningNotes
My learning notes.
buddy-mlir
An MLIR-Based Ideas Landing Project
Experiment-Miscellany
Experiments with isl.
lcy-seso.github.io
Ying's learning notes.
accelerated-scan
Accelerated First Order Parallel Associative Scan
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
flash-attention
Fast and memory-efficient exact attention
flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
llama
Inference code for LLaMA models
llm-foundry
LLM training code for MosaicML foundation models
loopy
A code generator for array-based code on CPUs and GPUs
memory-efficient-attention-pytorch
Implementation of memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory"
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embedding.
SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
whisper.cpp
Port of OpenAI's Whisper model in C/C++
wmma_extension
An extension library for the WMMA API (Tensor Core API)