lcy-seso

MSRA

China

Cao Ying's repositories

DLFrameworkTest

My tests and experiments with some popular deep learning frameworks.

Language: Python · 8 40

LearningNotes

My learning notes.

Language: TeX · 6 40

EfficientAttention

300

AI-System

System for AI Education Resource.

Language: Python · CC-BY-4.0 · 020

buddy-mlir

An MLIR-Based Ideas Landing Project

Language: C++ · Apache-2.0 · 010

lcy-seso.github.io

Ying's learning notes.

Language: SCSS · MIT · 010

taichi

Productive & portable high-performance programming in Python.

Language: C++ · Apache-2.0 · 020

accelerated-scan

Accelerated First Order Parallel Associative Scan

MIT · 000

Awesome-LLM

Awesome-LLM: a curated list of Large Language Model resources

CC0-1.0 · 010

awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

010

Carrot

Language: Python · MIT · 030

cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Language: Cuda · MIT · 000

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ · NOASSERTION · 010

flash-attention

Fast and memory-efficient exact attention

Language: Python · BSD-3-Clause · 000

flash-fft-conv

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

Language: C++ · Apache-2.0 · 000

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

MIT · 000

gated_linear_attention

Language: Python · MIT · 000

ggml

Tensor library for machine learning

Language: C · MIT · 010

llama

Inference code for LLaMA models

Language: Python · GPL-3.0 · 000

llama.cpp

Port of Facebook's LLaMA model in C/C++

Language: C · MIT · 010

llm-foundry

LLM training code for MosaicML foundation models

Language: Python · Apache-2.0 · 000

loopy

A code generator for array-based code on CPUs and GPUs

Language: Python · MIT · 000

mamba

Apache-2.0 · 000

memory-efficient-attention-pytorch

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

Language: Python · MIT · 010

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: C++ · NOASSERTION · 020

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Language: Python · Apache-2.0 · 010

SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

000

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Language: Python · Apache-2.0 · 000

whisper.cpp

Port of OpenAI's Whisper model in C/C++

MIT · 000

wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda · MIT · 010