Cao Ying (lcy-seso)



Company: MSRA

Location: China


Cao Ying's repositories

DLFrameworkTest

My tests and experiments with some popular dl frameworks.

Language: Python | Stargazers: 8 | Issues: 4 | Issues: 0

LearningNotes

My learning notes.

Language: TeX | Stargazers: 6 | Issues: 4 | Issues: 0

AI-System

System for AI Education Resource.

Language: Python | License: CC-BY-4.0 | Stargazers: 0 | Issues: 2 | Issues: 0

buddy-mlir

An MLIR-Based Ideas Landing Project

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

Experiment-Miscellany

Experiments with isl.

Language: C++ | Stargazers: 0 | Issues: 2 | Issues: 0

lcy-seso.github.io

Ying's learning notes.

Language: SCSS | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

taichi

Productive & portable high-performance programming in Python.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

accelerated-scan

Accelerated First Order Parallel Associative Scan

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Awesome-LLM

Awesome-LLM: a curated list of Large Language Model

License: CC0-1.0 | Stargazers: 0 | Issues: 1 | Issues: 0

awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 3 | Issues: 0

cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Language: Cuda | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

flash-fft-conv

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

ggml

Tensor library for machine learning

Language: C | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Cuda | Stargazers: 0 | Issues: 1 | Issues: 0

llama

Inference code for LLaMA models

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

llama.cpp

Port of Facebook's LLaMA model in C/C++

Language: C | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

llm-foundry

LLM training code for MosaicML foundation models

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

loopy

A code generator for array-based code on CPUs and GPUs

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

memory-efficient-attention-pytorch

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

MISA

Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Stargazers: 0 | Issues: 0 | Issues: 0

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

whisper.cpp

Port of OpenAI's Whisper model in C/C++

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0