yao-jiashu's repositories

KernelCodeGen

GEMM/FMHA CUDA/HIP kernel code generation using MLIR.

Language: C++ | Stargazers: 5 | Issues: 2 | Issues: 0

CuAssembler

An unofficial CUDA assembler for all generations of SASS, hopefully :)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Language: Cuda | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

gpu-arch-microbenchmark

Dissecting NVIDIA GPU Architecture

Language: Cuda | Stargazers: 0 | Issues: 0 | Issues: 0

HierarchicalKV

HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet recommender-system (RecSys) requirements. Its key capability is storing key-value feature embeddings both in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.

Language: Cuda | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this time. Please submit your patches at http://reviews.llvm.org.

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

My-Leetcode-Solution

LeetCode solutions, as well as the necessary data structures and algorithms

Language: C++ | Stargazers: 0 | Issues: 1 | Issues: 0

tvm

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

nvbench

CUDA Kernel Benchmarking Library

Language: Cuda | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

pdfs

Technically oriented PDF collection (papers, specs, decks, manuals, etc.)

Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

PPoPP2017_artifact

Third-party assembler and GEMM library for NVIDIA Kepler GPUs

Stargazers: 0 | Issues: 0 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0