Anthony Chang's repositories
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Language: Python | License: Apache-2.0
bug_opencl_boost_compute
Minimal example for reproducing a segfault issue with Boost.Compute
Language: CMake
CMakeExamples
Examples for understanding how to use CMake effectively
Language: C++
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ | License: BSD-3-Clause
HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
Language: C++
HIPIFY
HIPIFY: Convert CUDA to Portable C++ Code
Language: C++
rocBLAS
Next-generation BLAS implementation for the ROCm platform
License: MIT
rocFFT
Next-generation FFT implementation for ROCm
Language: C++ | License: MIT
SGEMM_on_VEGA
An alternative SGEMM implementation on AMD Vega Series
Language: Assembly
Tensile
Stretching GPU performance for GEMMs and tensor contractions.
Language: Python | License: MIT