sgemm

There are 11 repositories under the sgemm topic.

Liu-xiandong / How_to_optimize_in_GPU
A series on GPU optimization that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
gpu-acceleration elementwise reduce sgemm sgemv high-performance-computing hpc
Language:Cuda 786
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
cuda sgemm
Language:Cuda 198
salykova / matmul.c
Fast multi-threaded matrix multiplication in C
c cpu fast-matrix-multiplication gemm matrix-multiplication openmp sgemm
Language:C 159
mz24cn / gemm_optimization
This repository targets performance optimization of the OpenCL gemm function. It compares the sgemm performance of several BLAS libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, vendors' hardware, and operating systems. Prebuilt x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided, so it works out of the box.
blas cublas clblas clblast mkl sgemm gemm-optimization clnet gemm opencl matrix-multiplication
Language:C 14
Stefan20162016 / maxas-explained
Scott Gray's maxas assembler sgemm, explaining the parts that were (for me) missing. https://github.com/NervanaSystems/maxas
assembler cuda cuda-kernels maxwell sgemm
Language:CSS 13
yui0 / ugemm
GEMM
gemm opencl gpgpu sgemm sse avx simd single-header-lib opengl glsl gles
Language:C 10
c3sr / scope
A benchmark framework for POWER and x86_64
cuda benchmark sgemm microbenchmarks cpu-frequency-scaling cuda-sgemm numa memory performance
Language:Mathematica 7
fsword73 / SGEMM_on_VEGA
An alternative SGEMM implementation on AMD Vega Series
sgemm gpu
Language:Assembly 7
JunLee85 / ARM32-SGEMM-LIB
A fast sgemm library for 32-bit ARM with fix16 support enabled
arm32 neon fix16 cnn sgemm convolutional-codes
Language:C 3
XiaoSong9905 / cuda-v100-kernels
CUDA Kernels on V100
cuda hpc reduce gemm gpu scan sgemm transpose
Language:Cuda 3
aidevnn / CuPyFirstExample
CuPy first example computing GEMM with cuBlas, with handwritten cuda kernel and with NumPy-blas
python3 numpy cupy sgemm cublas openblas mkl jupyter-notebook
Language:Cuda
