wjc404

User data from GitHub https://github.com/wjc404

Location: Beijing, China

GitHub: @wjc404

wjc404's repositories

GEMM_AVX512F

SGEMM and DGEMM subroutines using AVX512F instructions.

Language: C | License: GPL-3.0 | Stargazers: 14 | Issues: 1 | Issues: 1

GEMM_AVX2

Fast AVX2/FMA3 DGEMM and SGEMM subroutines for medium to large matrices (>2000*2000) on Haswell/Skylake/Zen processors, with performance comparable to MKL.

Language: C | License: GPL-3.0 | Stargazers: 7 | Issues: 2 | Issues: 0

Simple_CUDA_GEMM

SGEMM kernel function for NVIDIA Pascal GPUs, able to achieve 60% of theoretical performance.

Language: Cuda | License: GPL-3.0 | Stargazers: 5 | Issues: 1 | Issues: 0

GEMM_AVX2_FMA3

SGEMM and DGEMM subroutines for large matrices, slightly outperforming Intel MKL.

Language: C | License: GPL-3.0 | Stargazers: 1 | Issues: 0 | Issues: 0

bitonic_fp32_avx_top16

Top-k selection with K = 16 or 32, based on the bitonic sort algorithm, using Intel AVX instructions.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

COMPLEX_GEMM_AVX2_FMA3

CGEMM and ZGEMM subroutines for large matrices, using AVX2 and FMA3 instructions, with performance comparable to MKL 2018.

Language: C | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

cpu_gemm_opt

How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

GEMM3M_AVX2_FMA3

CGEMM3M and ZGEMM3M subroutines for large matrices, using AVX2 and FMA3 instructions.

Language: C | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Language: Fortran | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0
License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0