yui0 / ugemm

GEMM

ugemm

A public-domain, simple, minimalistic, and fast GEMM library.
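For reference, GEMM (general matrix multiply) computes C = alpha·A·B + beta·C. Below is a minimal, unoptimized sketch in C that assumes row-major storage. `sgemm_ref` is a hypothetical name used only for illustration and is not ugemm's actual API.

```c
#include <stddef.h>

/* Naive reference SGEMM: C = alpha * A * B + beta * C.
 * Hypothetical helper for illustration, not ugemm's API.
 * Row-major storage: A is M x K, B is K x N, C is M x N. */
void sgemm_ref(size_t M, size_t N, size_t K, float alpha,
               const float *A, const float *B, float beta, float *C)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;  /* dot product of row i of A and column j of B */
            for (size_t k = 0; k < K; k++)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

The optimized OpenCL kernels in this repository compute the same kind of product. A naive loop like this one is mainly useful as a correctness baseline.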

How to build on macOS

$ make

How to build on Linux

On a dnf-based distribution (e.g. Fedora), first add the ROCm repository:

# cat /etc/yum.repos.d/rocm.repo
[ROCm]
name=ROCm
#baseurl=http://repo.radeon.com/rocm/yum/2.2/
baseurl=http://repo.radeon.com/rocm/yum/4.0/
enabled=1
gpgcheck=0

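Then install the OpenCL headers and the Mesa and ROCm OpenCL runtimes:

# dnf install opencl-headers mesa-libOpenCL ocl-icd-devel
# dnf install rocm-clang-ocl rocm-opencl rocm-opencl-devel rocm-utils

Compile a single program by hand, or build with make:

$ gcc -O3 sgemm_ocl.c -o sgemm_ocl -lOpenCL -lm

$ make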

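How to use

Each program runs an SGEMM (single-precision GEMM) benchmark on an OpenCL device. It prints the selected device, the device's maximum memory allocation size, the time per run with the achieved GFLOPS, and what appears to be an accuracy check against a reference result. Example runs, mostly on an AMD Radeon HD 7800 Series (Tahiti) GPU: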

$ ./sgemm_ocl1
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.032 seconds per run, 62.9 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 
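The `>>> Done` line reports throughput. Multiplying an M x K matrix by a K x N matrix performs 2·M·N·K floating-point operations (one multiply and one add per term), so GFLOPS is that count divided by the run time. Here is a small sketch. `gemm_gflops` is a hypothetical helper, and this README does not state the matrix sizes the benchmarks actually use.

```c
/* GFLOPS for an M x N x K GEMM: each of the M*N outputs needs K multiplies
 * and K adds, i.e. 2*M*N*K floating-point operations. The O(M*N) work for
 * the alpha/beta scaling is ignored, as is usual. Doubles avoid integer
 * overflow for large sizes. */
double gemm_gflops(double M, double N, double K, double seconds)
{
    return 2.0 * M * N * K / seconds / 1e9;
}
```

For example, a hypothetical 1000 x 1000 x 1000 product that takes 0.032 s gives 2e9 / 0.032 / 1e9 = 62.5 GFLOPS.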

$ ./sgemm_ocl2
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.016 seconds per run, 122.3 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 
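The last line of each run is not documented in this README. It appears to show an error norm divided by a reference norm together with their ratio, followed by the largest element-wise difference, its [row, col] position, and the two values being compared. The sketch below implements such a check under that assumption. `check_result` and the choice of squared norms are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical accuracy check: compares a result C against a reference R
 * (both M x N, row-major). Prints sum((C-R)^2) / sum(R^2), their ratio, the
 * largest absolute difference with its [row, col] position, and the two
 * values at that position. Returns the ratio (relative error). */
double check_result(size_t M, size_t N, const float *C, const float *R)
{
    double err = 0.0, ref = 0.0, max_diff = 0.0;
    size_t max_i = 0, max_j = 0;
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            double c = C[i * N + j], r = R[i * N + j];
            double d = c - r;
            double ad = d < 0.0 ? -d : d;  /* |c - r| without needing libm */
            err += d * d;
            ref += r * r;
            if (ad > max_diff) { max_diff = ad; max_i = i; max_j = j; }
        }
    }
    double rel = ref > 0.0 ? err / ref : err;
    printf("%.3e/%.3e=%.3e. %.3e at [%3zu, %3zu] %17.10e vs %17.10e\n",
           err, ref, rel, max_diff, max_i, max_j,
           (double)C[max_i * N + max_j], (double)R[max_i * N + max_j]);
    return rel;
}
```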

$ ./sgemm_ocl3
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.018 seconds per run, 112.6 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 

$ ./sgemm_ocl4
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.015 seconds per run, 131.8 GFLOPS
0.000e+00/1.235e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.9929586175e+14 vs  -3.9929586175e+14 
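The variants differ in speed, from 62.9 GFLOPS for sgemm_ocl1 to 131.8 GFLOPS for sgemm_ocl4, but the README does not say what each one changes. A common step in GEMM optimization is tiling, which processes small blocks so that data loaded once is reused many times. On a GPU the tiles typically live in OpenCL local memory. The sketch below is only a CPU analogue in plain C, and both `sgemm_tiled` and the tile size are hypothetical.

```c
#include <stddef.h>

#define TILE 32  /* hypothetical tile size; tune for the target cache */

/* Cache-blocked C += A * B (row-major, A: M x K, B: K x N, C: M x N).
 * Works on TILE x TILE sub-blocks so each block of A and B is reused while
 * it is still in cache; edge blocks are clipped to the matrix bounds.
 * The caller must initialize C (e.g. to zero). */
void sgemm_tiled(size_t M, size_t N, size_t K,
                 const float *A, const float *B, float *C)
{
    for (size_t i0 = 0; i0 < M; i0 += TILE)
        for (size_t k0 = 0; k0 < K; k0 += TILE)
            for (size_t j0 = 0; j0 < N; j0 += TILE) {
                size_t i1 = i0 + TILE < M ? i0 + TILE : M;
                size_t k1 = k0 + TILE < K ? k0 + TILE : K;
                size_t j1 = j0 + TILE < N ? j0 + TILE : N;
                for (size_t i = i0; i < i1; i++)
                    for (size_t k = k0; k < k1; k++) {
                        float a = A[i * K + k];  /* reused across the j loop */
                        for (size_t j = j0; j < j1; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

Sizes that are not multiples of TILE (such as 50 x 37) are handled by clipping the last block in each dimension.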

$ ./sgemm_ocl6
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.012 seconds per run, 163.9 GFLOPS
0.000e+00/1.463e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.6711264766e+20 vs  -3.6711264766e+20 

$ ./sgemm-fast_ocl 
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 0/1, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.012 seconds per run, 162.1 GFLOPS
0.000e+00/1.463e+20=0.000e+00. 0.000e+00 at [  0,  0]  -3.6711264766e+20 vs  -3.6711264766e+20 
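sgemm_ocl6 and sgemm-fast_ocl reach roughly 163 GFLOPS, and their kernels are likewise not described here. Register blocking is another widely used GEMM optimization. Each work item (or loop iteration) accumulates a small block of C in registers, so every loaded element of A and B is used several times. Below is a CPU-side sketch with a hypothetical `sgemm_reg4x4` that uses a 4 x 4 block and assumes M and N are multiples of 4.

```c
#include <stddef.h>

/* Register-blocked C += A * B (row-major, A: M x K, B: K x N, C: M x N).
 * Each 4 x 4 block of C is kept in 16 local accumulators; per k step it
 * loads 4 values of A and 4 of B and uses each of them 4 times.
 * Assumes (for brevity) that M and N are multiples of 4. */
void sgemm_reg4x4(size_t M, size_t N, size_t K,
                  const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < M; i += 4)
        for (size_t j = 0; j < N; j += 4) {
            float acc[4][4] = {{0}};
            for (size_t k = 0; k < K; k++) {
                float a[4], b[4];
                for (int r = 0; r < 4; r++) a[r] = A[(i + r) * K + k];
                for (int c = 0; c < 4; c++) b[c] = B[k * N + j + c];
                for (int r = 0; r < 4; r++)
                    for (int c = 0; c < 4; c++)
                        acc[r][c] += a[r] * b[c];
            }
            for (int r = 0; r < 4; r++)
                for (int c = 0; c < 4; c++)
                    C[(i + r) * N + j + c] += acc[r][c];
        }
}
```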

Setting FORCE_CPU=1 selects a CPU OpenCL device instead of the GPU:

$ FORCE_CPU=1 ./sgemm_ocl
pthread-Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz (platform 0/2, device 0/1)
Maximum memory allocation size is 4294967296 bytes
>>> Done: took 0.108 seconds per run, 19.8 GFLOPS
0.000e+00/3.849e+21=0.000e+00. 0.000e+00 at [  0,  0]   2.3661284071e+18 vs   2.3661284071e+18 
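With FORCE_CPU=1 the program picked the CPU device. The pthread- prefix suggests a CPU OpenCL runtime such as POCL. How sgemm_ocl reads the variable is not shown here. The following minimal sketch assumes that any non-empty value other than "0" enables the option. The helpers are hypothetical and only decide the flag. A real program would then pass CL_DEVICE_TYPE_CPU rather than CL_DEVICE_TYPE_GPU to clGetDeviceIDs.

```c
#include <stdlib.h>

/* Hypothetical: decide whether a FORCE_CPU value requests the CPU device.
 * NULL (unset), "" and "0" mean no; any other value means yes. */
int parse_force_cpu(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return 0;
    return !(value[0] == '0' && value[1] == '\0');
}

/* Reads the FORCE_CPU environment variable of the current process. */
int force_cpu_requested(void)
{
    return parse_force_cpu(getenv("FORCE_CPU"));
}
```

Keeping the parsing separate from getenv makes the rule easy to test without changing the process environment.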

The -p option selects the OpenCL platform by index:

$ ./sgemm_ocl -p 1
AMD Radeon HD 7800 Series (TAHITI, DRM 3.35.0, 5.4.6-berry, LLVM 9.0.0) (platform 1/2, device 0/2)
Maximum memory allocation size is 2576980377 bytes
>>> Done: took 0.015 seconds per run, 146.7 GFLOPS
0.000e+00/3.849e+21=0.000e+00. 0.000e+00 at [  0,  0]   2.3661284071e+18 vs   2.3661284071e+18 
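The -p option appears to take a zero-based platform index. With -p 1 the GPU is reported as platform 1/2, which is the second of two platforms. Below is a sketch of parsing such an option without getopt. `parse_platform` is hypothetical, and the returned index would select an entry from the list filled in by clGetPlatformIDs.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: scan argv for "-p N" and return N as a zero-based OpenCL
 * platform index, or def if the option is missing or N is not a valid
 * non-negative integer. */
int parse_platform(int argc, char **argv, int def)
{
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-p") != 0)
            continue;
        char *end;
        long v = strtol(argv[i + 1], &end, 10);
        if (end == argv[i + 1] || *end != '\0' || v < 0 || v > INT_MAX)
            return def;
        return (int)v;
    }
    return def;
}
```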

Reference

Languages: C 90.2%, C++ 9.5%, Makefile 0.3%