[bandwidth] Bandwidth for typeN and compare with clpeak result
ysh329 opened this issue
Before measuring, set the GPU and CPU to their maximum frequencies using the scripts in this repo's tools.
- Calculate bandwidth for typeN: intN, floatN, halfN;
- Compare with clpeak result.
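As a rough illustration of the calculation in the first item, bandwidth for a typeN kernel can be derived from the bytes moved and the elapsed time. This is a minimal sketch, not the code used in this repo: the function name `bandwidth_gbps`, the `accesses_per_element` parameter, and the choice of 1 GB = 1e9 bytes are all assumptions made for the example.

```python
# Scalar sizes in bytes for the OpenCL types tested in this issue.
SCALAR_BYTES = {"half": 2, "short": 2, "int": 4, "float": 4, "double": 8}


def bandwidth_gbps(scalar, width, num_elements, seconds, accesses_per_element=1):
    """Effective bandwidth in GB/s for a typeN kernel.

    scalar: base type name, e.g. "float"
    width: vector width N (1, 2, 4, 8 or 16)
    num_elements: number of typeN elements touched
    seconds: measured kernel time
    accesses_per_element: 1 for a read-only kernel, 2 for read + write
        (hypothetical parameter; how a tool counts traffic is an assumption)
    """
    bytes_moved = SCALAR_BYTES[scalar] * width * num_elements * accesses_per_element
    return bytes_moved / seconds / 1e9  # assumes 1 GB = 1e9 bytes
```

With hypothetical numbers, `bandwidth_gbps("float", 4, 1_000_000, 0.004)` gives about 4 GB/s, and counting both a read and a write per element (`accesses_per_element=2`) doubles that for the same measured time.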
clpeak:
Platform: ARM Platform
Device: Mali-T860
Driver version : 1.2 (Linux ARM64)
Compute units : 4
Clock frequency : 800 MHz
Global memory bandwidth (GBPS)
float : 3.84
float2 : 6.00
float4 : 7.33
float8 : 6.01
float16 : 5.78
Single-precision compute (GFLOPS)
float : 22.86
float2 : 44.68
float4 : 44.51
float8 : 41.46
float16 : 46.16
half-precision compute (GFLOPS)
half : 22.83
half2 : 46.46
half4 : 93.96
half8 : 92.44
half16 : 69.40
Double-precision compute (GFLOPS)
double : 3.60
double2 : 3.54
double4 : 20.92
double8 : 20.60
double16 : 20.35
Integer compute (GIOPS)
int : 20.26
int2 : 49.72
int4 : 47.51
int8 : 48.96
int16 : 41.47
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 4.06
enqueueReadBuffer : 2.17
enqueueMapBuffer(for read) : 2015.28
memcpy from mapped ptr : 2.18
enqueueUnmap(after write) : 5406.56
memcpy to mapped ptr : 2.23
Kernel launch latency : 78.36 us
My bandwidth results are below (more detailed logs are here):
half1: 5.16 GB/s
half2: 4.71 GB/s
half4: 5.14 GB/s
half8: 5.50 GB/s
half16: 4.98 GB/s
half1-A53: 2.10 GB/s
half1-A72: 3.91 GB/s
short1: 5.29 GB/s
short2: 4.71 GB/s
short4: 5.07 GB/s
short8: 5.52 GB/s
short16: 5.00 GB/s
short1-A53: 2.26 GB/s
short1-A72: 4.51 GB/s
int1: 5.26 GB/s
int2: 5.49 GB/s
int4: 6.13 GB/s
int8: 5.49 GB/s
int16: 5.28 GB/s
int1-A53: 2.25 GB/s
int1-A72: 4.53 GB/s
float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s
double1: 4.49 GB/s
double2: 6.39 GB/s
double4: 5.58 GB/s
double8: 5.40 GB/s
double16: 5.51 GB/s
double1-A53: 2.29 GB/s
double1-A72: 4.58 GB/s
clpeak reports higher bandwidth than my code for the vector types (float2 through float16). The gap is due to the kernels doing different work: the clpeak kernel only reads from global memory, while my kernel both reads and writes.
clpeak
Kernel function is here.
Global memory bandwidth (GBPS)
float : 3.84
float2 : 6.00
float4 : 7.33
float8 : 6.01
float16 : 5.78
my bandwidth
Kernel function is here.
float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s
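To make the float comparison easier to read, the two result sets can be put side by side as ratios. The sketch below only uses the numbers posted in this issue; the dictionary names are made up for the example, and my "float1" entry is matched against clpeak's "float".

```python
# Global memory bandwidth in GB/s, copied from the results in this issue.
clpeak = {"float": 3.84, "float2": 6.00, "float4": 7.33, "float8": 6.01, "float16": 5.78}
# "float1" in my results is keyed as "float" here to line up with clpeak.
mine = {"float": 4.83, "float2": 4.72, "float4": 5.39, "float8": 4.72, "float16": 4.52}

# Ratio > 1 means my kernel reported more bandwidth than clpeak.
ratios = {name: mine[name] / clpeak[name] for name in clpeak}

for name, ratio in ratios.items():
    print(f"{name:8s} mine/clpeak = {ratio:.2f}")
```

For the vector widths the ratio comes out between roughly 0.74 and 0.79, while for scalar float it is about 1.26, so the scalar case is the one place where my measurement is above clpeak's.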