ysh329 / OpenCL-101

Learn OpenCL step by step.

[bandwidth] Bandwidth for typeN and compare with clpeak result

ysh329 opened this issue

Before measuring, set the GPU and CPU to their maximum frequency using the scripts in the tools directory of this repo.

  1. Calculate bandwidth for typeN: intN, floatN, halfN;
  2. Compare with clpeak result.
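The calculation in step 1 can be sketched as bytes moved divided by elapsed time. This is a minimal sketch, not the code used in this repo: the helper name `bandwidth_gbps`, the `accesses_per_element` parameter, and the choice of 1 GB = 1e9 bytes are all assumptions made for illustration.

```python
# Hypothetical helper (not from this repo): effective bandwidth of a timed
# copy-style kernel for a vector type such as float4 or half8.

# Size in bytes of one scalar element of each OpenCL base type.
TYPE_SIZES = {"half": 2, "short": 2, "int": 4, "float": 4, "double": 8}


def bandwidth_gbps(base_type, width, num_vectors, seconds, accesses_per_element=2):
    """Return bandwidth in GB/s (assuming 1 GB = 1e9 bytes).

    accesses_per_element is 2 when the kernel reads and writes every element,
    and 1 when it only reads.
    """
    bytes_moved = TYPE_SIZES[base_type] * width * num_vectors * accesses_per_element
    return bytes_moved / seconds / 1e9


# Example with made-up timing: 1M float4 vectors, read and written once, in 10 ms.
example = bandwidth_gbps("float", 4, 1 << 20, 0.01)
print(f"float4: {example:.2f} GB/s")
```

Counting both the read and the write matters when comparing against a tool that counts reads only, so the same kernel can be evaluated with `accesses_per_element=1` to see the read-only figure.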

clpeak:

Platform: ARM Platform
  Device: Mali-T860
    Driver version  : 1.2 (Linux ARM64)
    Compute units   : 4
    Clock frequency : 800 MHz

    Global memory bandwidth (GBPS)
      float   : 3.84
      float2  : 6.00
      float4  : 7.33
      float8  : 6.01
      float16 : 5.78

    Single-precision compute (GFLOPS)
      float   : 22.86
      float2  : 44.68
      float4  : 44.51
      float8  : 41.46
      float16 : 46.16

    half-precision compute (GFLOPS)
      half   : 22.83
      half2  : 46.46
      half4  : 93.96
      half8  : 92.44
      half16 : 69.40

    Double-precision compute (GFLOPS)
      double   : 3.60
      double2  : 3.54
      double4  : 20.92
      double8  : 20.60
      double16 : 20.35

    Integer compute (GIOPS)
      int   : 20.26
      int2  : 49.72
      int4  : 47.51
      int8  : 48.96
      int16 : 41.47

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 4.06
      enqueueReadBuffer          : 2.17
      enqueueMapBuffer(for read) : 2015.28
        memcpy from mapped ptr   : 2.18
      enqueueUnmap(after write)  : 5406.56
        memcpy to mapped ptr     : 2.23

    Kernel launch latency : 78.36 us

My bandwidth results are below (more detailed logs are here):

half1: 5.16 GB/s
half2: 4.71 GB/s
half4: 5.14 GB/s
half8: 5.50 GB/s
half16: 4.98 GB/s
half1-A53: 2.10 GB/s
half1-A72: 3.91 GB/s

short1: 5.29 GB/s
short2: 4.71 GB/s
short4: 5.07 GB/s
short8: 5.52 GB/s
short16: 5.00 GB/s
short1-A53: 2.26 GB/s
short1-A72: 4.51 GB/s

int1: 5.26 GB/s
int2: 5.49 GB/s
int4: 6.13 GB/s
int8: 5.49 GB/s
int16: 5.28 GB/s
int1-A53: 2.25 GB/s
int1-A72: 4.53 GB/s

float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s

double1: 4.49 GB/s
double2: 6.39 GB/s
double4: 5.58 GB/s
double8: 5.40 GB/s
double16: 5.51 GB/s
double1-A53: 2.29 GB/s
double1-A72: 4.58 GB/s

The gap between clpeak and my results (clpeak reports higher bandwidth than my code, except for scalar float) comes from the kernels: clpeak's kernel function only reads from global memory, whereas my kernel function both reads and writes.

clpeak

Kernel function is here.

    Global memory bandwidth (GBPS)
      float   : 3.84
      float2  : 6.00
      float4  : 7.33
      float8  : 6.01
      float16 : 5.78

my bandwidth

Kernel function is here.

float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s
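To make the float comparison easier to read, the two sets of GPU numbers can be put side by side as a ratio. This is only a sketch over the figures quoted in this issue; the dictionary names are hypothetical, and my `float1` entry is keyed as `float` to match clpeak's naming.

```python
# Global memory bandwidth in GB/s for the Mali-T860, copied from the results above.
clpeak = {"float": 3.84, "float2": 6.00, "float4": 7.33, "float8": 6.01, "float16": 5.78}
mine = {"float": 4.83, "float2": 4.72, "float4": 5.39, "float8": 4.72, "float16": 4.52}

# Ratio below 1.0 means my kernel measured less bandwidth than clpeak.
ratios = {name: mine[name] / clpeak[name] for name in clpeak}

for name, ratio in ratios.items():
    print(f"{name:8s} clpeak={clpeak[name]:.2f}  mine={mine[name]:.2f}  mine/clpeak={ratio:.2f}")
```

Only the scalar `float` case comes out above 1.0; for every vector width my kernel measures less than clpeak.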