ysh329 / OpenCL-101

Learn OpenCL step by step.

Performance loss caused by input type of kernel function

ysh329 opened this issue

There appears to be a performance loss depending on the type used for a kernel function's pointer arguments. I defined two kernel functions that accomplish the same thing:

function1

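__kernel void global_bandwidth_vec2(const int heightA, const int widthA, __global const float *a, __global float *b) {
    const int idx = get_global_id(0);
    // a and b are float pointers, so this offset is counted in floats
    const int step = idx << 1;

    // Reinterpret the float address as CL_ELEM_TYPE and copy one element from a to b
    CL_ELEM_TYPE value = *((__global const CL_ELEM_TYPE *)(a + step));
    *((__global CL_ELEM_TYPE *)(b + step)) = value;
}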

function2

__kernel void global_bandwidth_vec2(const int heightA, const int widthA, __global const CL_ELEM_TYPE *a, __global CL_ELEM_TYPE *b) {
    const int idx = get_global_id(0);
    // a and b are CL_ELEM_TYPE pointers, so this offset is counted in CL_ELEM_TYPE elements
    const int step = idx << 1;

    // Copy one CL_ELEM_TYPE element from a to b
    CL_ELEM_TYPE value = *((__global const CL_ELEM_TYPE *)(a + step));
    *((__global CL_ELEM_TYPE *)(b + step)) = value;
}

The two functions are identical except for the type of the pointer arguments: function1 declares them as float, while function2 uses the macro CL_ELEM_TYPE (defined through clBuildProgram) instead.
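For readers unfamiliar with how a macro reaches the kernel compiler, here is a minimal sketch of one way to do it: pass a `-D` preprocessor option in the options string given to clBuildProgram. The helper name `make_build_options` and the value `float2` are assumptions for illustration; the issue does not say which type CL_ELEM_TYPE was set to or how the option string was built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the OpenCL API): formats the build
 * option that defines CL_ELEM_TYPE for the kernel compiler.
 * Returns 0 on success, -1 if the buffer is too small. */
static int make_build_options(char *buf, size_t size, const char *elem_type) {
    assert(buf != NULL && elem_type != NULL);
    int n = snprintf(buf, size, "-D CL_ELEM_TYPE=%s", elem_type);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Usage on the host, assuming `program` and `device` were created elsewhere:
 *
 *   char opts[64];
 *   if (make_build_options(opts, sizeof opts, "float2") == 0)
 *       clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

Because the macro is substituted before compilation, both kernels see an ordinary type name wherever CL_ELEM_TYPE appears.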

My guess is that the performance loss comes from differences between the old CL/cl.h and the current one.
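Another factor worth ruling out, independent of the headers, is that the two kernels may not touch the same memory. Pointer arithmetic is scaled by the pointee type, so `a + step` advances by `step` floats in function1 but by `step` elements of CL_ELEM_TYPE in function2. The sketch below mirrors that index arithmetic in host C. It assumes CL_ELEM_TYPE is a two-float vector such as float2 (the issue does not state its value); the `elem2` struct and both helper names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: CL_ELEM_TYPE is a 2-float vector; this struct stands in for it. */
typedef struct { float x, y; } elem2;

/* First float index accessed by function1: `a` is a float pointer,
 * so a + step moves forward by `step` floats. */
static size_t float_index_fn1(size_t idx) {
    size_t step = idx << 1;
    return step;
}

/* First float index accessed by function2: `a` is an elem2 pointer,
 * so a + step moves forward by `step` elements of two floats each. */
static size_t float_index_fn2(size_t idx) {
    assert(sizeof(elem2) == 2 * sizeof(float)); /* no padding expected */
    size_t step = idx << 1;
    return step * (sizeof(elem2) / sizeof(float));
}
```

Under that assumption, work-item 3 starts at float index 6 in function1 but at float index 12 in function2. function2 would then skip every other element and reach twice as far into the buffer, which by itself could change the measured bandwidth.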

Temporarily closed.