fsword73 / CNNKernelPerfTest

CNNKernelPerfTest

This project aims to improve the performance of OpenCL kernels for Convolutional Neural Networks (CNNs). The simple test application is set up as follows:

  • 4 buffers initialized with a chessboard pattern
  • 6 input buffers and 6 constant integers
  • the 6th buffer is saved as a PNG
  • the kernel has a standard name and signature (test_kernel, listed below)
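The chessboard initialization mentioned in the list above could look roughly like the following host-side sketch. The README does not say how the pattern is generated, so the function name `fill_chessboard`, the `tile` parameter, and the 0.0/1.0 values are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from the repository): fill a
 * size x size row-major float image with a chessboard pattern.
 * `tile` is the side length of one square in pixels; squares
 * alternate between 0.0f and 1.0f. These choices are assumptions. */
void fill_chessboard(float *buf, int size, int tile)
{
    for (int row = 0; row < size; ++row) {
        for (int col = 0; col < size; ++col) {
            /* Squares whose tile coordinates sum to an odd number are "white". */
            int parity = ((row / tile) + (col / tile)) & 1;
            buf[(size_t)row * size + col] = parity ? 1.0f : 0.0f;
        }
    }
}
```

A high-contrast pattern like this makes the saved PNG easy to inspect by eye, since a blur or edge filter visibly changes the square boundaries.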

__kernel void test_kernel(__constant float* filter,
                          __global float* dataBuf1,
                          __global float* dataBuf2,
                          __global float* dataBuf3,
                          __global float* dataBuf4,
                          __global float* dataBuf5,
                          __global float* dataBuf6,
                          const int const1,
                          const int const2,
                          const int const3,
                          const int const4,
                          const int const5,
                          const int const6)
{
    // One work-item per output pixel.
    int id = get_global_id(0);

    int imageSize = const1;   // image width and height in pixels
    int filterSize = const2;  // filter width and height

    // Split the flat id into image start offset, row, and column.
    int imageOffset = id / (imageSize * imageSize) * (imageSize * imageSize);
    int localid = id % (imageSize * imageSize);
    int row = localid / imageSize;
    int col = localid % imageSize;
    int halfFilterSize = filterSize >> 1;
    float sum = 0.0f;

    // Clamp the filter window so it never reads outside the image.
    int minm = max(-halfFilterSize, -row);
    int maxm = min(halfFilterSize, imageSize - 1 - row);
    int minn = max(-halfFilterSize, -col);
    int maxn = min(halfFilterSize, imageSize - 1 - col);

    for (int m = minm; m <= maxm; m++) {
        int x = row + m;
        int ximage = imageOffset + x * imageSize;
        // Index of the center element of filter row (m + halfFilterSize).
        int filterrowoffset = (m + halfFilterSize) * filterSize + halfFilterSize;
        for (int n = minn; n <= maxn; n++) {
            int y = col + n;
            sum += dataBuf1[ximage + y] * filter[filterrowoffset + n];
        }
    }

    // The result goes to the 6th buffer, which the app saves as a PNG.
    dataBuf6[id] = sum;
}
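When tuning a kernel like this one, it can help to have a plain CPU version to compare the output buffer against. The sketch below mirrors the kernel's index math in C, with one loop iteration standing in for each work-item. The name `conv_reference` and its parameter list are assumptions for illustration and are not part of the repository.

```c
#include <stddef.h>

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Hypothetical CPU reference for test_kernel (names are assumptions).
 * `total` plays the role of the global work size (-gx); `in` and `out`
 * correspond to dataBuf1 and dataBuf6. */
void conv_reference(const float *filter, const float *in, float *out,
                    int total, int imageSize, int filterSize)
{
    int pixels = imageSize * imageSize;
    int half = filterSize >> 1;

    for (int id = 0; id < total; ++id) {
        /* Same decomposition of the flat id as in the kernel. */
        int imageOffset = id / pixels * pixels;
        int localid = id % pixels;
        int row = localid / imageSize;
        int col = localid % imageSize;

        /* Clamp the filter window to the image borders. */
        int minm = imax(-half, -row);
        int maxm = imin(half, imageSize - 1 - row);
        int minn = imax(-half, -col);
        int maxn = imin(half, imageSize - 1 - col);

        float sum = 0.0f;
        for (int m = minm; m <= maxm; ++m) {
            int ximage = imageOffset + (row + m) * imageSize;
            int frow = (m + half) * filterSize + half;
            for (int n = minn; n <= maxn; ++n)
                sum += in[ximage + col + n] * filter[frow + n];
        }
        out[id] = sum;
    }
}
```

Device results may differ from this reference in the last few bits (for example when the compiler fuses multiply and add), so comparing with a small tolerance is usually safer than testing for exact equality.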

Example command line:

CNNBench.exe -dim 1 -gx 4194304 -gy 1 -lx 64 -ly 1 -f 3 -c1 2048 -c2 3

-dim 1 : 1D buffer

-gx 4194304 : globalthreads_x (2048 x 2048 = 4194304, one work-item per pixel of a 2048x2048 image)

-gy 1 : globalthreads_y

-lx 64 : localthreads_x

-ly 1 : localthreads_y

-f 3 : filterSize

-c1 2048 : constant1, imageSize

-c2 3 : constant2, filterSize
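The relationship between these values can be checked with a little arithmetic before launching. The sketch below is not part of CNNBench; the helper names are hypothetical. It assumes the OpenCL 1.x rule that the global size must be a whole multiple of the local size, and that each image occupies imageSize x imageSize consecutive elements, as in the kernel's index math.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of CNNBench. */

/* Number of work-groups for a 1D launch, or 0 if lx is zero or does not
 * divide gx evenly (OpenCL 1.x rejects such a launch). */
size_t work_group_count(size_t gx, size_t lx)
{
    if (lx == 0 || gx % lx != 0)
        return 0;
    return gx / lx;
}

/* Number of complete imageSize x imageSize images covered by gx work-items. */
size_t image_count(size_t gx, size_t imageSize)
{
    size_t pixels = imageSize * imageSize;
    return pixels == 0 ? 0 : gx / pixels;
}
```

For the example command line, `work_group_count(4194304, 64)` gives 65536 work-groups and `image_count(4194304, 2048)` gives a single image.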
