Support halfN precision for GPU and CPU
ysh329 opened this issue · comments
- GPU cl_khr_fp16: the correct rate is wrong for halfN when N is greater than 1;
- CPU fp16: segmentation fault when the size is larger than 128*128, such as 256*256.
Besides, about the `data_size` variable: should I define separate `data_size` variables for the CPU and the GPU? If the same `data_size` variable is used with different CPU and GPU types (such as `float` on the CPU and `half` on the GPU), does it cause errors?
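As a minimal sketch of why this matters, assume `data_size` is a byte count derived from the element type (the issue does not say whether it counts bytes or elements). The helper `buffer_bytes` and the typedef `half_storage_t` are hypothetical names introduced only for illustration; `half_storage_t` is a 16-bit stand-in with the same width as `cl_half`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit storage type standing in for a half value. */
typedef uint16_t half_storage_t;

/* Bytes needed for `count` elements of `elem_size` bytes each. */
static size_t buffer_bytes(size_t count, size_t elem_size) {
    assert(elem_size > 0);
    return count * elem_size;
}
```

For 256*256 elements, `buffer_bytes(256 * 256, sizeof(float))` gives 262144 bytes, while `buffer_bytes(256 * 256, sizeof(half_storage_t))` gives 131072 bytes. Under this assumption, reusing a byte size computed for `float` with a `half` buffer would read or write twice as much memory as the buffer holds, so keeping one size per element type looks like the safer choice.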
Note:
- half type of OpenCL on the host: `cl_half`; `cl_halfN` is not supported;
- half type of OpenCL on the device: `half` or `halfN`; `cl_half` and `cl_halfN` are not supported.
Besides, when using the `half` type on the device, please make sure your host also uses a half type (such as `__fp16`), so that the `data_size` of the result variables from the CPU and the GPU match!
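Where `__fp16` is not available on the host compiler, one possible alternative (not something this issue prescribes) is to keep the device results as raw 16-bit patterns in `uint16_t`, which has the same size as a half value, and decode them to `float` only for comparison. The sketch below assumes the device writes IEEE 754 binary16 values; `half_bits_to_float` and `correct_rate` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern into a float. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (man == 0) {
            bits = sign;                          /* signed zero */
        } else {
            int e = -1;                           /* subnormal: renormalize */
            do { man <<= 1; ++e; } while ((man & 0x400u) == 0);
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(112 - e) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13); /* rebias 15 -> 127 */
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Fraction of elements whose decoded half value is within `tol` of the float reference. */
static double correct_rate(const float *ref, const uint16_t *half_out,
                           size_t count, float tol) {
    assert(count == 0 || (ref != NULL && half_out != NULL));
    size_t ok = 0;
    for (size_t i = 0; i < count; ++i) {
        float diff = half_bits_to_float(half_out[i]) - ref[i];
        if (diff < 0) diff = -diff;
        if (diff <= tol) ++ok;
    }
    return count ? (double)ok / (double)count : 0.0;
}
```

With this layout the host result buffer for the device output has exactly two bytes per element, so its size matches the `half` buffer on the device. The tolerance `tol` has to be chosen by the caller, since half precision carries only about three decimal digits.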