This script download and converts the image from Interface Lift.
if [[ ! -e ./03884_hooverdamaerial_1280x800.jpg ]]
then
curl -O http://interfacelift.com/wallpaper/Df0b3624/03884_hooverdamaerial_1280x800.jpg
fi
convert -compress none 03884_hooverdamaerial_1280x800.jpg dam.ppm
Here is the image:
Here is the convolved version obatined by running:
./convert dam.ppm 100
Apart from being black and white, it is blurry as expected.
☁ hw4 ./convolution dam.ppm 100 Reading ``dam.ppm'' Done reading ``dam.ppm'' of size 1280x800 Writing cpu filtered image Choose platform: [0] Apple Enter choice: 0 Choose device: [0] Intel(R) Core(TM) i5-4308U CPU @ 2.80GHz [1] Iris Enter choice: 1 --------------------------------------------------------------------- NAME: Iris VENDOR: Intel PROFILE: FULL_PROFILE VERSION: OpenCL 1.2 EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images cl_khr_3d_image_writes DRIVER_VERSION: 1.2(Dec 23 2014 00:18:30) Type: GPU EXECUTION_CAPABILITIES: Kernel GLOBAL_MEM_CACHE_TYPE: None (0) CL_DEVICE_LOCAL_MEM_TYPE: Local (1) SINGLE_FP_CONFIG: 0xbe QUEUE_PROPERTIES: 0x2 VENDOR_ID: 16925952 MAX_COMPUTE_UNITS: 40 MAX_WORK_ITEM_DIMENSIONS: 3 MAX_WORK_GROUP_SIZE: 512 PREFERRED_VECTOR_WIDTH_CHAR: 1 PREFERRED_VECTOR_WIDTH_SHORT: 1 PREFERRED_VECTOR_WIDTH_INT: 1 PREFERRED_VECTOR_WIDTH_LONG: 1 PREFERRED_VECTOR_WIDTH_FLOAT: 1 PREFERRED_VECTOR_WIDTH_DOUBLE: 0 MAX_CLOCK_FREQUENCY: 1200 ADDRESS_BITS: 64 MAX_MEM_ALLOC_SIZE: 402653184 IMAGE_SUPPORT: 1 MAX_READ_IMAGE_ARGS: 128 MAX_WRITE_IMAGE_ARGS: 8 IMAGE2D_MAX_WIDTH: 16384 IMAGE2D_MAX_HEIGHT: 16384 IMAGE3D_MAX_WIDTH: 2048 IMAGE3D_MAX_HEIGHT: 2048 IMAGE3D_MAX_DEPTH: 2048 MAX_SAMPLERS: 16 MAX_PARAMETER_SIZE: 1024 MEM_BASE_ADDR_ALIGN: 1024 MIN_DATA_TYPE_ALIGN_SIZE: 128 GLOBAL_MEM_CACHELINE_SIZE: 0 GLOBAL_MEM_CACHE_SIZE: 0 GLOBAL_MEM_SIZE: 1610612736 MAX_CONSTANT_BUFFER_SIZE: 65536 MAX_CONSTANT_ARGS: 8 LOCAL_MEM_SIZE: 65536 ERROR_CORRECTION_SUPPORT: 0 PROFILING_TIMER_RESOLUTION: 80 ENDIAN_LITTLE: 1 AVAILABLE: 1 COMPILER_AVAILABLE: 1 MAX_WORK_GROUP_SIZES: 512 512 512 --------------------------------------------------------------------- Info for kernel convolution: CL_KERNEL_WORK_GROUP_SIZE=512 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE=16 CL_KERNEL_LOCAL_MEM_SIZE=0 CL_KERNEL_PRIVATE_MEM_SIZE=0 0.004582 s 223.467588 MPixels/s 1.787741 GBit/s 10.883282 GFlop/s Writing OpenCL filtered image
☁ homework4 [master] ./convolution dam.ppm 100
Reading ``dam.ppm''
Done reading ``dam.ppm'' of size 1280x800
Writing cpu filtered image
Choose platform:
[0] Advanced Micro Devices, Inc.
[1] Intel(R) Corporation
Enter choice: 1
Choose device:
[0] Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
Enter choice: 0
---------------------------------------------------------------------
NAME: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
VENDOR: Intel(R) Corporation
PROFILE: FULL_PROFILE
VERSION: OpenCL 1.1 (Build 31360.31426)
EXTENSIONS: cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread
DRIVER_VERSION: 1.1
Type: CPU
EXECUTION_CAPABILITIES: Kernel Native
GLOBAL_MEM_CACHE_TYPE: Read-Write (2)
CL_DEVICE_LOCAL_MEM_TYPE: Global (2)
SINGLE_FP_CONFIG: 0x7
QUEUE_PROPERTIES: 0x3
VENDOR_ID: 32902
MAX_COMPUTE_UNITS: 4
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_GROUP_SIZE: 1024
PREFERRED_VECTOR_WIDTH_CHAR: 16
PREFERRED_VECTOR_WIDTH_SHORT: 8
PREFERRED_VECTOR_WIDTH_INT: 4
PREFERRED_VECTOR_WIDTH_LONG: 2
PREFERRED_VECTOR_WIDTH_FLOAT: 4
PREFERRED_VECTOR_WIDTH_DOUBLE: 2
MAX_CLOCK_FREQUENCY: 2130
ADDRESS_BITS: 64
MAX_MEM_ALLOC_SIZE: 1531762688
IMAGE_SUPPORT: 1
MAX_READ_IMAGE_ARGS: 480
MAX_WRITE_IMAGE_ARGS: 480
IMAGE2D_MAX_WIDTH: 8192
IMAGE2D_MAX_HEIGHT: 8192
IMAGE3D_MAX_WIDTH: 2048
IMAGE3D_MAX_HEIGHT: 2048
IMAGE3D_MAX_DEPTH: 2048
MAX_SAMPLERS: 480
MAX_PARAMETER_SIZE: 3840
MEM_BASE_ADDR_ALIGN: 1024
MIN_DATA_TYPE_ALIGN_SIZE: 128
GLOBAL_MEM_CACHELINE_SIZE: 64
GLOBAL_MEM_CACHE_SIZE: 262144
GLOBAL_MEM_SIZE: 6127050752
MAX_CONSTANT_BUFFER_SIZE: 131072
MAX_CONSTANT_ARGS: 480
LOCAL_MEM_SIZE: 32768
ERROR_CORRECTION_SUPPORT: 0
PROFILING_TIMER_RESOLUTION: 1
ENDIAN_LITTLE: 1
AVAILABLE: 1
COMPILER_AVAILABLE: 1
MAX_WORK_GROUP_SIZES: 1024 1024 1024
---------------------------------------------------------------------
*** Kernel compilation resulted in non-empty log message.
*** Set environment variable CL_HELPER_PRINT_COMPILER_OUTPUT=1 to see more.
*** NOTE: this may include compiler warnings and other important messages
*** about your code.
*** Set CL_HELPER_NO_COMPILER_OUTPUT_NAG=1 to disable this message.
Info for kernel convolution:
CL_KERNEL_WORK_GROUP_SIZE=1024
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE=128
CL_KERNEL_LOCAL_MEM_SIZE=0
CL_KERNEL_PRIVATE_MEM_SIZE=128
0.032563 s
31.446522 MPixels/s
0.251572 GBit/s
1.531503 GFlop/s
Writing OpenCL filtered image
☁ homework4 [master] ./convolution dam.ppm 100 Reading ``dam.ppm'' Done reading ``dam.ppm'' of size 1280x800 Writing cpu filtered image Choose platform: [0] Advanced Micro Devices, Inc. [1] Intel(R) Corporation Enter choice: 0 Choose device: [0] Cypress [1] Cypress [2] Intel(R) Xeon(R) CPU E5506 @ 2.13GHz Enter choice: 0 --------------------------------------------------------------------- NAME: Cypress VENDOR: Advanced Micro Devices, Inc. PROFILE: FULL_PROFILE VERSION: OpenCL 1.2 AMD-APP (1113.2) EXTENSIONS: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt DRIVER_VERSION: 1113.2 Type: GPU EXECUTION_CAPABILITIES: Kernel GLOBAL_MEM_CACHE_TYPE: None (0) CL_DEVICE_LOCAL_MEM_TYPE: Local (1) SINGLE_FP_CONFIG: 0xbe QUEUE_PROPERTIES: 0x2 VENDOR_ID: 4098 MAX_COMPUTE_UNITS: 18 MAX_WORK_ITEM_DIMENSIONS: 3 MAX_WORK_GROUP_SIZE: 256 PREFERRED_VECTOR_WIDTH_CHAR: 16 PREFERRED_VECTOR_WIDTH_SHORT: 8 PREFERRED_VECTOR_WIDTH_INT: 4 PREFERRED_VECTOR_WIDTH_LONG: 2 PREFERRED_VECTOR_WIDTH_FLOAT: 4 PREFERRED_VECTOR_WIDTH_DOUBLE: 2 MAX_CLOCK_FREQUENCY: 700 ADDRESS_BITS: 32 MAX_MEM_ALLOC_SIZE: 268435456 IMAGE_SUPPORT: 1 MAX_READ_IMAGE_ARGS: 128 MAX_WRITE_IMAGE_ARGS: 8 IMAGE2D_MAX_WIDTH: 16384 IMAGE2D_MAX_HEIGHT: 16384 IMAGE3D_MAX_WIDTH: 2048 IMAGE3D_MAX_HEIGHT: 2048 IMAGE3D_MAX_DEPTH: 2048 MAX_SAMPLERS: 16 MAX_PARAMETER_SIZE: 1024 MEM_BASE_ADDR_ALIGN: 2048 MIN_DATA_TYPE_ALIGN_SIZE: 128 GLOBAL_MEM_CACHELINE_SIZE: 0 GLOBAL_MEM_CACHE_SIZE: 0 GLOBAL_MEM_SIZE: 1073741824 MAX_CONSTANT_BUFFER_SIZE: 65536 MAX_CONSTANT_ARGS: 8 LOCAL_MEM_SIZE: 32768 ERROR_CORRECTION_SUPPORT: 0 PROFILING_TIMER_RESOLUTION: 1 ENDIAN_LITTLE: 1 AVAILABLE: 1 COMPILER_AVAILABLE: 1 MAX_WORK_GROUP_SIZES: 256 256 256 --------------------------------------------------------------------- Info for kernel convolution: CL_KERNEL_WORK_GROUP_SIZE=256 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE=64 CL_KERNEL_LOCAL_MEM_SIZE=1936 CL_KERNEL_PRIVATE_MEM_SIZE=0 0.003614 s 283.354865 MPixels/s 2.266839 GBit/s 13.799902 GFlop/s Writing OpenCL filtered image
The command run is
./convolution dam.ppm 100
The slowest is the CPU on the opencl cims node. The laptop and opencl1 GPUs are faster.
0.004582 s 223.467588 MPixels/s 1.787741 GBit/s 10.883282 GFlop/s Writing OpenCL filtered image
0.032563 s
31.446522 MPixels/s
0.251572 GBit/s
1.531503 GFlop/s
Writing OpenCL filtered image
0.003614 s 283.354865 MPixels/s 2.266839 GBit/s 13.799902 GFlop/s Writing OpenCL filtered image
These tests are run on my laptop. The 16x16 workgroup size seems nearly optimal. The command run in all instances is
./convolution dam.ppm 100
0.012927 s
79.214661 MPixels/s
0.633717 GBit/s
3.857899 GFlop/s
0.004665 s
219.497086 MPixels/s
1.755977 GBit/s
10.689911 GFlop/s
0.004486 s
228.257574 MPixels/s
1.826061 GBit/s
11.116563 GFlop/s
0.005344 s
191.602783 MPixels/s
1.532822 GBit/s
9.331408 GFlop/s
0.005218 s
196.250541 MPixels/s
1.570004 GBit/s
9.557762 GFlop/s
It doesn’t run returning the following error:
*** 'clEnqueueNDRangeKernel(queue, knl, 2, NULL, global_size, local_size, 0, NULL, NULL)' in 'convolution.c' on line 263 failed with error 'invalid work group size'.
See the code for the implementation details. I did not account for the edge effects unfortunately.