nbren12 / homework4

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Homework 4

Download and convert image

This script download and converts the image from Interface Lift.

if [[ ! -e ./03884_hooverdamaerial_1280x800.jpg ]]
then
    curl -O http://interfacelift.com/wallpaper/Df0b3624/03884_hooverdamaerial_1280x800.jpg
fi
convert -compress none 03884_hooverdamaerial_1280x800.jpg dam.ppm

Here is the image:

dam.png

Here is the convolved version obatined by running:

./convert dam.ppm 100

output.png

Apart from being black and white, it is blurry as expected.

Output

Laptop

☁  hw4  ./convolution dam.ppm 100
Reading ``dam.ppm''
Done reading ``dam.ppm'' of size 1280x800
Writing cpu filtered image
Choose platform:
[0] Apple
Enter choice: 0
Choose device:
[0] Intel(R) Core(TM) i5-4308U CPU @ 2.80GHz
[1] Iris
Enter choice: 1
---------------------------------------------------------------------
NAME: Iris
VENDOR: Intel
PROFILE: FULL_PROFILE
VERSION: OpenCL 1.2
EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images cl_khr_3d_image_writes
DRIVER_VERSION: 1.2(Dec 23 2014 00:18:30)

Type: GPU
EXECUTION_CAPABILITIES: Kernel
GLOBAL_MEM_CACHE_TYPE: None (0)
CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
SINGLE_FP_CONFIG: 0xbe
QUEUE_PROPERTIES: 0x2

VENDOR_ID: 16925952
MAX_COMPUTE_UNITS: 40
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_GROUP_SIZE: 512
PREFERRED_VECTOR_WIDTH_CHAR: 1
PREFERRED_VECTOR_WIDTH_SHORT: 1
PREFERRED_VECTOR_WIDTH_INT: 1
PREFERRED_VECTOR_WIDTH_LONG: 1
PREFERRED_VECTOR_WIDTH_FLOAT: 1
PREFERRED_VECTOR_WIDTH_DOUBLE: 0
MAX_CLOCK_FREQUENCY: 1200
ADDRESS_BITS: 64
MAX_MEM_ALLOC_SIZE: 402653184
IMAGE_SUPPORT: 1
MAX_READ_IMAGE_ARGS: 128
MAX_WRITE_IMAGE_ARGS: 8
IMAGE2D_MAX_WIDTH: 16384
IMAGE2D_MAX_HEIGHT: 16384
IMAGE3D_MAX_WIDTH: 2048
IMAGE3D_MAX_HEIGHT: 2048
IMAGE3D_MAX_DEPTH: 2048
MAX_SAMPLERS: 16
MAX_PARAMETER_SIZE: 1024
MEM_BASE_ADDR_ALIGN: 1024
MIN_DATA_TYPE_ALIGN_SIZE: 128
GLOBAL_MEM_CACHELINE_SIZE: 0
GLOBAL_MEM_CACHE_SIZE: 0
GLOBAL_MEM_SIZE: 1610612736
MAX_CONSTANT_BUFFER_SIZE: 65536
MAX_CONSTANT_ARGS: 8
LOCAL_MEM_SIZE: 65536
ERROR_CORRECTION_SUPPORT: 0
PROFILING_TIMER_RESOLUTION: 80
ENDIAN_LITTLE: 1
AVAILABLE: 1
COMPILER_AVAILABLE: 1
MAX_WORK_GROUP_SIZES: 512 512 512
---------------------------------------------------------------------
Info for kernel convolution:
  CL_KERNEL_WORK_GROUP_SIZE=512
  CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE=16
  CL_KERNEL_LOCAL_MEM_SIZE=0
  CL_KERNEL_PRIVATE_MEM_SIZE=0
0.004582 s
223.467588 MPixels/s
1.787741 GBit/s
10.883282 GFlop/s
Writing OpenCL filtered image

CIMS

opencl1 intel xeon

☁  homework4 [master] ./convolution dam.ppm 100
Reading ``dam.ppm''
Done reading ``dam.ppm'' of size 1280x800
Writing cpu filtered image
Choose platform:
[0] Advanced Micro Devices, Inc.
[1] Intel(R) Corporation
Enter choice: 1
Choose device:
[0] Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
Enter choice: 0
---------------------------------------------------------------------
NAME: Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
VENDOR: Intel(R) Corporation
PROFILE: FULL_PROFILE
VERSION: OpenCL 1.1 (Build 31360.31426)
EXTENSIONS: cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread
DRIVER_VERSION: 1.1

Type: CPU
EXECUTION_CAPABILITIES: Kernel Native
GLOBAL_MEM_CACHE_TYPE: Read-Write (2)
CL_DEVICE_LOCAL_MEM_TYPE: Global (2)
SINGLE_FP_CONFIG: 0x7
QUEUE_PROPERTIES: 0x3

VENDOR_ID: 32902
MAX_COMPUTE_UNITS: 4
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_GROUP_SIZE: 1024
PREFERRED_VECTOR_WIDTH_CHAR: 16
PREFERRED_VECTOR_WIDTH_SHORT: 8
PREFERRED_VECTOR_WIDTH_INT: 4
PREFERRED_VECTOR_WIDTH_LONG: 2
PREFERRED_VECTOR_WIDTH_FLOAT: 4
PREFERRED_VECTOR_WIDTH_DOUBLE: 2
MAX_CLOCK_FREQUENCY: 2130
ADDRESS_BITS: 64
MAX_MEM_ALLOC_SIZE: 1531762688
IMAGE_SUPPORT: 1
MAX_READ_IMAGE_ARGS: 480
MAX_WRITE_IMAGE_ARGS: 480
IMAGE2D_MAX_WIDTH: 8192
IMAGE2D_MAX_HEIGHT: 8192
IMAGE3D_MAX_WIDTH: 2048
IMAGE3D_MAX_HEIGHT: 2048
IMAGE3D_MAX_DEPTH: 2048
MAX_SAMPLERS: 480
MAX_PARAMETER_SIZE: 3840
MEM_BASE_ADDR_ALIGN: 1024
MIN_DATA_TYPE_ALIGN_SIZE: 128
GLOBAL_MEM_CACHELINE_SIZE: 64
GLOBAL_MEM_CACHE_SIZE: 262144
GLOBAL_MEM_SIZE: 6127050752
MAX_CONSTANT_BUFFER_SIZE: 131072
MAX_CONSTANT_ARGS: 480
LOCAL_MEM_SIZE: 32768
ERROR_CORRECTION_SUPPORT: 0
PROFILING_TIMER_RESOLUTION: 1
ENDIAN_LITTLE: 1
AVAILABLE: 1
COMPILER_AVAILABLE: 1
MAX_WORK_GROUP_SIZES: 1024 1024 1024
---------------------------------------------------------------------
*** Kernel compilation resulted in non-empty log message.
*** Set environment variable CL_HELPER_PRINT_COMPILER_OUTPUT=1 to see more.
*** NOTE: this may include compiler warnings and other important messages
***   about your code.
*** Set CL_HELPER_NO_COMPILER_OUTPUT_NAG=1 to disable this message.
Info for kernel convolution:
  CL_KERNEL_WORK_GROUP_SIZE=1024
  CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE=128
  CL_KERNEL_LOCAL_MEM_SIZE=0
  CL_KERNEL_PRIVATE_MEM_SIZE=128
0.032563 s
31.446522 MPixels/s
0.251572 GBit/s
1.531503 GFlop/s
Writing OpenCL filtered image

opencl1 amd cypress

☁  homework4 [master] ./convolution dam.ppm 100
Reading ``dam.ppm''
Done reading ``dam.ppm'' of size 1280x800
Writing cpu filtered image
Choose platform:
[0] Advanced Micro Devices, Inc.
[1] Intel(R) Corporation
Enter choice: 0
Choose device:
[0] Cypress
[1] Cypress
[2] Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
Enter choice: 0
---------------------------------------------------------------------
NAME: Cypress
VENDOR: Advanced Micro Devices, Inc.
PROFILE: FULL_PROFILE
VERSION: OpenCL 1.2 AMD-APP (1113.2)
EXTENSIONS: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
DRIVER_VERSION: 1113.2

Type: GPU
EXECUTION_CAPABILITIES: Kernel
GLOBAL_MEM_CACHE_TYPE: None (0)
CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
SINGLE_FP_CONFIG: 0xbe
QUEUE_PROPERTIES: 0x2

VENDOR_ID: 4098
MAX_COMPUTE_UNITS: 18
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_GROUP_SIZE: 256
PREFERRED_VECTOR_WIDTH_CHAR: 16
PREFERRED_VECTOR_WIDTH_SHORT: 8
PREFERRED_VECTOR_WIDTH_INT: 4
PREFERRED_VECTOR_WIDTH_LONG: 2
PREFERRED_VECTOR_WIDTH_FLOAT: 4
PREFERRED_VECTOR_WIDTH_DOUBLE: 2
MAX_CLOCK_FREQUENCY: 700
ADDRESS_BITS: 32
MAX_MEM_ALLOC_SIZE: 268435456
IMAGE_SUPPORT: 1
MAX_READ_IMAGE_ARGS: 128
MAX_WRITE_IMAGE_ARGS: 8
IMAGE2D_MAX_WIDTH: 16384
IMAGE2D_MAX_HEIGHT: 16384
IMAGE3D_MAX_WIDTH: 2048
IMAGE3D_MAX_HEIGHT: 2048
IMAGE3D_MAX_DEPTH: 2048
MAX_SAMPLERS: 16
MAX_PARAMETER_SIZE: 1024
MEM_BASE_ADDR_ALIGN: 2048
MIN_DATA_TYPE_ALIGN_SIZE: 128
GLOBAL_MEM_CACHELINE_SIZE: 0
GLOBAL_MEM_CACHE_SIZE: 0
GLOBAL_MEM_SIZE: 1073741824
MAX_CONSTANT_BUFFER_SIZE: 65536
MAX_CONSTANT_ARGS: 8
LOCAL_MEM_SIZE: 32768
ERROR_CORRECTION_SUPPORT: 0
PROFILING_TIMER_RESOLUTION: 1
ENDIAN_LITTLE: 1
AVAILABLE: 1
COMPILER_AVAILABLE: 1
MAX_WORK_GROUP_SIZES: 256 256 256
---------------------------------------------------------------------
Info for kernel convolution:
  CL_KERNEL_WORK_GROUP_SIZE=256
  CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE=64
  CL_KERNEL_LOCAL_MEM_SIZE=1936
  CL_KERNEL_PRIVATE_MEM_SIZE=0
0.003614 s
283.354865 MPixels/s
2.266839 GBit/s
13.799902 GFlop/s
Writing OpenCL filtered image

Timings for 16x16 work group

The command run is

./convolution dam.ppm 100

The slowest is the CPU on the opencl cims node. The laptop and opencl1 GPUs are faster.

Laptop

0.004582 s
223.467588 MPixels/s
1.787741 GBit/s
10.883282 GFlop/s
Writing OpenCL filtered image

CIMS

opencl1 intel xeon

0.032563 s
31.446522 MPixels/s
0.251572 GBit/s
1.531503 GFlop/s
Writing OpenCL filtered image

opencl1 amd cypress

0.003614 s
283.354865 MPixels/s
2.266839 GBit/s
13.799902 GFlop/s
Writing OpenCL filtered image

Changing work group size

These tests are run on my laptop. The 16x16 workgroup size seems nearly optimal. The command run in all instances is

./convolution dam.ppm 100

4x4 work group

0.012927 s
79.214661 MPixels/s
0.633717 GBit/s
3.857899 GFlop/s

8x8 work group

0.004665 s
219.497086 MPixels/s
1.755977 GBit/s
10.689911 GFlop/s

16x16

0.004486 s
228.257574 MPixels/s
1.826061 GBit/s
11.116563 GFlop/s

20x20

0.005344 s
191.602783 MPixels/s
1.532822 GBit/s
9.331408 GFlop/s

32x16

0.005218 s
196.250541 MPixels/s
1.570004 GBit/s
9.557762 GFlop/s

32x32 work group

It doesn’t run returning the following error:

*** 'clEnqueueNDRangeKernel(queue, knl, 2, NULL, global_size, local_size, 0, NULL, NULL)' in 'convolution.c' on line 263 failed with error 'invalid work group size'.

Iterative Blurring

See the code for the implementation details. I did not account for the edge effects unfortunately.

About


Languages

Language:C 99.1%Language:Makefile 0.9%