jundaf2 / Heterogeneous-GPUs

Heterogeneous Nvidia (CUDA) and Intel (OpenCL) GPU Programming

An experiment in using multiple GPUs from different vendors, each programmed in a different language (CUDA for Nvidia, OpenCL for Intel), with vector addition as the example workload.

The results show that the hybrid scheme, which uses the Nvidia dedicated GPU and the Intel integrated GPU together, is the fastest: 327 ms in total, compared with 371 ms for the Nvidia GPU alone, 450 ms for the CPU, and 530 ms for the Intel integrated GPU alone.

Total Time Improvement

Number of bytes in Giga: 1.44

CPU

  • Result on 11th Gen Intel(R) Core(TM) i9-11980HK @ 2.60GHz : 3.35544e+07
  • cpu_routine took 450 milliseconds
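
For orientation, a CPU baseline of this kind can be sketched as below. This is a minimal illustration, not the repository's `cpu_routine`: the names `add_vectors_cpu` and `time_ms` are hypothetical, and the element type `float` is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Element-wise c[i] = a[i] + b[i] on the host (hypothetical helper).
std::vector<float> add_vectors_cpu(const std::vector<float>& a,
                                   const std::vector<float>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Runs a callable once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
long long time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Calling `time_ms([&] { c = add_vectors_cpu(a, b); })` gives a figure comparable in kind to the "took ... milliseconds" line above, although the exact work the repository includes in that measurement is not stated here.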

Intel Integrated GPU

  • opencl_routine_memcpy_h2d_time took 58895 microseconds
  • opencl_routine_kernel_time took 37442 microseconds
  • opencl_routine_memcpy_d2h_time took 27718 microseconds
  • Result on Intel(R) OpenCL HD Graphics : 3.35544e+07
  • opencl_routine took 530 milliseconds

Nvidia GPU

  • cuda_routine_preprocess_time took 828 microseconds
  • cuda_routine_memcpy_h2d_time took 80367 microseconds
  • cuda_routine_kernel_time took 4966 microseconds
  • cuda_routine_memcpy_d2h_time took 36614 microseconds
  • cuda_routine_postprocess_time took 5992 microseconds
  • Result on NVIDIA CUDA : 3.35544e+07
  • cuda_routine took 371 milliseconds

Hybrid GPU (Fastest!)

Methodology (Synchronization Diagram)
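
One way such an overlap could be organized is sketched below, under stated assumptions: the index range is split at a fixed fraction, each part is processed concurrently, and the host waits for both before using the result. Two host threads stand in for the two devices here; in the actual project the parts would be CUDA and OpenCL work, and the names `add_slice`, `add_vectors_split`, and `fraction_first` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Adds the slice [begin, end) of a and b into c. Plain host code used as a
// stand-in for a per-device kernel launch (assumption for illustration).
void add_slice(const float* a, const float* b, float* c,
               std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        c[i] = a[i] + b[i];
    }
}

// Splits the work at `fraction_first` (hypothetical tuning parameter in [0, 1]),
// runs both parts concurrently, and synchronizes before returning.
std::vector<float> add_vectors_split(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     double fraction_first) {
    assert(a.size() == b.size());
    assert(fraction_first >= 0.0 && fraction_first <= 1.0);
    const std::size_t n = a.size();
    const std::size_t split = static_cast<std::size_t>(n * fraction_first);
    std::vector<float> c(n);

    // The two slices are disjoint, so the workers never write the same element.
    auto first = std::async(std::launch::async, add_slice,
                            a.data(), b.data(), c.data(), std::size_t{0}, split);
    auto second = std::async(std::launch::async, add_slice,
                             a.data(), b.data(), c.data(), split, n);

    // Synchronization point: both parts must finish before c is read.
    first.get();
    second.get();
    return c;
}
```

The split fraction would normally be chosen so that both devices finish at about the same time; the ratio used in the repository is not given in this document.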

Log

  • heterogeneous_overlap_routine_preprocess_time took 54104 microseconds
  • heterogeneous_overlap_routine_memcpy_h2d_time took 55325 microseconds
  • heterogeneous_overlap_routine_kernel_time took 3067 microseconds
  • heterogeneous_overlap_routine_memcpy_d2h_time took 26689 microseconds
  • Result on Intel(R) OpenCL HD Graphics + NVIDIA CUDA : 3.35544e+07
  • heterogeneous_overlap_routine took 327 milliseconds
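
The totals in the logs can be compared with a small calculation. The sketch below uses hypothetical helper names (`speedup`, `untimed_ms`); with the logged totals, the hybrid routine comes out roughly 1.38x faster than the CPU (450 / 327), 1.62x faster than the Intel GPU alone (530 / 327), and 1.13x faster than the Nvidia GPU alone (371 / 327).

```cpp
#include <cassert>

// Speedup of a candidate over a baseline, both measured in the same unit.
double speedup(double baseline_ms, double candidate_ms) {
    assert(candidate_ms > 0.0);
    return baseline_ms / candidate_ms;
}

// Part of a routine's total time (ms) not covered by the sum of its logged
// stages (given in microseconds).
double untimed_ms(double total_ms, double staged_us) {
    return total_ms - staged_us / 1000.0;
}
```

For the hybrid routine the logged stages sum to 139185 microseconds, so `untimed_ms(327.0, 139185.0)` is about 188 ms. What that remainder covers (for example allocation or initialization) is not stated in the logs, and the stages may overlap, so it should be read only as a rough indicator.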

Languages

  • C++ 89.7%
  • Cuda 3.9%
  • Makefile 2.8%
  • CMake 2.3%
  • C 1.4%