Compiling and running SGEMM freezes

Question

naturalmechanics opened this issue a year ago

Hello, I am trying to compile this code:

// =================================================================================================
// This file is part of the CLBlast project. The project is licensed under Apache Version 2.0. This
// project loosely follows the Google C++ styleguide and uses a tab-size of two spaces and a max-
// width of 100 characters per line.
//
// Author(s):
//   Cedric Nugteren <www.cedricnugteren.nl>
//
// This file demonstrates the use of the SGEMM routine. It is a stand-alone example, but it does
// require the Khronos C++ OpenCL API header file (downloaded by CMake). The example uses C++
// features, but CLBlast can also be used using the regular C-style OpenCL API.
//
// Note that this example is meant for illustration purposes only. CLBlast provides other programs
// for performance benchmarking ('client_xxxxx') and for correctness testing ('test_xxxxx').
//
// =================================================================================================

#include <cstdio>
#include <iostream>
#include <chrono>
#include <vector>

#define CL_USE_DEPRECATED_OPENCL_1_1_APIS // to disable deprecation warnings
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS // to disable deprecation warnings

// Includes the C++ OpenCL API. If not yet available, it can be found here:
// https://raw.githubusercontent.com/KhronosGroup/OpenCL-CLHPP/main/include/CL/opencl.hpp
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_TARGET_OPENCL_VERSION 120
#include "/usr/include/CL/opencl.hpp"

// Includes the CLBlast library
#include <clblast.h>

// =================================================================================================

// Example use of the single-precision Xgemm routine SGEMM
int main() {

  // OpenCL platform/device settings
  const auto platform_id = 0;
  const auto device_id = 0;

  // Example SGEMM arguments
  const size_t m = 4;
  const size_t n = 2;
  const size_t k = 8;
  const float alpha = 0.7f;
  const float beta = 1.0f;
  const auto a_ld = k;
  const auto b_ld = n;
  const auto c_ld = n;



  // Initializes the OpenCL platform
  auto platforms = std::vector<cl::Platform>();
  std::cout << "here40" << std::endl;
  cl::Platform::get(&platforms);
  std::cout << "here41" << std::endl;
  if (platforms.size() == 0 || platform_id >= platforms.size()) { return 1; }
  std::cout << "here42" << std::endl;
  auto platform = platforms[platform_id];

  // Initializes the OpenCL device
  auto devices = std::vector<cl::Device>();
  platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
  if (devices.size() == 0 || device_id >= devices.size()) { return 1; }
  auto device = devices[device_id];

  std::cout << "here45" << std::endl;

  // Creates the OpenCL context, queue, and an event
  auto device_as_vector = std::vector<cl::Device>{device};
  auto context = cl::Context(device_as_vector);
  auto queue = cl::CommandQueue(context, device);
  auto event = cl_event{nullptr};

  // Populate host matrices with some example data
  auto host_a = std::vector<float>(m*k);
  auto host_b = std::vector<float>(n*k);
  auto host_c = std::vector<float>(m*n);
  std::cout << "here1" << std::endl;
  for (auto &item: host_a) { item = 12.193f; }
  for (auto &item: host_b) { item = -8.199f; }
  for (auto &item: host_c) { item = 0.0f; }
  std::cout << "here2" << std::endl;
  // Copy the matrices to the device
  auto device_a = cl::Buffer(context, CL_MEM_READ_WRITE, host_a.size()*sizeof(float));
  auto device_b = cl::Buffer(context, CL_MEM_READ_WRITE, host_b.size()*sizeof(float));
  auto device_c = cl::Buffer(context, CL_MEM_READ_WRITE, host_c.size()*sizeof(float));
  queue.enqueueWriteBuffer(device_a, CL_TRUE, 0, host_a.size()*sizeof(float), host_a.data());
  queue.enqueueWriteBuffer(device_b, CL_TRUE, 0, host_b.size()*sizeof(float), host_b.data());
  queue.enqueueWriteBuffer(device_c, CL_TRUE, 0, host_c.size()*sizeof(float), host_c.data());

  // Start the timer
  auto start_time = std::chrono::steady_clock::now();

  // Call the SGEMM routine. Note that the type of alpha and beta (float) determine the precision.
  auto queue_plain = queue();
  std::cout << "here" << std::endl;
  auto status = clblast::Gemm(clblast::Layout::kRowMajor,
                              clblast::Transpose::kNo, clblast::Transpose::kNo,
                              m, n, k,
                              alpha,
                              device_a(), 0, a_ld,
                              device_b(), 0, b_ld,
                              beta,
                              device_c(), 0, c_ld,
                              &queue_plain, &event);

  int v = static_cast<int>(status);

  std::cout << v << std::endl;

  if (status == clblast::StatusCode::kSuccess) {
    clWaitForEvents(1, &event);
    clReleaseEvent(event);
  }
  auto elapsed_time = std::chrono::steady_clock::now() - start_time;
  auto time_ms = std::chrono::duration<double,std::milli>(elapsed_time).count();

  // Example completed. See "clblast.h" for status codes (0 -> success).
  printf("Completed SGEMM in %.3lf ms with status %d\n", time_ms, static_cast<int>(status));
  return 0;
}

// =================================================================================================

Compile with g++ cauer2.cpp -o testcauer -lOpenCL -lclblast

The program hangs at cl::Platform::get(&platforms). The preceding message "here40" is printed, and then nothing.

My system:
Linux glassplanet 6.5.9-1-cachyos #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 06:40:39 +0000 x86_64 GNU/Linux

Although I have an NVIDIA GeForce GTX 770M, the drivers don't work, so they are blacklisted.

OCL ICD is installed.

ls /etc/OpenCL/vendors returns:

.rw-r--r-- 59 root 18 Mär 18:45  intel-oneapi-compiler-shared-opencl-cpu.icd

What is wrong? Thank you

Cedric Nugteren · Answer 1 · Thu Nov 02 2023 20:57:10 GMT+0800 (China Standard Time)

Sorry, but if there is a problem with cl::Platform::get then this is not a CLBlast issue and I can't help you. I would first make sure you can get other OpenCL programs working on your system, and then look at CLBlast afterwards.

Savi · Answer 2 · Sun Nov 05 2023 18:27:06 GMT+0800 (China Standard Time)

Ok, thank you

tangjinchuan · Answer 3 · Sun Nov 05 2023 19:21:28 GMT+0800 (China Standard Time)

Try installing clinfo to see whether the platform is listed correctly.

Savi · Answer 4 · Sun Nov 05 2023 19:47:48 GMT+0800 (China Standard Time)

I did; even clinfo hung.
So I resolved it using this:

https://www.reddit.com/r/archlinux/comments/11z79x9/opencl_hangs/

cd /etc/OpenCL/vendors/
then
sudo rm -Rf intel-oneapi-compiler-shared-opencl-cpu.icd

Thank you
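
For anyone hitting the same hang: the ICD loader picks the OpenCL implementations to load from the .icd files in /etc/OpenCL/vendors, so listing those files shows what cl::Platform::get will try to open. Below is a minimal C++17 sketch, assuming the usual convention that the first line of each .icd file names the vendor library. `list_icd_entries` is a hypothetical helper written for illustration; it is not part of OpenCL or CLBlast.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not an OpenCL or CLBlast API).
// Returns (file name, first line) for every *.icd file in `vendors_dir`.
// Assumption: the first line of an .icd file holds the vendor library name or path.
// A missing or unreadable directory yields an empty result instead of throwing.
std::vector<std::pair<std::string, std::string>> list_icd_entries(const fs::path& vendors_dir) {
  std::vector<std::pair<std::string, std::string>> entries;
  std::error_code ec;
  for (fs::directory_iterator it(vendors_dir, ec), end; !ec && it != end; it.increment(ec)) {
    const fs::path file = it->path();
    if (file.extension() != ".icd") { continue; }  // skip anything that is not an ICD file
    std::ifstream in(file);
    std::string library;
    std::getline(in, library);  // stays empty if the file cannot be read
    entries.emplace_back(file.filename().string(), library);
  }
  std::sort(entries.begin(), entries.end());  // directory order is unspecified
  return entries;
}
```

Calling `list_icd_entries("/etc/OpenCL/vendors")` and printing each pair shows which library every registered platform points to. If one entry is suspected of causing the hang, moving that .icd file out of the directory instead of deleting it keeps the change reversible.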