CNugteren / CLBlast

Tuned OpenCL BLAS

Accuracy problem on Apple M1 and Intel(R) UHD Graphics 770

fengyuentau opened this issue

CLBlast on Apple M1 gives incorrect Sgemm results for square matrices of scale >= 1152.

macOS version: 14.4.1

Reproducer is available at https://github.com/fengyuentau/test-clblast.

The test results, also shown at https://github.com/fengyuentau/test-clblast?tab=readme-ov-file#results, are:

scale: 1250, max_diff: 348.821442
sacle: 1125, OK
scale: 1187, max_diff: 330.725189
scale: 1156, max_diff: 320.881348
sacle: 1140, OK
sacle: 1148, OK
scale: 1152, max_diff: 323.137970
sacle: 1150, OK
sacle: 1151, OK
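For readers who do not want to open the reproducer: a max_diff figure like the ones above can be obtained by comparing the GPU output against a naive CPU reference. The following is a minimal C++ sketch of that idea, not the reproducer's actual code. The names `ReferenceGemm` and `MaxAbsDiff`, the row-major layout, and the omission of alpha and beta are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive row-major reference: C = A * B for square n x n matrices.
// (Hypothetical helper; alpha/beta scaling is left out for brevity.)
std::vector<float> ReferenceGemm(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n);
  std::vector<float> c(n * n, 0.0f);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < n; ++k) {
      const float a_ik = a[i * n + k];
      for (std::size_t j = 0; j < n; ++j) {
        c[i * n + j] += a_ik * b[k * n + j];
      }
    }
  }
  return c;
}

// Largest element-wise absolute difference between two equally sized buffers.
float MaxAbsDiff(const std::vector<float>& x, const std::vector<float>& y) {
  assert(x.size() == y.size());
  float max_diff = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    max_diff = std::max(max_diff, std::fabs(x[i] - y[i]));
  }
  return max_diff;
}
```

In such a setup, the Sgemm output copied back from the device would be passed as one argument of `MaxAbsDiff` and the reference result as the other. A run would be reported as OK when the difference stays below a chosen tolerance.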

I tried commenting out the tuning results for the Apple M1, and the results are correct with that change. Would you accept a patch that reverts the tuning results for the Apple M1?

Update:

OS: Ubuntu 22.04.2 LTS

Cgemm results are also incorrect on Intel(R) UHD Graphics 770 for scale >= 256. The code and results in the reproducer have already been updated. See also below:

scale: 550, real_max_diff: 318.123413, imag_max_diff: 320.759399
scale: 325, real_max_diff: 196.814056, imag_max_diff: 191.424683
sacle: 212, OK
scale: 268, real_max_diff: 162.766602, imag_max_diff: 165.656494
sacle: 240, OK
sacle: 254, OK
scale: 261, real_max_diff: 162.882080, imag_max_diff: 161.465424
scale: 257, real_max_diff: 155.064240, imag_max_diff: 157.468262
sacle: 255, OK
scale: 256, real_max_diff: 159.027313, imag_max_diff: 166.023514

Note that reverting the tuning results for this platform gives accurate results again.
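The order of the scales in both result lists (1250, 1125, 1187, ... and 550, 325, 212, ...) is consistent with a bisection over the matrix size. Below is a minimal C++ sketch of such a search. It assumes that failures are monotonic in the size (every size at or above the threshold fails), which the listed results suggest but do not prove. The name `FirstFailingScale` and the callback are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest size in (lo, hi] for which fails(size) is true.
// Preconditions (assumed, not checked by running the callback):
// fails(lo) is false, fails(hi) is true, and fails is monotonic in between.
int FirstFailingScale(int lo, int hi, const std::function<bool(int)>& fails) {
  assert(lo < hi);
  while (hi - lo > 1) {
    const int mid = lo + (hi - lo) / 2;
    if (fails(mid)) {
      hi = mid;  // threshold is at mid or below
    } else {
      lo = mid;  // threshold is above mid
    }
  }
  return hi;
}
```

In practice the callback would run the GEMM at the given size and compare it against a reference. With a stand-in callback such as `[](int n) { return n >= 1152; }` and a starting interval of 1000 to 1500 (a guess at the bounds used), the search probes 1250, 1125, 1187, 1156, 1140, 1148, 1152, 1150 and 1151, then returns 1152.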

Thanks for reporting this.

However, I see you wrote your own tests, while CLBlast already contains a large and sophisticated test suite. Can you run the relevant (original) CLBlast tests on your hardware for me and see if they also fail? If they don't fail, can you modify the original CLBlast tests to include the large matrices from your own tests and re-run them?

Three tests fail on the M1 in the SGEMM routine tests (see below). The other tests pass. No tests fail after reverting the tuning results.

Original code without modification:

./clblast_test_xgemm

* Options given/available:
    -platform 0 [=default]
    -device 0 [=default]
    -full_test [false]
    -verbose [false]
    -cblas 1 [=default]

* Running on OpenCL device 'Apple M1'.
* Starting tests for the 'SGEMM' routine. Legend:
   : -> Test produced correct results
   . -> Test returned the correct error code
   X -> Test produced incorrect results
   / -> Test returned an incorrect error code
   \ -> Test not executed: OpenCL-kernel compilation error
   o -> Test not executed: Unsupported precision
   - -> Test not completed: Reference CBLAS doesn't output error codes
* Testing with error margins of 0.5% (relative) and 0.001 (absolute)
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 111 (regular)':
   ::::::::----::::---:---:-------:::::::::----::::---:---:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 112 (transposed)':
   ::::::::------::-:-:-:-:-------:::::::::------::-:-:-:-:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 111 (regular)':
   ::::::::::::::::---:---:---:---:----:::X----::::-------:-------:
   Error rate 78.09%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate  45.3%: 29 passed / 34 skipped / 1 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 112 (transposed)':
   ::::::::--::--::-:-:-:-:---:---:----::::------::-----:-:-------:
   Pass rate  42.2%: 27 passed / 37 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 111 (regular)':
   ::::::::--::--::::::::::--::--::-----:-:-------:-----:-:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 112 (transposed)':
   ::::::::::::::::--:X--::--::--::-----:-:-----:-:-------:-------:
   Error rate 77.74%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate  45.3%: 29 passed / 34 skipped / 1 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 111 (regular)':
   ::::::::------::::::::::------::-:-:-:-:-------:-:-:-:-:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 112 (transposed)':
   ::::::::----::::--::--::------::-:-:-:-:-----:-:---:---X-------:
   Error rate 96.95%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate  40.6%: 26 passed / 37 skipped / 1 failed
* Completed all test-cases for this routine. Results:
   231 test(s) passed
   278 test(s) skipped
   3 test(s) failed

...
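The log above reports error margins of 0.5% (relative) and 0.001 (absolute). How the test suite combines the two is not shown in the log, so the following C++ sketch is only one plausible reading: a value passes if it is within the absolute margin, or else within the relative margin of the larger magnitude. The name `WithinMargins` and this exact rule are assumptions, not CLBlast's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical similarity check: absolute margin first, relative margin second.
bool WithinMargins(float result, float reference,
                   float rel_margin = 0.005f,   // 0.5% relative
                   float abs_margin = 0.001f) { // 0.001 absolute
  assert(rel_margin >= 0.0f && abs_margin >= 0.0f);
  if (std::isnan(result) || std::isnan(reference)) { return false; }
  const float diff = std::fabs(result - reference);
  if (diff <= abs_margin) { return true; }
  const float scale = std::max(std::fabs(result), std::fabs(reference));
  return diff <= rel_margin * scale;
}
```

Under this reading, differences in the hundreds on values of comparable magnitude, as in the max_diff results reported earlier, fall far outside both margins.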

Many CGEMM tests fail on Intel(R) UHD Graphics 770 (see below). The other tests pass. Again, no tests fail after reverting the tuning results.

$ ./clblast_test_xgemm

...

* Running on OpenCL device 'Intel(R) UHD Graphics 770'.
* Starting tests for the 'CGEMM' routine. Legend:
   : -> Test produced correct results
   . -> Test returned the correct error code
   X -> Test produced incorrect results
   / -> Test returned an incorrect error code
   \ -> Test not executed: OpenCL-kernel compilation error
   o -> Test not executed: Unsupported precision
   - -> Test not completed: Reference CBLAS doesn't output error codes
* Testing with error margins of 0.5% (relative) and 0.001 (absolute)
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 111 (regular)':
   ::::::::----::::---X---X-------X::::::::----::::---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  37.5%: 24 passed / 34 skipped / 6 failed
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 112 (transposed)':
   ::::::::------::-X-X-X-X-------X::::::::------::-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 113 (conjugate)':
   ::::::::------::-X-X-X-X-------X::::::::------::-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 111 (regular)':
   ::::::::::::::::---X---X---X---X----::::----::::-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  37.5%: 24 passed / 34 skipped / 6 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 112 (transposed)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 113 (conjugate)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '101 (row-major) 113 (conjugate) 111 (regular)':
   ::::::::::::::::---X---X---X---X----::::----::::-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  37.5%: 24 passed / 34 skipped / 6 failed
* Testing 'regular behaviour' for '101 (row-major) 113 (conjugate) 112 (transposed)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '101 (row-major) 113 (conjugate) 113 (conjugate)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 111 (regular)':
   ::::::::--::--::XXXXXXXX--XX--XX-----:-:-------:-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  23.4%: 15 passed / 34 skipped / 15 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 112 (transposed)':
   ::::::::::::::::--XX--XX--XX--XX-----:-:-----:-:-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 113 (conjugate)':
   ::::::::::::::::--XX--XX--XX--XX-----:-:-----:-:-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 111 (regular)':
   ::::::::------::XXXXXXXX------XX-:-:-:-:-------:-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  23.4%: 15 passed / 34 skipped / 15 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 112 (transposed)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 113 (conjugate)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 113 (conjugate) 111 (regular)':
   ::::::::------::XXXXXXXX------XX-:-:-:-:-------:-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  23.4%: 15 passed / 34 skipped / 15 failed
* Testing 'regular behaviour' for '102 (col-major) 113 (conjugate) 112 (transposed)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 113 (conjugate) 113 (conjugate)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Completed all test-cases for this routine. Results:
   341 test(s) passed
   636 test(s) skipped
   175 test(s) failed

...

Thank you for running the tests. Perhaps this could be related to #533.

Since I don't have the same devices to test on as you do, I simply modified GetDeviceName and GetDeviceVendor in src/utilities/utilities.cpp to use the tuning parameters for the M1 and UHD 770 on my own device. With that change I managed to reproduce the Intel UHD 770 issues locally, exactly as you reported them, but I did not manage to reproduce the Apple M1 issue. So I'll need to dig deeper for the M1.

But first I'll try to solve the Intel UHD 770 issue. If I simply use the Intel GPU default parameters (in src/database/kernels/xgemm/xgemm_3232.hpp) the issue is resolved, so it is related to those values. They might be illegal (and thus there is a bug in the CLBlast tuner) or there might be a bug in the CLBlast kernels. It might be the same as the old issue #340. I'll investigate and let you know if there is progress.

Let me know if I can help with the M1 issue.

Do we have options to build and use this library without tuning results?

Some initial results: when I revert #341, then the issue seems resolved, at least for a few tests I did. I'll do some more investigation and re-read the original #340 issue again, and will keep you updated.

> Do we have options to build and use this library without tuning results?

Well, it depends on what you mean by 'without tuning results', because the library needs to use some set of parameters. What you can do is modify src/utilities/utilities.cpp as I mentioned above to mimic another device. You could then fall back to the default Apple GPU parameters (for example by naming your device 'Apple Non Existing Device'), or use the default-default parameters if you also change your device vendor to something nonexistent.
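The fallback order described here (device-specific parameters, then the vendor default, then the default-default) can be pictured with a small lookup. This is a hedged C++ sketch only: `TuningDatabase`, `Params`, `DeviceKey` and `CheckFallbackOrder` are hypothetical names, CLBlast's real database is organised differently, and the parameter values are placeholders rather than real tuning results.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical types for illustration only.
using Params = std::map<std::string, int>;
using DeviceKey = std::pair<std::string, std::string>;  // (vendor, device name)

struct TuningDatabase {
  std::map<DeviceKey, Params> per_device;
  std::map<std::string, Params> per_vendor_default;
  Params global_default;  // the "default-default" parameters

  // Device-specific entry first, then vendor default, then global default.
  const Params& Lookup(const std::string& vendor,
                       const std::string& device) const {
    const auto dev = per_device.find(DeviceKey(vendor, device));
    if (dev != per_device.end()) { return dev->second; }
    const auto ven = per_vendor_default.find(vendor);
    if (ven != per_vendor_default.end()) { return ven->second; }
    return global_default;
  }
};

// Self-check of the fallback order with placeholder values.
inline void CheckFallbackOrder() {
  TuningDatabase db;
  db.per_device[DeviceKey("Apple", "Apple M1")] = {{"MWG", 3}};
  db.per_vendor_default["Apple"] = {{"MWG", 2}};
  db.global_default = {{"MWG", 1}};
  assert(db.Lookup("Apple", "Apple M1").at("MWG") == 3);
  assert(db.Lookup("Apple", "Apple Non Existing Device").at("MWG") == 2);
  assert(db.Lookup("Non Existing Vendor", "Apple M1").at("MWG") == 1);
}
```

Renaming the device, as suggested above, corresponds to missing the first lookup on purpose, so that the vendor default or the global default is used instead.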

Thank you for the quick update!

> Well, it depends on what you mean by 'without tuning results', because the library needs to use some set of parameters.

We could add a compile definition (e.g. HAVE_TUNING_RESULTS, controlled via a CMake option that defaults to ON) and then guard every tuning result except the defaults with this macro. Below is an example. What do you think?

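# CMakeLists.txt
option(WITH_TUNING_RESULTS "Use device-specific tuning results (defaults are always available)" ON)

if(WITH_TUNING_RESULTS)
  # add_compile_definitions() takes the bare name; the -D prefix belongs to add_definitions()
  add_compile_definitions(HAVE_TUNING_RESULTS)
endif()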

Then we can use #if HAVE_TUNING_RESULTS to guard tuning results except defaults.
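On the C++ side, such a guard might look like the sketch below. It is an illustration of the proposal, not code from CLBlast: `Entry`, `BuildEntries`, the device name and the parameter values are all hypothetical. A definition passed as a bare name on the compiler command line is normally defined as 1, so `#if HAVE_TUNING_RESULTS` works, and an undefined macro evaluates to 0 in `#if`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical database entry: a device name plus one placeholder parameter.
struct Entry {
  std::string device;
  int placeholder_param;
};

std::vector<Entry> BuildEntries() {
  std::vector<Entry> entries;
#if HAVE_TUNING_RESULTS
  // Device-specific results are compiled in only when the macro is set.
  entries.push_back({"Some Tuned Device", 2});
#endif
  // The defaults stay outside the guard so that a parameter set always exists.
  entries.push_back({"default", 1});
  assert(!entries.empty());
  return entries;
}
```

Building with the option turned OFF would then leave only the default entry, which is the behaviour asked for in the question about using the library without tuning results.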

This PR #543 likely solves the issue you reported on the Intel UHD 770. If you could try it out to confirm, that would be great!

The issue with the Apple M1 seems unrelated, since that device doesn't use this GEMMK=1 kernel that caused the issue. I also can't reproduce the issue on my own machine (non-Apple) by simply using the M1's tuning parameters, so there seems to be something else going on here. I'll have a think soon to see how we can debug this further.

I will give it a try later this week. Thank you for the quick fix!

@CNugteren I can confirm that #543 fixes the accuracy problem on the Intel UHD 770.

As for the Apple M1 accuracy problem, let me extend the existing tests in this repository to give you more results.

@CNugteren I can also confirm that #543 fixes the accuracy problem on the Apple M1.

I guess we are done with this issue. Thank you for the quick response and updates and patches!