BF16 on AMD CPU?

Question

moderato opened this issue 2 months ago

Hi there, I saw there is some code in the sandbox/power10 folder for BF16 GEMM. I suppose that is just for POWER10 machines? Is it possible to build and run code with bli_sbgemm on an AMD CPU? Thanks!

Field G. Van Zee · Answer 1 · Thu May 02 2024 07:51:19 GMT+0800 (China Standard Time)

Thanks for your question. Yes, that code is specific to POWER10 systems. The author (@nicholaiTukanov) likely did not intend for it to run on AMD CPUs. That said, we always encourage power users (pun not intended) to tinker around and see what you can get working!

moderato · Answer 2 · Thu May 02 2024 08:02:18 GMT+0800 (China Standard Time)

Thanks for the reply. Does that mean there is no BF16 support for AMD CPUs for now?

Bhaskar Nallani · Answer 3 · Thu May 02 2024 12:42:24 GMT+0800 (China Standard Time)

Hi Zhongyi Lin,

You can use the BF16 implementations designed for Zen 4 and above, which are available in the aocl_gemm addon in amd/blis: https://github.com/amd/blis/tree/master/addon/aocl_gemm

You can clone the AMD version of BLIS, build it with the aocl_gemm addon, and call one of the APIs below, which take similar arguments. You can pass NULL for the post-ops structure argument if you only need plain GEMM. The API definitions are available in this file: https://github.com/amd/blis/blob/master/addon/aocl_gemm/aocl_gemm_interface_apis.h

aocl_gemm_bf16bf16f32of32() - This API accumulates in float (f32) precision and returns the output in float (f32).
aocl_gemm_bf16bf16f32obf16() - This API accumulates in float (f32) precision and returns the output in bf16 format (which is half the size of f32).
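
To make the difference between the two output formats concrete, here is a minimal pure-Python reference model. It is only a sketch and does not call the aocl_gemm addon: the helper names (f32_to_bf16_bits, bf16_bits_to_f32, gemm_bf16_ref) are hypothetical, the matrices are assumed to be row-major flat lists, rounding to bf16 is assumed to be round-to-nearest-even, and alpha, beta, and post-ops are left out.

```python
import struct


def f32_to_bf16_bits(x):
    """Round a float to bf16 (round-to-nearest-even) and return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep the upper half and force a mantissa bit so it stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, then drop the lower 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(h):
    """Widen a 16-bit bf16 pattern to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def _round_to_f32(x):
    """Round a Python float (double) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def gemm_bf16_ref(a_bits, b_bits, m, n, k, out_bf16=False):
    """Compute C = A * B for bf16 inputs with an f32 accumulator.

    a_bits is an m x k matrix and b_bits a k x n matrix, both row-major
    lists of bf16 bit patterns. With out_bf16=False the result is a list
    of f32 values; with out_bf16=True each result is rounded to bf16 and
    returned as a 16-bit pattern.
    """
    c = []
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                a = bf16_bits_to_f32(a_bits[i * k + p])
                b = bf16_bits_to_f32(b_bits[p * n + j])
                # Keep the running sum at f32 precision after every step.
                acc = _round_to_f32(acc + a * b)
            c.append(f32_to_bf16_bits(acc) if out_bf16 else acc)
    return c
```

For example, with A = [16, 1] (1 x 2) and B = [16, 1] (2 x 1) the exact result is 257. The f32-output variant of this model returns 257.0, while the bf16-output variant returns the pattern for 256.0, because bf16 keeps only 8 significant bits. That is the trade-off behind the half-size output.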

Bhaskar

moderato · Answer 4 · Fri May 03 2024 04:39:21 GMT+0800 (China Standard Time)

Hi Bhaskar, thank you for this valuable information. I will try it and let you know.