flame / blis

BLAS-like Library Instantiation Software Framework

BF16 on AMD CPU?

moderato opened this issue

Hi there, I saw there is some code in the sandbox/power10 folder for BF16 GEMM. I suppose that is just for POWER10 machines? Is it possible to build and run code with bli_sbgemm on AMD CPU? Thanks!

Thanks for your question. Yes, that code is specific to POWER10 systems. The author (@nicholaiTukanov) likely did not intend for it to run on AMD CPUs. That said, we always encourage power users (pun not intended) to tinker around and see what you can get working!

Thanks for the reply. Does that mean there is no BF16 support for AMD CPUs for now?

Hi Zhongyi Lin,

You can use the BF16 implementation designed for Zen 4 and later, which is available in the aocl_gemm addon in amd/blis: https://github.com/amd/blis/tree/master/addon/aocl_gemm

Clone the AMD version of BLIS, build it with the aocl_gemm addon, and call one of the APIs below. Both take similar arguments; you can pass NULL for the post-ops structure argument if you only need plain GEMM. The API definitions are in this file: https://github.com/amd/blis/blob/master/addon/aocl_gemm/aocl_gemm_interface_apis.h

aocl_gemm_bf16bf16f32of32( ) - accumulates in float (f32) precision and returns the output in float (f32)
aocl_gemm_bf16bf16f32obf16( ) - accumulates in float (f32) precision and returns the output in bf16 format (half the size of f32)

Bhaskar
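
To make the difference between the two output variants concrete, here is a minimal reference sketch in plain C of what "bf16 inputs, f32 accumulation, f32 or bf16 output" means numerically. It does not call the AOCL library and is not its implementation. The names `bf16_t`, `f32_to_bf16`, `bf16_to_f32`, `ref_gemm_bf16bf16f32of32` and `ref_gemm_bf16bf16f32obf16` are hypothetical helpers made up for this illustration, and the row-major, no-transpose layout is an assumption. The real entry points take additional arguments (including the post-ops pointer mentioned above), so check the linked header for the actual signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local type: bf16 stored as the upper 16 bits of an IEEE-754 float. */
typedef uint16_t bf16_t;

/* Convert f32 to bf16 using round-to-nearest, ties-to-even. */
static bf16_t f32_to_bf16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)      /* NaN: keep it a quiet NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);       /* rounding bias */
    return (bf16_t)(bits >> 16);
}

/* Convert bf16 to f32; this direction is exact. */
static float bf16_to_f32(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Dot product of row i of A with column j of B, accumulated in f32. */
static float dot_f32(size_t k, const bf16_t *a_row, const bf16_t *b_col, size_t ldb)
{
    float acc = 0.0f;
    for (size_t p = 0; p < k; ++p)
        acc += bf16_to_f32(a_row[p]) * bf16_to_f32(b_col[p * ldb]);
    return acc;
}

/* Reference C = alpha*A*B + beta*C with bf16 inputs and f32 output.
   Assumes row-major storage and no transposition. */
static void ref_gemm_bf16bf16f32of32(size_t m, size_t n, size_t k, float alpha,
                                     const bf16_t *a, size_t lda,
                                     const bf16_t *b, size_t ldb,
                                     float beta, float *c, size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc  = dot_f32(k, a + i * lda, b + j, ldb);
            /* Do not read C at all when beta is zero. */
            float prev = (beta == 0.0f) ? 0.0f : beta * c[i * ldc + j];
            c[i * ldc + j] = alpha * acc + prev;
        }
}

/* Same computation, but the f32 result is rounded to bf16 on store. */
static void ref_gemm_bf16bf16f32obf16(size_t m, size_t n, size_t k, float alpha,
                                      const bf16_t *a, size_t lda,
                                      const bf16_t *b, size_t ldb,
                                      float beta, bf16_t *c, size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc  = dot_f32(k, a + i * lda, b + j, ldb);
            float prev = (beta == 0.0f) ? 0.0f : beta * bf16_to_f32(c[i * ldc + j]);
            c[i * ldc + j] = f32_to_bf16(alpha * acc + prev);
        }
}
```

A reference like this can be handy for checking results from the optimized routines on small matrices. Note that the bf16-output variant loses precision only at the final store, since the accumulation itself is still done in f32.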

Hi Bhaskar, thank you for this valuable information. Will try and let you know.