CNugteren / CLBlast

Tuned OpenCL BLAS

GEMM Batched Question

FloreaMario opened this issue

Looking over the GEMM Batched API, I see that it accepts arguments that describe a_transpose and b_transpose: GemmBatched(const Layout layout, const Transpose a_transpose, const Transpose b_transpose ..

I assume a_transpose and b_transpose refer to the transpose status of all the matrices in the batch?

I've checked what the Intel MKL library does for gemm_batch, and it expects a pointer to an array of transpose elements, each corresponding to a matrix from the batched group. https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2023-0/cblas-gemm-batch.html

I just wanted to confirm that my understanding of how CLBlast uses the transpose arguments is correct.

Yes, CLBlast takes only a single layout argument and a single transpose argument per input matrix (a_transpose and b_transpose), and these apply to all batches; see also the API docs. This is also how it is done in cuBLAS.
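
To make the semantics concrete, here is a minimal CPU-only sketch of what "one transpose flag for the whole batch" means. It is not CLBlast code and does not call OpenCL: the names Transpose, At and ReferenceGemmBatched are hypothetical, it assumes row-major storage, and it uses a single alpha and beta for brevity. Every batch gets its own offset into the buffers, but all batches share the same a_transpose and b_transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; this is not the CLBlast API.
enum class Transpose { kNo, kYes };

// Returns logical element (row, col) of a row-major matrix that starts at
// `offset` in `buf` and has leading dimension `ld`. With kYes, the logical
// matrix is the transpose of the stored one, so the indices are swapped.
float At(const std::vector<float>& buf, std::size_t offset, std::size_t ld,
         Transpose trans, std::size_t row, std::size_t col) {
  return trans == Transpose::kNo ? buf[offset + row * ld + col]
                                 : buf[offset + col * ld + row];
}

// Computes C_i = alpha * op(A_i) * op(B_i) + beta * C_i for every batch i.
// The same a_transpose / b_transpose is applied to all batches; only the
// offsets differ per batch.
void ReferenceGemmBatched(Transpose a_transpose, Transpose b_transpose,
                          std::size_t m, std::size_t n, std::size_t k,
                          float alpha,
                          const std::vector<float>& a,
                          const std::vector<std::size_t>& a_offsets,
                          std::size_t a_ld,
                          const std::vector<float>& b,
                          const std::vector<std::size_t>& b_offsets,
                          std::size_t b_ld,
                          float beta,
                          std::vector<float>& c,
                          const std::vector<std::size_t>& c_offsets,
                          std::size_t c_ld) {
  // One offset per batch for each of A, B and C.
  assert(a_offsets.size() == b_offsets.size());
  assert(a_offsets.size() == c_offsets.size());
  const std::size_t batch_count = a_offsets.size();

  for (std::size_t batch = 0; batch < batch_count; ++batch) {
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
          acc += At(a, a_offsets[batch], a_ld, a_transpose, i, p) *
                 At(b, b_offsets[batch], b_ld, b_transpose, p, j);
        }
        float& out = c[c_offsets[batch] + i * c_ld + j];
        out = alpha * acc + beta * out;
      }
    }
  }
}
```

If some matrices in a batch need a different transpose setting than others, one option under this model is to group them by setting and issue one batched call per group.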

Thank you!