google / gemmlowp

Low-precision matrix multiplication



A question about the design of the arm64 kernel

knjwhn opened this issue

I am studying the kernels gemmlowp implements for arm64 and noticed that the main kernel in use is 12x8x2: 3 LHS cells of width 4 give 12 rows, 2 RHS cells of width 4 give 8 columns, and each cell has depth 2. Its format is

    KernelFormat<
        KernelSideFormat<CellFormat<4, 2>, 3>,
        KernelSideFormat<CellFormat<4, 2>, 2>>

and I'd like to know why the depth was chosen to be 2 instead of 1 or some other value. Is the reason that this choice makes more efficient use of the registers, or are there other principled reasons for choosing the kernel depth?
Thanks a lot to anyone who can help.

Yes, the reason was to use as many registers as possible, to maximize the mutual independence of instructions.