corsix / amx

Apple AMX Instruction Set

Possibility of adding support for Linux for Apple AMX1 and AMX2

FCLC opened this issue

Hi all,

I'm in the process of researching Apple AMX as a potential way of speeding up IEEE FP BLAS kernels in OpenBLAS.

On the macOS side, it seems that between this repository and other resources, I have everything I need to write the kernels.

The issue for now is Linux. When I spoke with the folks supporting and developing Asahi Linux (see the Mastodon thread here: https://mast.hpc.social/@fclc/109914828822965657), it came up that Asahi has no plans to support the EL0 CPU state required for AMX.

I'm of the opinion that it may be possible to implement a Linux kernel module that allows the use of AMX on M1, M2, and the various SKUs based on those SoCs.

This would probably require fairly tight understanding of AMX and its underlying operations.

I was hoping for insight from any of the folks working on this present project.

My understanding is that to make this all come together, we'd need the following:

Assembler support for Apple AMX in LLVM/GCC
Linux kernel support for private ISA extensions
A Linux kernel module that adds support for the Apple AMX extensions

After that, any system or software sitting in between would need to support those three things before it could be used to develop compute kernels or other code using the ISA.

Links to the relevant conversations from Asahi and OpenBLAS folks here:

Asahi: https://mast.hpc.social/@fclc/109914828822965657

OpenBLAS: https://twitter.com/FelixCLC_/status/1627404588574818304?s=20

Assembler support for apple AMX in LLVM/GCC

This bit is not strictly required; the aarch64.h header in this repository works with unmodified compilers.
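To illustrate why no assembler changes are needed: as I understand the documentation in this repository, each AMX instruction sits in otherwise reserved A64 encoding space and can be written as a raw 32-bit word built from a 5-bit opcode and a 5-bit operand, so a compiler only has to emit a constant via a `.word` directive. The sketch below just computes that word on any host. The helper name `amx_encode` and the constant name are made up for illustration, and the base value is an assumption taken from the repository's description rather than from any official Apple documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base of the reserved A64 encoding space used by AMX,
   as described in the corsix/amx documentation (not an official value). */
#define AMX_ENCODING_BASE 0x00201000u

/* Hypothetical helper: build the 32-bit instruction word from a 5-bit
   opcode and a 5-bit operand (a GPR index or a small immediate). */
static uint32_t amx_encode(uint32_t op, uint32_t operand)
{
    assert(op < 32 && operand < 32);   /* both fields are 5 bits wide */
    return AMX_ENCODING_BASE | (op << 5) | operand;
}
```

A header can then feed such a word to inline assembly through `.word`, which stock GCC and Clang already accept. That is the approach aarch64.h appears to take, though the exact macros there may differ from this sketch.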

... kernel ...

On the technical front, you've got extra state that needs saving/restoring on context switches. The political front seems more concerning to the Asahi folk though.
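For a sense of scale, here is a rough sketch of the extra per-task state. It assumes the M1 register file layout described in this repository (8 X registers, 8 Y registers, and 64 Z rows, each 64 bytes wide). The struct and function names are hypothetical and are not Linux kernel APIs, and the plain `memcpy` stands in for whatever AMX load/store sequence a real context-switch path would use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register file sizes assumed from the corsix/amx documentation (M1). */
enum { AMX_REG_BYTES = 64, AMX_X_REGS = 8, AMX_Y_REGS = 8, AMX_Z_ROWS = 64 };

/* Hypothetical in-memory image of the AMX register file. */
struct amx_state {
    uint8_t x[AMX_X_REGS][AMX_REG_BYTES];
    uint8_t y[AMX_Y_REGS][AMX_REG_BYTES];
    uint8_t z[AMX_Z_ROWS][AMX_REG_BYTES];
};

/* 512 + 512 + 4096 bytes of extra state per task that uses AMX. */
_Static_assert(sizeof(struct amx_state) == 5120, "unexpected AMX state size");

/* Hypothetical per-task record: only tasks that enabled AMX pay the cost. */
struct task_amx {
    int amx_enabled;
    struct amx_state saved;
};

/* Switch-out: copy the live registers ('live' is a stand-in for hardware). */
static void amx_save(struct task_amx *t, const struct amx_state *live)
{
    assert(t != NULL && live != NULL);
    if (t->amx_enabled)
        memcpy(&t->saved, live, sizeof t->saved);
}

/* Switch-in: put the saved image back before the task resumes. */
static void amx_restore(const struct task_amx *t, struct amx_state *live)
{
    assert(t != NULL && live != NULL);
    if (t->amx_enabled)
        memcpy(live, &t->saved, sizeof t->saved);
}
```

Under these assumptions that is about 5 KiB per AMX-using task, which is why saving lazily (only for tasks that have actually enabled AMX) would probably matter in practice.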

After talking with a few people at the vendors in question, as well as the Asahi folks, it looks very much like a political issue, driven by concerns about what Arm might do in a scorched-earth scenario.

For now I'm finishing my x86 FP16 work before investing too much time and energy into this.

It's worth mentioning that BLIS already has a "research" version of AMX support.

https://github.com/xrq-phys/blis_apple

https://www.youtube.com/watch?v=xMiWe07Rjss