OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2

Intel Advanced Matrix Extensions (AMX) support

ahmetcanik opened this issue · comments

Hello CTranslate2 developers,

I am a user of your library and I appreciate your work on providing a fast and accurate inference engine. I am wondering if you have any plans to support Intel Advanced Matrix Extensions (AMX) for CPU inference. According to Intel, AMX can speed up inference by several factors for certain models and data types.

I have tried to compile CTranslate2 from source with the -mamx-tile -mamx-int8 -mamx-bf16 flags, but it seems that some additional steps are required to enable AMX (perhaps adding a new kernel such as vec_amx.h and modifying vec_avx512.h to enable AMX tile operations).

I would appreciate it if you could share your thoughts on this topic and let me know if AMX support is feasible and desirable for CTranslate2.

Thank you for your time and attention.

Hello,
Thank you for your suggestion. We have no plans to do this right now, and some work would be needed to implement AMX for certain operations. It would be nice to have, though; I will look into it in more detail.