elixir-nx / nx

Multi-dimensional arrays (tensors) and numerical definitions for Elixir

CTranslate2 for fast model inference?

SuperPauly opened this issue · comments

Just curious to know if there will be a https://github.com/OpenNMT/CTranslate2 implementation in the future plans.

It has a faster inference engine for Transformer models and offers quite a speed-up as well, sometimes nearly triple. I had a quick look at the documentation (https://opennmt.net/CTranslate2/guides/transformers.html#whisper), and converting a model to their format is easy enough, but after that I get lost.

Does anyone know of any plans to implement this? Or has it been looked at and deemed too difficult, not really needed, or incompatible with Nx?

Thanks.

I'm not aware of any plans from the rest of the team, but you should be able to implement part of the Nx.Backend behaviour as a compatibility layer for moving tensors to and from CTranslate2. You could then run whatever inference you want on a tensor and bring the result back to "normal" Nx for further operations, if desired.
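To make the bridging idea concrete, here is a minimal sketch of how a tensor could cross that boundary. `CTranslate2NIF.generate/2` is a hypothetical NIF wrapping the C++ library (no such binding exists today); `Nx.to_binary/1`, `Nx.from_binary/2`, and `Nx.reshape/2` are real Nx functions.

```elixir
defmodule CT2Bridge do
  # Sketch only: `CTranslate2NIF.generate/2` is an assumed NIF name,
  # imagined to take raw bytes plus a shape and return {bytes, shape}.
  def generate(%Nx.Tensor{} = input_ids) do
    shape = Nx.shape(input_ids)

    {out_bin, out_shape} =
      input_ids
      |> Nx.as_type(:s32)
      # Nx.to_binary/1 yields the raw row-major bytes most C APIs expect
      |> Nx.to_binary()
      |> CTranslate2NIF.generate(shape)

    # Rehydrate the native output as a plain Nx tensor
    out_bin
    |> Nx.from_binary(:s32)
    |> Nx.reshape(out_shape)
  end
end
```

The key point is that the conversion happens only at the edges; everything before and after stays ordinary Nx.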

https://github.com/acalejos/exgboost also demonstrates an approach for bridging with Nx: the main interface receives and outputs tensors, but internally they are converted to custom data structures to be operated on.
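That pattern can be sketched as follows: the public function signature speaks Nx tensors, while a struct holds the library-native handle internally. All module and function names here are illustrative, not EXGBoost's actual API.

```elixir
defmodule NativeModel do
  # `ref` would hold a resource handle returned by the native library
  defstruct [:ref]

  # Public API in terms of Nx tensors; `NativeModel.Nif.predict/3`
  # is a hypothetical NIF doing the actual work on raw bytes.
  def predict(%NativeModel{ref: ref}, %Nx.Tensor{} = x) do
    bin = x |> Nx.as_type(:f32) |> Nx.to_binary()
    {out_bin, out_shape} = NativeModel.Nif.predict(ref, bin, Nx.shape(x))

    out_bin
    |> Nx.from_binary(:f32)
    |> Nx.reshape(out_shape)
  end
end
```

Keeping the tensor conversion inside the wrapper means callers never see the native representation, which is exactly what makes the EXGBoost-style bridge ergonomic.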