Request to Support AWS Inferentia2 for More Cost-Effective and Faster Inference in MPT
anjiefang opened this issue · comments
Feature Request
Integrate support for AWS Inferentia2 into MPT, enabling users to leverage this powerful and cost-friendly inference solution through AWS.
Motivation
Refer to this post:
- AWS Inferentia2 is designed for cost-efficient inference compared to Nvidia chips. These cost savings can be significant for users who rely on MPT for various applications.
- AWS Inferentia2 has demonstrated the potential for faster inference, which would improve MPT's overall responsiveness and usability.
[Optional] Implementation
Additional context
Does the team already have a plan to leverage Inferentia2? If not, can the team provide any guidance on how to migrate to Inferentia2 chips?
@anjiefang: We do have plans to leverage inf2, but it's currently not a high priority. The MPT architecture is a standard GPT decoder-style architecture with one change: the attention module uses ALiBi. As long as inf2 supports attention with ALiBi, converting MPT to run on inf2 is not a huge lift. If inf2 doesn't support attention with ALiBi, we will have to ask the AWS team to support it. Please check out this script we used to convert MPT weights to FT format; something similar can be used for inf2.
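For anyone assessing inf2 support: the only non-standard piece is the ALiBi bias added to attention scores. Below is a minimal sketch of that bias, following the standard formulation from the ALiBi paper (a per-head geometric sequence of slopes, multiplied by the query–key distance). Function names are illustrative, not MPT's actual API; this is just to show what the inf2 attention kernel would need to accept.

```python
import math

def alibi_slopes(n_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/n_heads)."""
    def slopes_power_of_2(n):
        start = 2 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]
    if math.log2(n_heads).is_integer():
        return slopes_power_of_2(n_heads)
    # Non-power-of-two head counts interleave slopes from the next power of two.
    closest = 2 ** math.floor(math.log2(n_heads))
    return (slopes_power_of_2(closest)
            + slopes_power_of_2(2 * closest)[0::2][: n_heads - closest])

def alibi_bias(n_heads, seq_len):
    """(heads, query, key) additive bias: -slope * distance for past keys.

    Future positions get 0 here; the causal mask handles them separately.
    """
    return [[[m * min(k - q, 0) for k in range(seq_len)]
             for q in range(seq_len)]
            for m in alibi_slopes(n_heads)]
```

If inf2's attention kernels accept an arbitrary additive bias of this shape before the softmax, ALiBi comes for free; otherwise the bias computation itself would need dedicated support.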