kyegomez / FastFF

Zeta implementation of a reusable, plug-and-play feedforward layer from the paper "Exponentially Faster Language Modeling"

Incorrect Implementation

ThomasPluck opened this issue 2 months ago

Thomas Pluck commented 2 months ago

Weight matrices should be sized 2 ** depth - 1, not 2 * depth - 1.
The ChatGPT clamping fix in CMM is unnecessary once this is corrected.
The einsum usage is very inefficient in both time and space for FFF.
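The first two points come down to the tree's index arithmetic. Below is a minimal pure-Python sketch of a single-token FFF forward pass. It assumes heap-style node indexing: the root is node 0, the children of node n are 2n + 1 and 2n + 2, and the sign of the node's logit picks the branch. It also assumes GELU as the activation. `num_nodes` and `fff_forward` are hypothetical helpers, not functions from the FastFF or Zeta code. A tree with `depth` levels has 2 ** depth - 1 nodes, and a path visits one node per level, so the deepest index reached is 2 ** depth - 2, which is always in range. With 2 * depth - 1 rows the two counts agree only for depth 1 and 2. At depth 3 a path can reach node 6 while only 5 rows exist, and that is the kind of out-of-range index a clamp would hide.

```python
import math
import random


def num_nodes(depth: int) -> int:
    """Nodes in a complete binary tree with `depth` levels: 2**depth - 1."""
    return 2 ** depth - 1


def gelu(z: float) -> float:
    """Exact GELU: 0.5 * z * (1 + erf(z / sqrt(2)))."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def fff_forward(x, w_in, w_out, depth):
    """Hypothetical single-token FFF forward pass (conditional matrix multiplication).

    Assumes heap indexing: root is node 0; children of node n are
    2*n + 1 (logit <= 0) and 2*n + 2 (logit > 0). Requires depth >= 1.
    Returns the output vector and the node indices visited along the path.
    """
    assert depth >= 1
    n = num_nodes(depth)
    assert len(w_in) == n and len(w_out) == n, "need one weight row per node"
    y = [0.0] * len(w_out[0])
    node, visited = 0, []
    for _ in range(depth):
        visited.append(node)
        # Only this node's input row is read: one dot product per level.
        logit = sum(a * b for a, b in zip(x, w_in[node]))
        # Add this node's output row, scaled by the activated logit.
        act = gelu(logit)
        y = [yi + act * wi for yi, wi in zip(y, w_out[node])]
        # Descend: the sign of the logit picks the child (no clamping needed).
        node = 2 * node + 1 + (1 if logit > 0 else 0)
    return y, visited


if __name__ == "__main__":
    random.seed(0)
    depth, d_model = 4, 8
    rows = num_nodes(depth)  # 15 rows; 2 * depth - 1 would give only 7
    w_in = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(rows)]
    w_out = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(rows)]
    x = [random.gauss(0, 1) for _ in range(d_model)]
    y, path = fff_forward(x, w_in, w_out, depth)
    print(path)  # `depth` indices, each below `rows`
```

Each token reads only `depth` rows of each weight matrix. A batched version would presumably keep that property by gathering the rows of the visited nodes with tensor indexing rather than contracting over every node. That is one plausible way to address the einsum cost raised in the third point, though the sketch does not reproduce the repository's own einsum code.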

github-actions commented 2 months ago

Hello there, thank you for opening an issue! 🙏🏻 The team has been notified and will get back to you ASAP.