kyegomez / FastFF

Zeta implementation of a reusable, plug-and-play feedforward layer from the paper "Exponentially Faster Language Modeling"

Incorrect Implementation

ThomasPluck opened this issue

  1. Weight matrices should be sized `2 ** depth - 1` (the node count of a complete binary tree with `depth` levels), not `2 * depth - 1`
  2. The ChatGPT clamping fix in CMM is unnecessary given this correction
  3. Einsum usage is very inefficient both in time and space for FFF
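
To make the three points concrete, here is a minimal pure-Python sketch of conditional matrix multiplication (CMM) over a single input vector. It is not the repo's code or the paper's reference implementation: the names `num_nodes` and `fff_forward`, the root-at-index-0 layout with children at `2i + 1` and `2i + 2`, and the `score > 0` branching rule are all assumptions for illustration, and the activation on the score is left out for brevity.

```python
def num_nodes(depth: int) -> int:
    """Node count of a complete binary tree with `depth` levels."""
    return 2 ** depth - 1


def fff_forward(x, w_in, w_out, depth):
    """Hypothetical CMM sketch for one input vector `x`.

    w_in[i] and w_out[i] hold the weights of tree node i, so both
    lists need 2 ** depth - 1 entries (assumed layout, see lead-in).
    """
    assert len(w_in) == len(w_out) == num_nodes(depth)
    y = [0.0] * len(w_out[0])
    node = 0          # start at the root
    visited = []
    for _ in range(depth):
        visited.append(node)
        # Only the current node's weights are read: one dot product
        # per level instead of a dense product over every node.
        score = sum(a * b for a, b in zip(x, w_in[node]))
        for j, w in enumerate(w_out[node]):
            y[j] += score * w   # activation omitted in this sketch
        # Descend: left child is 2i + 1, right child is 2i + 2.
        node = 2 * node + 1 + (1 if score > 0 else 0)
    return y, visited


depth = 3
w_in = [[1.0, 1.0] for _ in range(num_nodes(depth))]
w_out = [[1.0] for _ in range(num_nodes(depth))]
y, visited = fff_forward([1.0, 1.0], w_in, w_out, depth)
print(visited)  # [0, 2, 6]
```

With `depth = 3` the walk can reach index 6, which fits in `2 ** depth - 1 = 7` entries but not in `2 * depth - 1 = 5`, so with the correct size no index clamping is needed. The loop also touches only `depth` weight rows per input, which is the saving a dense einsum over all nodes gives up.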

Hello there, thank you for opening an Issue ! 🙏🏻 The team was notified and they will get back to you asap.