kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Home page: https://discord.gg/qUtxnK2NMf

NanoGPT sample

izaxon opened this issue.

Is your feature request related to a problem? Please describe.
I have tried to replace the Linear layers in https://github.com/karpathy/nanoGPT with the BitNet ones, but training doesn't seem to converge.
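For reference, the swap described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the kyegomez/BitNet `BitLinear` API: `SketchBitLinear`, `swap_linear`, and `TinyMLP` are hypothetical names. The layer binarizes mean-centred weights to ±β and uses a straight-through estimator so gradients still reach the full-precision weights. Activation quantization and the normalization BitNet applies before it are omitted. The output head (`lm_head`) is skipped because nanoGPT ties its weight to the token embedding, and the BitNet paper also keeps the input/output embeddings in higher precision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchBitLinear(nn.Linear):
    """Hypothetical 1-bit linear layer (weights only; activations stay full precision)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Centre the weights, take the sign (+1/-1, 0 for exact zeros),
        # and rescale by the mean absolute weight.
        beta = w.abs().mean()
        w_bin = torch.sign(w - w.mean()) * beta
        # Straight-through estimator: forward uses w_bin, backward acts as identity on w.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, w_q, self.bias)


def swap_linear(module: nn.Module, skip: tuple = ("lm_head",)) -> nn.Module:
    """Recursively replace nn.Linear children with SketchBitLinear, except names in `skip`."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name not in skip:
            new = SketchBitLinear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                device=child.weight.device,
                dtype=child.weight.dtype,
            )
            # Start from the existing full-precision weights.
            new.load_state_dict(child.state_dict())
            setattr(module, name, new)
        else:
            swap_linear(child, skip)
    return module


class TinyMLP(nn.Module):
    """Hypothetical stand-in using nanoGPT-style layer names."""

    def __init__(self) -> None:
        super().__init__()
        self.c_fc = nn.Linear(8, 32)
        self.c_proj = nn.Linear(32, 8)
        self.lm_head = nn.Linear(8, 65, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.c_proj(F.gelu(self.c_fc(x))))


model = swap_linear(TinyMLP())
out = model(torch.randn(2, 8))
```

Keeping the head in full precision also preserves nanoGPT's weight tying, which a blanket replacement of every `nn.Linear` would silently break.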

Describe the solution you'd like
I am looking for a solution, for example increasing the size of these (?) layers, that gets nanoGPT to train with BitNet.

Describe alternatives you've considered
I have tried replacing the linear layers and changing their sizes. I have also run into an inference problem that I have not solved: some generated tokens fall outside the range of the vocabulary (using the Shakespeare dataset).
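One possible cause of out-of-range tokens is an assumption, not something confirmed in this issue: if nanoGPT finds no `meta.pkl` for the dataset, it falls back to a padded GPT-2 vocabulary of 50304, so the output head can assign probability to ids the character-level Shakespeare tokenizer cannot decode. A minimal guard is to mask those ids before sampling. The helper `sample_next_token` and its `data_vocab_size` parameter are hypothetical, not part of nanoGPT or BitNet.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def sample_next_token(logits: torch.Tensor, data_vocab_size: int, temperature: float = 1.0) -> torch.Tensor:
    """Sample one id per row, restricted to ids the dataset's tokenizer defines.

    logits: (batch, model_vocab_size) scores for the last position.
    data_vocab_size: hypothetical parameter, e.g. read from the dataset's meta.pkl.
    """
    if logits.size(-1) < data_vocab_size:
        raise ValueError("model vocab is smaller than the dataset vocab")
    # Division returns a new tensor, so the caller's logits are not modified below.
    logits = logits / temperature
    # Give zero probability to ids the tokenizer cannot decode.
    logits[:, data_vocab_size:] = float("-inf")
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)


torch.manual_seed(0)
scores = torch.zeros(4, 100)
scores[:, 90:] = 10.0  # the untrained head strongly prefers out-of-range ids
ids = sample_next_token(scores, data_vocab_size=65)
```

Masking only hides the symptom; if the model was trained with the wrong vocabulary size, fixing `vocab_size` at training time is the cleaner fix.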

Additional context
n/a