NanoGPT sample
izaxon opened this issue · comments
Is your feature request related to a problem? Please describe.
I have tried to replace the Linear layers in https://github.com/karpathy/nanoGPT with the bitnet one, but training doesn't seem to converge.
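For reference, a minimal sketch of the kind of layer swap described here, assuming PyTorch. `ToyTernaryLinear`, `swap_linear`, and `TinyModel` are hypothetical names used only for illustration: the toy layer is a simplified stand-in (ternary weights with a straight-through estimator, no activation quantization) and not the bitnet package's actual class. Skipping `lm_head` is also an assumption: nanoGPT ties the output head's weight to the token embedding, so replacing that layer may break the tying.

```python
import torch
import torch.nn as nn


class ToyTernaryLinear(nn.Linear):
    """Hypothetical stand-in for a BitNet-style layer (NOT the bitnet package's class)."""

    def forward(self, x):
        w = self.weight
        # Scale by the mean absolute weight, then round to {-1, 0, +1}.
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: quantized forward, full-precision gradient.
        w_ste = w + (w_q - w).detach()
        return nn.functional.linear(x, w_ste, self.bias)


def swap_linear(module, factory, skip=("lm_head",)):
    """Recursively replace plain nn.Linear children with factory(in, out, has_bias).

    Children whose attribute name is in `skip` are left untouched (assumption:
    the output head stays in full precision). Returns the number of replacements.
    """
    replaced = 0
    for name, child in list(module.named_children()):
        if name in skip:
            continue
        if type(child) is nn.Linear:
            new = factory(child.in_features, child.out_features, child.bias is not None)
            setattr(module, name, new)
            replaced += 1
        else:
            replaced += swap_linear(child, factory, skip)
    return replaced


class TinyModel(nn.Module):
    """Hypothetical miniature model, only here to exercise the swap."""

    def __init__(self, n_embd=16, vocab_size=65):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        return self.lm_head(self.mlp(self.wte(idx)))


model = TinyModel()
n_replaced = swap_linear(model, lambda i, o, b: ToyTernaryLinear(i, o, bias=b))
logits = model(torch.randint(0, 65, (2, 8)))
```

With a swap like this, the last dimension of `logits` still equals the vocabulary size, because the output head is untouched. Whether this helps convergence is not established here.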
Describe the solution you'd like
I am looking for a solution, e.g. increasing the size of these (?) layers, that gets nanoGPT to work with bitnet.
Describe alternatives you've considered
I have tried replacing the linear layers and changing their sizes. I have also run into an inference problem that I have not solved: token IDs in the output fall outside the total number of tokens (using the Shakespeare dataset).
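A small diagnostic sketch for the out-of-range tokens, in plain Python. It only reports which sampled IDs fall outside the vocabulary. The helper name `check_token_ids`, the sample IDs, and the vocabulary size of 65 are assumptions for illustration, not values taken from this run. If IDs like these show up, one possible (unconfirmed) cause is that the model's output dimension does not match the vocabulary size used for decoding.

```python
def check_token_ids(token_ids, vocab_size):
    """Return (position, id) pairs for IDs outside the valid range [0, vocab_size)."""
    return [
        (pos, tok)
        for pos, tok in enumerate(token_ids)
        if not 0 <= tok < vocab_size
    ]


# Hypothetical values: a small character-level vocabulary and some sampled IDs.
vocab_size = 65
sampled = [12, 40, 64, 65, 50256]

bad = check_token_ids(sampled, vocab_size)
for pos, tok in bad:
    print(f"position {pos}: token id {tok} is outside [0, {vocab_size})")
```

Running the check on the raw sampled IDs before decoding shows whether the problem is in the model output or in the decoding table.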
Additional context
n/a