kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Home Page: https://discord.gg/qUtxnK2NMf

Parts of the BitLinear code don't match the paper (before bit1.58)

qqqllppp opened this issue

Referencing this paper: https://arxiv.org/pdf/2310.11453.pdf
Code part: https://github.com/kyegomez/BitNet/blob/984ec72c2a45a88b739c85668690fe1abbdf3152/bitnet/bitlinear.py

In general, the code does not appear to match the paper, mainly Equations (1), (4), and (11). It also seems to be missing the straight-through estimator. (Edit: the code also doesn't replace the linear layers inside multi-head attention with BitLinear.)
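For reference, here is a minimal sketch (not the repo's code) of what a BitLinear forward pass could look like under my reading of Equations (1), (4), and (11), with the straight-through estimator realized via the usual `detach()` trick. The class name `BitLinearSketch`, the 8-bit activation width, and the epsilon value are illustrative assumptions, and the LayerNorm the paper applies before quantization is omitted for brevity:

```python
# A minimal sketch, assuming: Eq. (1) = mean-centred weight binarization,
# Eq. (4) = absmax activation quantization, Eq. (11) = output rescaling by
# beta * gamma / Q_b. Class name, bits=8, and eps are illustrative assumptions;
# the LayerNorm the paper applies before quantization is left out for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinearSketch(nn.Linear):
    def __init__(self, in_features, out_features, bias=False, bits=8):
        super().__init__(in_features, out_features, bias)
        self.Qb = 2 ** (bits - 1)  # activation quantization range
        self.eps = 1e-5

    def forward(self, x):
        # Eq. (1): binarize weights around their mean, W_tilde = Sign(W - alpha),
        # mapping non-positive centred weights to -1 and positive ones to +1.
        w_centered = self.weight - self.weight.mean()
        w_bin = torch.where(
            w_centered > 0,
            torch.ones_like(w_centered),
            -torch.ones_like(w_centered),
        )
        # Straight-through estimator: binarized values in the forward pass,
        # identity gradient to the full-precision weights in the backward pass.
        w_bin = self.weight + (w_bin - self.weight).detach()

        # Eq. (4): absmax quantization of activations, with gamma = ||x||_inf.
        gamma = x.abs().max().clamp(min=self.eps)
        x_q = torch.clamp(
            x * self.Qb / gamma, -self.Qb + self.eps, self.Qb - self.eps
        )
        x_q = x + (x_q - x).detach()  # STE for the activation quantizer

        # Eq. (11): matmul in the quantized domain, then rescale the output by
        # beta * gamma / Q_b, where beta is the mean absolute value of W.
        beta = self.weight.abs().mean()
        y = F.linear(x_q, w_bin, self.bias)
        return y * beta * gamma / self.Qb
```

The two `detach()` lines are one common way to implement the straight-through estimator the issue mentions: the quantized values are used in the forward pass while gradients flow unchanged to the full-precision parameters.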

I also found another reference implementation that seems to follow the paper's equations more closely: https://github.com/Beomi/BitNet-Transformers

@qqqllppp this repo is still in progress; if you notice defects, please send a pull request.

Stale issue message