Parts of the BitLinear code don't match the paper (before bit1.58)
qqqllppp opened this issue
Referencing this paper: https://arxiv.org/pdf/2310.11453.pdf
Code part: https://github.com/kyegomez/BitNet/blob/984ec72c2a45a88b739c85668690fe1abbdf3152/bitnet/bitlinear.py
In general, the code does not seem to match the paper, mainly Equations (1), (4), and (11). It also seems to be missing the straight-through estimator. (Edit: the code also doesn't replace the linear layers inside the multi-head attention with BitLinear.)
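To make the mismatch concrete, here is a minimal PyTorch sketch of how I read those equations. It is not the repo's code or an official implementation. `BitLinearSketch` and `ste` are hypothetical names, and the weight initialization, the per-tensor absmax, the parameter-free LayerNorm, and applying the straight-through estimator to the clip as well as the sign are my assumptions; please check each step against the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def ste(x, x_q):
    """Straight-through estimator: forward value is x_q, gradient is identity in x."""
    # (x - x.detach()) is exactly zero in value but still carries the gradient of x.
    return x_q.detach() + (x - x.detach())


class BitLinearSketch(nn.Module):
    """Hypothetical sketch of a BitLinear layer, as I read Eq. (1), (4), (11)."""

    def __init__(self, in_features, out_features, bits=8, eps=1e-5):
        super().__init__()
        self.in_features = in_features
        self.q_b = 2.0 ** (bits - 1)  # Q_b = 2^(b-1)
        self.eps = eps
        # Assumption: small random init; the paper does not prescribe this here.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def binarize_weight(self):
        w = self.weight
        alpha = w.mean()  # centre the weights before taking the sign
        # Eq. (1): Sign(W - alpha), with values <= 0 mapped to -1.
        w_bin = torch.where(w > alpha, torch.ones_like(w), -torch.ones_like(w))
        beta = w.abs().mean()  # scale used to rescale the output in Eq. (11)
        return ste(w, w_bin), beta

    def quantize_activation(self, x):
        # Assumption: per-tensor absmax, clamped to avoid division by zero.
        gamma = x.abs().max().clamp(min=self.eps)
        scaled = x * self.q_b / gamma
        # Eq. (4): scale, then clip to [-Q_b + eps, Q_b - eps].
        clipped = scaled.clamp(-self.q_b + self.eps, self.q_b - self.eps)
        return ste(scaled, clipped), gamma

    def forward(self, x):
        x = F.layer_norm(x, (self.in_features,))  # LN(x), no learnable affine
        x_q, gamma = self.quantize_activation(x)
        w_bin, beta = self.binarize_weight()
        # Eq. (11): binarized matmul, then rescale by beta * gamma / Q_b.
        return F.linear(x_q, w_bin) * (beta * gamma / self.q_b)
```

With this layout, `layer = BitLinearSketch(16, 4); layer(torch.randn(2, 16))` returns a `(2, 4)` tensor, and calling `.backward()` on a scalar loss still populates `layer.weight.grad` even though the forward pass only sees weights of +1 and -1. That gradient path is the part that seems to be missing from the linked file.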
I also found this other reference implementation which seems to follow the equations from the paper a bit more. https://github.com/Beomi/BitNet-Transformers
@qqqllppp This repo is still in progress. If you notice defects, please send a pull request.