kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Home Page: https://discord.gg/qUtxnK2NMf

Parts of the BitLinear code don't match the paper (before bit1.58)

qqqllppp opened this issue

Referencing this paper: https://arxiv.org/pdf/2310.11453.pdf
Code part: https://github.com/kyegomez/BitNet/blob/984ec72c2a45a88b739c85668690fe1abbdf3152/bitnet/bitlinear.py

In general, the code does not appear to match the paper, mainly Equations (1), (4), and (11). It also seems to be missing the straight-through estimator. (Edit: the code also doesn't replace the linear layers inside multi-head attention with BitLinear.)
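For reference, here is a minimal sketch (not the repo's code) of what a BitLinear forward pass could look like under my reading of Equations (1), (4), and (11), with the straight-through estimator realized via the usual `detach()` trick. The class name `BitLinearSketch`, the 8-bit activation width, and the epsilon value are illustrative assumptions, and the LayerNorm the paper applies before quantization is omitted for brevity:

```python
# A minimal sketch, assuming: Eq. (1) = mean-centred weight binarization,
# Eq. (4) = absmax activation quantization, Eq. (11) = output rescaling by
# beta * gamma / Q_b. Class name, bits=8, and eps are illustrative assumptions;
# the LayerNorm the paper applies before quantization is left out for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinearSketch(nn.Linear):
    def __init__(self, in_features, out_features, bias=False, bits=8):
        super().__init__(in_features, out_features, bias)
        self.Qb = 2 ** (bits - 1)  # activation quantization range
        self.eps = 1e-5

    def forward(self, x):
        # Eq. (1): binarize weights around their mean, W_tilde = Sign(W - alpha),
        # mapping non-positive centred weights to -1 and positive ones to +1.
        w_centered = self.weight - self.weight.mean()
        w_bin = torch.where(
            w_centered > 0,
            torch.ones_like(w_centered),
            -torch.ones_like(w_centered),
        )
        # Straight-through estimator: binarized values in the forward pass,
        # identity gradient to the full-precision weights in the backward pass.
        w_bin = self.weight + (w_bin - self.weight).detach()

        # Eq. (4): absmax quantization of activations, with gamma = ||x||_inf.
        gamma = x.abs().max().clamp(min=self.eps)
        x_q = torch.clamp(
            x * self.Qb / gamma, -self.Qb + self.eps, self.Qb - self.eps
        )
        x_q = x + (x_q - x).detach()  # STE for the activation quantizer

        # Eq. (11): matmul in the quantized domain, then rescale the output by
        # beta * gamma / Q_b, where beta is the mean absolute value of W.
        beta = self.weight.abs().mean()
        y = F.linear(x_q, w_bin, self.bias)
        return y * beta * gamma / self.Qb
```

The two `detach()` lines are one common way to implement the straight-through estimator the issue mentions: the quantized values are used in the forward pass while gradients flow unchanged to the full-precision parameters.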

I also found another reference implementation that seems to follow the paper's equations more closely: https://github.com/Beomi/BitNet-Transformers

@qqqllppp this repo is still in progress; if you notice defects, please send a pull request.

Stale issue message