BitNet (b1.58) support
EwoutH opened this issue · comments
First of all, thanks. We need more ramps.
I was curious what you think of BitNet, and whether llm.c is a place where experimenting with it could be facilitated. The papers were extremely promising and got a lot of traction, but while there have been a few (small-scale) reproductions, there isn't an easy ramp to start experimenting with it.
Papers
I don't think we have it on the current roadmap, Andrej can chime in. We have a lot of stuff on the backlog before we get here, including potentially supporting fp8, ZeRO stage 2, etc.
The problem with BitNet (b1.58) training is that it still uses FP16/BF16 for training, so memory consumption does not decrease. Anyway, getting support for it would be great! Used together with FP8 training, it could bring an improvement.
